This was very helpful. However, it did not actually solve a major problem I have when doing WLC code upgrades. The process (with minor steps omitted) is as follows:
1) Download new code to the controller
2) Predownload the code to the APs
3) Do an image swap on the APs so the next time they boot they will come up with the new image
4) Reboot the WLC on the new code
Now the problem comes into play. The APs boot considerably faster than my 8510/8540 HA controller. In the several minutes before their controller is ready for APs to join it, they have moved to another controller running the old code level. The APs get the code level from that controller and then join it. Once up, the APs figure out that they need to move back to their original controller, so they start the process of doing so, which now involves downloading the new code level from their original controller. This code had already been downloaded, but it was moved aside when the AP joined the down-level controller. Finally they get the code (again) for their intended controller and join it. All is good again.
EXCEPT: There are several problems with this, as follows:
Rather than being a quick outage for the users (we typically have 100,000+ client devices connected to our network), the process can take over an hour. Our controllers typically have around 3,000 APs each, and it takes a good while to download code to that many devices. Having to do the AP code download three times (initially to get the proper code, a second time so they can join the wrong controller, and a third time to get back to normal) takes way too long. In addition, letting users onto the APs and then taking service away from them again is not nice.
I found a way around this recently on a test controller that only had one building (with about 125 APs) on it. I turned off power to the AP switch ports (on six different 3750/3850 switch stacks) while leaving power on to the other ports so that our Cisco VoIP phones would still work. It took a little while to build the needed CLI script (actually six, since I had to do this on six different switch stacks), starting with the output from a show cdp neighbor | i AIR- on each switch stack. I generated a script that issued a power inline never on each AP port and ran it just before I rebooted the controller. While the controller was rebooting I changed the script to turn power back on, but did not run it until the HA pair was running on the new code (slightly over 3 minutes). Then, when the APs came up with the new code level, they could find their controller, they were running the matching code, and all was good. Total network downtime was less than 5 minutes. Quite acceptable.
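For anyone who wants to try the same thing, here is a rough sketch of how the per-switch script could be generated rather than built by hand. It is not the script I actually used. It assumes the usual show cdp neighbor column layout with stack-style local interfaces such as "Gig 1/0/5", that the abbreviations Fas/Gig/Ten cover the port types in use, and that power inline auto is the right way to restore power on those ports. The function names (ap_ports, power_script) are made up for illustration.

```python
import re

# Assumed mapping from CDP's abbreviated interface types to full IOS names.
IF_TYPES = {
    "Fas": "FastEthernet",
    "Gig": "GigabitEthernet",
    "Ten": "TenGigabitEthernet",
}

# Matches a stack-style local interface such as "Gig 1/0/5". The AP-side
# port ID ("Gig 0") contains no slash, so it is never matched.
LOCAL_IF = re.compile(r"\b(Fas|Gig|Ten)\s+(\d+(?:/\d+)+)")


def ap_ports(cdp_output):
    """Return the switch ports that have an AIR-* neighbor, in input order."""
    ports = []
    for line in cdp_output.splitlines():
        if "AIR-" not in line:
            continue  # same filter as "| i AIR-"
        match = LOCAL_IF.search(line)
        if match:
            name = IF_TYPES[match.group(1)] + match.group(2)
            if name not in ports:
                ports.append(name)
    return ports


def power_script(ports, mode):
    """Build the config lines that set PoE on each port to 'never' or 'auto'."""
    if mode not in ("never", "auto"):
        raise ValueError("mode must be 'never' or 'auto'")
    lines = ["configure terminal"]
    for port in ports:
        lines.append(f"interface {port}")
        lines.append(f" power inline {mode}")
    lines.append("end")
    return "\n".join(lines)
```

Calling power_script(ports, "never") gives the script to paste just before the controller reboot, and power_script(ports, "auto") gives the one to run once the HA pair is back up. Check the port list against the switch before running either, since a wrapped or unusual CDP line could be missed by the pattern.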
However, this process would be excessively cumbersome to run on 12 HA controllers running over 8,000 APs in 200+ buildings, many with multiple switch stacks.
What I need is a way to keep the APs from moving to another controller while they wait for the controller they should be on to come up. A great feature for the development folks to implement would be a way to tell the APs not to attempt to find another controller for xx minutes. The value of xx could be set by the controller just prior to a reboot. Or, the value could be supplied by a DHCP option, since the APs will all be getting an IP address when they reboot (this might not be great, since some folks might not use DHCP to handle IP addressing for their APs).
In the meantime, I cannot sit around hoping for a Cisco-supplied feature to fix this timing problem.
I have considered ACLs on the switches that pass traffic from the APs to the controllers. This would work, but would be excessively cumbersome for us to maintain. We run MPLS on our campus with three (soon to be four) MPLS areas. Each area has several WLCs with certain buildings assigned to a particular WLC. We are also constantly upgrading controllers, both hardware and software. It is mandatory that we have good control over what APs go to what controller. We also have to consider code levels required by APs since we have some very new ones as well as some that are quite old (5+ years on a few).
I have thought about using DHCP Option 43. That would be fine if it were strictly followed by the APs. But it is not. It might work if I had a way to prevent the AP from trying other controllers (e.g., those learned from WLC advertisements). However, I do not know how to do this. Could this be done with a switch ACL?
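For reference, the Option 43 value that lightweight APs expect is a small TLV: type 0xf1, a length byte equal to 4 times the number of controllers, then each controller management IP as 4 hex bytes. A minimal sketch of building that hex string follows. The function name option43_hex is made up for illustration and the address in the usage note is a documentation-range placeholder, not one of our controllers.

```python
import ipaddress


def option43_hex(controller_ips):
    """Build the Option 43 hex string (type f1) for a list of WLC IPv4 addresses."""
    if not controller_ips:
        raise ValueError("at least one controller IP is required")
    # IPv4Address raises ValueError on a malformed address.
    addrs = [ipaddress.IPv4Address(ip) for ip in controller_ips]
    length = 4 * len(addrs)  # 4 bytes per controller address
    if length > 255:
        raise ValueError("too many controllers for a one-byte length field")
    body = "".join(f"{int(addr):08x}" for addr in addrs)
    return f"f1{length:02x}{body}"
```

For example, option43_hex(["192.0.2.10"]) returns "f104c000020a". This only shows how the value is encoded; it does nothing about the APs trying controllers they learned some other way, which is the real problem here.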
Can I tinker with the fairly lengthy algorithm used by an AP to find a controller? I really just need the APs to join a particular controller (which can, and does, change all along), even if they have to wait a while for it to become available.
Does anyone have any recommendations of things that have worked for them or processes that they have not used but that might work for us? I am willing to try just about anything that is easy to maintain even if it takes a good while to get set up.
What info did I leave out of this question that I should have included?
THANKS for all suggestions.
Network Engineer, Office of Information Technology
The University of Alabama