For those of us who are familiar with the process by which an AP finds its controller, we know that there is L2 broadcast, Option 43, DNS resolution, and shared neighbor information via OTAP, as well as the final option of statically assigning a controller IP via 'lwapp ap controller ip address x.x.x.x'. If you watch the process via 'debug lwapp client event' on an AP, you will see that each IP address is categorized by how it was learned, using a number (0-4). Here's my question: Are these numbers used in a priority order when an AP attempts to join a controller?
I had a 1252 on a 2106 running 22.214.171.124 and no domain (the AP got its controller IP via 'option 43 ascii x.x.x.x' from a DHCP scope on a 2960 switch). Then I moved it to my lab setting, where it's a 4402-25 running 126.96.36.199 and a domain. I expected the new resolution of CISCO-LWAPP-CONTROLLER to succeed and the AP to join my lab controller. However, all I saw was the stored entries in NVRAM from the previous controller it had joined. I had a couple of options to force it to join my lab controller, and I chose Option 43. That seemed to work, and the AP happily downgraded its code.
Any thoughts/comments? I'm just surprised that the new DNS resolution (which did work, because the debug showed 'translating [OK]') didn't allow the AP to join my lab controller.
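For anyone who wants to double-check the DNS leg from a workstation on the same network, here is a minimal Python sketch. It only builds the CISCO-LWAPP-CONTROLLER name for a given domain and asks the local resolver for it. The function names and the 'lab.example.com' domain are made up for illustration, and this says nothing about which discovery source the AP will actually prefer.

```python
import socket


def controller_fqdn(domain):
    """Build the well-known discovery name for a given DNS domain."""
    return f"CISCO-LWAPP-CONTROLLER.{domain.strip('.')}"


def resolve_controller(domain, resolver=socket.gethostbyname_ex):
    """Return (name, addresses); addresses is empty if the lookup fails.

    `resolver` is injectable so the function can be exercised without a network.
    """
    name = controller_fqdn(domain)
    try:
        _, _, addresses = resolver(name)
    except OSError:  # socket.gaierror and socket.herror are both OSError subclasses
        return name, []
    return name, list(addresses)


if __name__ == "__main__":
    # 'lab.example.com' is a placeholder; substitute the domain handed out by DHCP.
    print(resolve_controller("lab.example.com"))
```

If this returns the address of the controller you expect, the DNS side is fine and the problem lies in how the AP weighs that answer against the entries it already has stored.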
Here is my experience and understanding of the very odd LWAPP/CAPWAP discovery process:
Presume you have the following:
1. 21xx WLC for Lab and/or priming purposes running new or old firmware;
2. A number of 41xx or WiSM controllers throughout the network (Production).
You follow the documentation and prime the APs in your lab. The APs naturally join the first WLC in a small subnet/network, which is the 21xx. Of course, it will upgrade/downgrade the APs' IOS, blah, blah, blah.
We all know how the APs discover and join the WLC. Does anyone know how the APs remember the WLC details? According to Cisco documentation: "Once joined, the AP will have one or more controller IP Addresses stored LOCALLY."
This means that once the AP is pulled out of the lab network and thrown into the production network, it will look for the 21xx details. Unfortunately, it is hit-and-miss whether the AP will say something like "I can't find the 21xx anywhere, I'll join the first WLC I can find."
This is where the command 'lwapp ap controller ip address x.x.x.x' comes in.
I've seen this scenario happen several times. I've used DHCP Option 43 first and then 'lwapp ap controller ip address x.x.x.x' to resolve most of the problems Option 43 can't fix.
In my humble opinion ...
One more thing ... If you are using the CLI on the lightweight AP and you get the error message "Error!! Command is disabled", this means that you are on the wrong IOS. Commands such as "clear lwapp" can only be invoked if the RCV IOS is being used.
FWIW - I was able to get rid of the "Error!! Command is disabled" message after I changed the username/pwd on the AP from its factory default. I had to reset one AP today, and of course, it joined the wrong controller. So my way around this was to delete the wrong controller's DNS entry and reset the AP to factory default; it then jumped onto the right controller, and I re-added the wrong controller's DNS entry. Cisco TAC isn't giving me much info on how to solve this besides "you can return it and we'll send you another one". Pretty frustrating. It almost seems like you have to rely on the command you and others posted ('lwapp ap controller ip address x.x.x.x').
Just wanted to post a follow-up to this. After working with TAC, we found a solution to my problem (APs joining the wrong controller). To be sure the AP joined the right controller, we created Access Lists in the GUI (Security, AP Policies). This worked great. Each controller only accepts certain APs. Thanks to all for the help on this.
Ouch, that sounds like quite an ugly work-around. It may work great if you have a few controllers and a few APs, but what if you have 20 controllers and 300 APs and are still growing rapidly?
That would be a pain to manage, I think....
Anyway, that said, I am facing the exact same problem with access-points sometimes not joining the controller you would expect.
My setup is as follows: DHCP option 43 provides the local controller address; DNS resolves to one controller address in a datacenter, which is also the guest anchor controller; OTAP is disabled; and the controllers are always in the server VLANs (the server VLANs have no helper addresses or anything similar, so the L3 broadcast is blocked and does not leave the VLAN of the controller). The APs are in a different VLAN.
The problem I am facing is similar, and worse. Sometimes when an AP loses connection to its controller, the AP homes to another controller. In most cases, that controller is in another country/location, may have other country codes enabled, and may not even support the regulatory domain of the AP in question.
Then, when the original controller comes back online, the AP does not go back to its own controller (since it is not in the same mobility group). Resetting the AP brings it back to the original controller, but then the AP loses its country settings: when the AP joins a controller with the same regulatory domain as the AP itself, it seems to select the first enabled country code available in the domain.
In some cases, this results in radios staying down, and I have to manually re-enable the radios and set the correct country code from WCS.
Working with TAC, and even with engineers who have been on-site, and reading the documents, it seems the whole discovery and selection process has changed over the years with newer versions, and that it has not changed since 4.2.
However, one engineer stated it has changed in 5.2 (I am running the latest 5.2 version), but there are no docs to support it.
Another engineer informed me that in the LWAPP discovery response from a controller, the controller sends all IPs of controllers in its mobility group that it knows of. If so, that would require mobility groups to be different for each location. Other engineers state this is not required.
A lot of discrepancies out there, IMHO.
I wish there were one guru who could perfectly outline the whole discovery and selection process, and provide supporting documents that show the explanation holds for 5.2.
Just venting a bit ;-)
Call or email me (check my profile) - I can talk more with you about this stuff. I have some documentation about how the process works in pre-5.0 code, but I suspect the difference the engineers are talking about is the migration from LWAPP to CAPWAP (Control And Provisioning of Wireless Access Points), which is an open standard.
Scott, the format of the Option 43 command in IOS for the AP you are using is incorrect. It should be 'option 43 hex F1aabbccddee', where aa is the hex value of the number of bytes to follow (04 for one IP address, 08 for two) and bbccddee are the hex values of the octets of the IP address of the controller's management interface.
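To save doing the hex conversion by hand, here is a small Python sketch of the encoding described above: type F1, then a length byte of 4 per controller, then each management IP as four hex octets. The function name and the example addresses are made up for illustration.

```python
import ipaddress


def option43_hex(controller_ips):
    """Build the Option 43 hex string: F1, length byte, then each IPv4 address."""
    addrs = [ipaddress.IPv4Address(ip) for ip in controller_ips]
    if not addrs:
        raise ValueError("at least one controller IP is required")
    length = 4 * len(addrs)  # 4 bytes per controller management IP
    if length > 255:
        raise ValueError("too many controllers for a single length byte")
    value = "".join(addr.packed.hex() for addr in addrs)
    return f"f1{length:02x}{value}"


if __name__ == "__main__":
    # Example addresses only; use your controllers' management interface IPs.
    print("option 43 hex", option43_hex(["192.168.10.5"]))
    print("option 43 hex", option43_hex(["192.168.10.5", "192.168.10.20"]))
```

For one controller at 192.168.10.5 this gives f104c0a80a05, and for two controllers (192.168.10.5 and 192.168.10.20) it gives f108c0a80a05c0a80a14, which you can paste after 'option 43 hex' in the DHCP pool.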
Thanks for the reply. I'm aware of the HEX configuration for Option 43. I just did a quick re-read of the documentation for that and didn't realize that the option for ascii only applies to 1000 series APs!! Thanks!!