Some APs dropping from new 9800

jasonmeyer
Level 1

I have migrated to a new 9800-40 from an 8510. Recently, at approximately noon every day, some APs drop from the 9800. I did use an AP template to move the access points to the new controller, but I left the old controller in as a secondary controller. I have the old controller turned off, so the APs don't drop for long. I have a case open with TAC, but they haven't been the most responsive. I have included a few log files they had requested. AP models are 3702i and 3802i.

 

One odd thing I have noticed in the AP config: the ones that drop have a very long Controller Association Latency time, usually 4-5 minutes. Not sure exactly what that means.

29 Replies

Okay... just making sure that the APs are not trying to join another controller.

-Scott
*** Please rate helpful posts ***

Did you remove the native VLAN configuration under the WLC uplinks? You also need to remove any native VLAN config from the switch-side port configuration.

 

Also, I am curious why a /16 subnet is being used on the WMI interface. If all the APs are in one site, remember that if there are more than 100 APs it is recommended that you have a dedicated AP management VLAN, so the WMI interface is isolated. If the APs are at a remote site, they can register to the WLC only if they can reach the SVI on VLAN 2, so make sure that you advertise option 43 correctly.
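Since option 43 came up: for Cisco lightweight APs the option 43 value is commonly encoded as a TLV with type 0xF1, a length of 4 bytes per controller, and then each controller's IPv4 address in hex. A minimal Python sketch of that encoding, assuming the address to advertise is the 9800's wireless management interface IP (192.0.2.10 is a placeholder, not an address from this thread):

```python
import ipaddress


def cisco_option43_hex(controller_ips):
    """Build an option 43 hex string in the assumed Cisco TLV layout.

    Layout assumed: type 0xF1, then length (4 bytes per controller),
    then each controller's IPv4 address as 4 raw bytes, all hex-encoded.
    """
    addrs = [ipaddress.IPv4Address(ip) for ip in controller_ips]
    if not addrs:
        raise ValueError("need at least one controller IP")
    length = 4 * len(addrs)
    if length > 255:
        raise ValueError("too many controllers for a single TLV")
    # .packed gives the 4 network-order bytes of each address
    body = "".join(addr.packed.hex() for addr in addrs)
    return f"f1{length:02x}{body}"


if __name__ == "__main__":
    print(cisco_option43_hex(["192.0.2.10"]))  # f104c000020a
```

On an IOS DHCP server, a string like this would typically go into the pool as `option 43 hex f104c000020a`; other DHCP servers have their own syntax for vendor-specific options.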

No, I haven't. Why is that bad if the device is reachable that way?

 

Well, when you say site, do you mean a site tag or the physical sites the APs are at? I don't have an AP management VLAN, never have.

JPavonM
VIP

@jasonmeyer just in case: if the AP mode is FlexConnect, are you aware that it is best practice not to use site tags with more than 100 APs?

https://www.cisco.com/c/en/us/products/collateral/wireless/catalyst-9800-series-wireless-controllers/guide-c07-743627.html#FlexConnectsitetag
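To see which site tags would exceed that 100-AP guideline, a quick count over an AP-to-site-tag mapping is enough. A minimal Python sketch; the mapping (AP name to site tag) is assumed to come from whatever export of the controller's tag assignments you already use, and the tag names are hypothetical:

```python
from collections import Counter

# Guideline from the best-practices guide linked above for FlexConnect site tags.
MAX_APS_PER_FLEX_SITE_TAG = 100


def oversized_site_tags(ap_to_site_tag, limit=MAX_APS_PER_FLEX_SITE_TAG):
    """Return {site_tag: ap_count} for every tag holding more than `limit` APs."""
    counts = Counter(ap_to_site_tag.values())
    return {tag: n for tag, n in counts.items() if n > limit}


if __name__ == "__main__":
    # Hypothetical inventory: 120 APs in one tag, 40 in another.
    aps = {f"HS-AP{i}": "HS-Tag" for i in range(120)}
    aps.update({f"MS-AP{i}": "MS-Tag" for i in range(40)})
    print(oversized_site_tags(aps))  # {'HS-Tag': 120}
```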

 

APs are in local mode.

So your APs are in local mode. Are your APs in the same physical location as the controller? Did you isolate whether the issue is with specific access points or every single AP? If your APs are in the same location, have you tried placing access points on the same subnet as your controller management? Is there a firewall between the APs and the controller?

I want to make sure that you don't have any discovery methods that can make the AP search for an existing controller on the network. It can be very difficult to troubleshoot if the AP has the information of an existing controller and decides to move. Even if the AP tries to move and fails back to the 9800, you will see the AP disassociate. Removing any HA entries on the AP and any discovery methods in DHCP, DNS, or UDP forwarding will help. Mobility groups will also share that information with access points, so check whether you have created a mobility peering between your 9800 and any AireOS controller or another 9800.
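One way to audit DHCP scopes for leftover discovery entries is to decode each scope's option 43 value and compare it with the controllers you actually expect. A minimal Python sketch, assuming the common Cisco TLV layout (type 0xF1, then 4 bytes per controller IPv4 address); the IPs are placeholders:

```python
import ipaddress


def decode_cisco_option43(hex_value):
    """Decode an option 43 hex string of the assumed form f1 <len> <IPv4...>."""
    raw = bytes.fromhex(hex_value)
    if len(raw) < 2 or raw[0] != 0xF1:
        raise ValueError("not a type-0xF1 option 43 value")
    length = raw[1]
    body = raw[2:2 + length]
    if length % 4 or len(body) != length:
        raise ValueError("length field does not match the IPv4 payload")
    # Each controller address is 4 packed bytes
    return [str(ipaddress.IPv4Address(body[i:i + 4])) for i in range(0, length, 4)]


def stale_controllers(hex_value, expected):
    """Return controller IPs advertised in the option that are not in `expected`."""
    wanted = set(expected)
    return [ip for ip in decode_cisco_option43(hex_value) if ip not in wanted]


if __name__ == "__main__":
    # Placeholder scope value advertising 192.0.2.10 and a leftover 192.0.2.20
    print(stale_controllers("f108c000020ac0000214", ["192.0.2.10"]))  # ['192.0.2.20']
```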

-Scott

Well, now that I have narrowed the site tags down, it's only happening to a subset of APs in one site tag, which are at one building. The APs are in a different subnet from the controller. I have removed all HA info already, and the older controller has been offline for months (actually it's already been recycled). As for discovery methods, I have never really used them across different subnets. I "prime" the APs by getting them associated with the controller in my office, which has a port in the same subnet as the controller. After placing the AP in a location to give it tags in the 9800 and renaming the AP with its physical location info, I hand it over to a co-worker who physically installs the AP where it's needed in our other buildings.


Now I recall this is a school. Why are you not using FlexConnect? Could this also be an issue with congestion over the WAN? Local mode will send all traffic back to the controller over your WAN. Even if you wanted traffic to come back to the site where the controller(s) are located, having APs in FlexConnect mode would be better. The max is 100 APs per FlexConnect group, but you can have multiple FlexConnect groups if you want. This might keep your APs stable, but it's really up to you. If you have more than 100 access points in a site, you can divide the site where there might not be any roaming, like separating the main building from any outside buildings, maybe gyms, etc. Do keep in mind that roaming is supported within a FlexConnect group, but roaming to a different FlexConnect group will cause a re-auth.

-Scott

Yes, this is for a school district. I guess I really never thought of using FlexConnect. Not sure why the consultant I used didn't recommend it when we migrated from our old controller to the 9800 either. Probably because our WAN is GigE and we have never had any bandwidth issues. I only have one building with more than 100 APs, well maybe 2, and that's our high school. It would be tough to have the re-auth happen as students and staff move about the building.



One more thing: it's very inconsistent. I have been running show wireless stats ap history a few times a day, then searching by date to find APs that disjoin. So far today there has only been 1. Yesterday it was 50 or so. And it's also not the same APs.
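Tallying the disjoins per day, and checking whether the same APs keep showing up, is easy to script once the history output has been parsed into records. A minimal Python sketch; the parsing step is not shown, and the `(ap_name, event_type, timestamp)` record shape and the "disjoined" event label are assumptions, not the exact format of `show wireless stats ap history`:

```python
from collections import Counter, defaultdict
from collections.abc import Iterable
from datetime import date, datetime

# Hypothetical event label; match it to whatever your parsed output uses.
DISJOIN_LABEL = "disjoined"


def disjoins_per_day(events: Iterable[tuple[str, str, datetime]]) -> dict[date, list[str]]:
    """Map each calendar date to the sorted distinct APs that disjoined that day."""
    by_day: dict[date, set[str]] = defaultdict(set)
    for ap_name, event_type, ts in events:
        if event_type.lower() == DISJOIN_LABEL:
            by_day[ts.date()].add(ap_name)
    return {day: sorted(aps) for day, aps in sorted(by_day.items())}


def repeat_offenders(per_day: dict[date, list[str]], min_days: int = 2) -> list[str]:
    """APs that disjoined on at least `min_days` different days."""
    counts = Counter(ap for aps in per_day.values() for ap in aps)
    return sorted(ap for ap, n in counts.items() if n >= min_days)
```

If `repeat_offenders` keeps coming back empty while the daily totals swing widely, that supports the observation above that it is not the same APs each time.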


Well, that can also be something in the WAN. With FlexConnect, your APs will stay online and are not sensitive to congestion or disruption on the WAN. You need to look at your overall traffic flow and also at how the wired infrastructure is laid out in each school. The reason I say this is that all wired traffic hits the switch, and any traffic whose destination isn't at the local site will egress through that site's router to its destination. FlexConnect uses the same concept: traffic is placed on the local switch and the infrastructure handles the routing. Of course, the WLAN would have to be defined for local switching in order to drop traffic locally. As an example, maybe the building where the controller is located is the egress point for the internet. The guest SSID can still be defined for central switching, so all guest traffic would come back to the controller.

Now back to FlexConnect and having more than 100 APs. This is something you would need to look at closely. There are some places where you don't really care about roaming, maybe from the basement to the 1st floor, or to any buildings/structures outside of the main building. This is how many have worked around the 100-AP maximum. They might break up floors, e.g. floors 1-5 and 6-10 in a high-rise building.
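The floor-splitting idea can be sketched as a simple greedy packing: walk the floors (or buildings) in physical order and start a new group whenever adding the next one would push the group past 100 APs. A minimal Python sketch; the floor names and AP counts are hypothetical, and greedy packing is just one heuristic, not a Cisco tool and not guaranteed to be an optimal partition:

```python
def split_by_floor(floors, limit=100):
    """Greedily pack consecutive floors into groups of at most `limit` APs.

    `floors` is an ordered list of (floor_name, ap_count). Consecutive floors
    stay together so roaming between adjacent floors stays inside one group
    where possible. A single floor larger than `limit` is rejected, because
    it would itself need to be split.
    """
    groups, current, current_count = [], [], 0
    for name, count in floors:
        if count > limit:
            raise ValueError(f"{name} alone has {count} APs (> {limit})")
        if current and current_count + count > limit:
            groups.append((current, current_count))
            current, current_count = [], 0
        current.append(name)
        current_count += count
    if current:
        groups.append((current, current_count))
    return groups


if __name__ == "__main__":
    building = [("Basement", 10), ("Floor 1", 45), ("Floor 2", 40),
                ("Floor 3", 35), ("Gym", 20)]
    for names, total in split_by_floor(building):
        print(names, total)
```

With these sample numbers the result is two groups, Basement through Floor 2 (95 APs) and Floor 3 plus the Gym (55 APs), so a client walking from Floor 2 to Floor 3 would cross a group boundary and re-authenticate.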

So think about the traffic flow, then think about where roaming is not possible or where it's okay to re-auth. Re-auth is not bad for open or PSK networks. With 802.1X, does it matter when folks are walking with their laptops in their bag or with the lid shut? Anyway, that would be another project. It's not bad to get done, but there are changes that have to be made to migrate from local to FlexConnect. It's more work on the back end, and transparent to the users when done right.

-Scott

Thanks. Sounds like FlexConnect may be an idea to consider, but carefully.



Jason


It's a better design, especially if traffic stays local to the site.

-Scott

It would for the high school. The other buildings, probably not so much. Most of our internal applications are hosted on servers here at the high school, and our connection to the internet is here at the high school as well.


Well the other sites can be FlexConnect and the HS can stay local.  That would be better.

-Scott

Ok. Gonna give that some thought.

