All APs disassociate from and reassociate to WLC 5508 every minute or so

colossus1611
Level 1

Hi All,

 

Over the last couple of days, an issue has come up that is causing grief: all APs disassociate and reassociate every minute or so. The WLC is on code 8.5.161.0. It runs in an HA redundancy setup, and the current status is Active and Standby. Nothing has changed at our end configuration-wise, and according to the WLC AP association latency stats, the latency to the WLC from each of these APs, which are spread across multiple sites, is 11 seconds or thereabouts.

 

The APs themselves are on PoE and stay up, which the uptime shown on the WLC confirms. I tried shutting down and re-enabling PoE on some of these APs, but that has not helped. I have also tried changing the AP retransmit configuration parameters, but it made no difference.

 

A continuous ping from the local AP switch to the WLC, which is in a data centre, does not show any latency issues.

 

AP hardware varies and we have 3602s, 3702s and 3802s mostly.

 

The error messages we have been seeing on the WLC are below:

 

Thu Dec 17 17:22:52 2020 AP's Interface:1(802.11a) Operation State Up: Base Radio MAC:84:b8:02:xx:xx:xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
1 Thu Dec 17 17:22:52 2020 AP's Interface:1(802.11a) Operation State Down: Base Radio MAC:84:b8:02:xx:xx:xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
2 Thu Dec 17 17:22:52 2020 AP's Interface:0(802.11b) Operation State Up: Base Radio MAC:84:b8:02:xx:xx:xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
3 Thu Dec 17 17:22:52 2020 AP's Interface:0(802.11b) Operation State Down: Base Radio MAC:84:b8:02:xx:xx:xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
4 Thu Dec 17 17:22:52 2020 AP's Interface:1(802.11a) Operation State Up: Base Radio MAC:84:b8:02:xx:xx:xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
5 Thu Dec 17 17:22:51 2020 AP's Interface:1(802.11a) Operation State Up: Base Radio MAC:00:a2:ee:xx:Xx:Xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
6 Thu Dec 17 17:22:51 2020 AP's Interface:1(802.11a) Operation State Down: Base Radio MAC:00:a2:ee:xx:Xx:Xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
7 Thu Dec 17 17:22:51 2020 AP's Interface:0(802.11b) Operation State Up: Base Radio MAC:00:a2:ee:xx:Xx:Xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
8 Thu Dec 17 17:22:51 2020 AP's Interface:0(802.11b) Operation State Down: Base Radio MAC:00:a2:ee:xx:xx:Xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
9 Thu Dec 17 17:22:51 2020 AP's Interface:1(802.11a) Operation State Up: Base Radio MAC:00:a2:ee:xx:xx:xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
10 Thu Dec 17 17:22:51 2020 AP's Interface:1(802.11a) Operation State Down: Base Radio MAC:00:a2:ee:xx:xx:xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
11 Thu Dec 17 17:22:50 2020 AP's Interface:1(802.11a) Operation State Down: Base Radio MAC:84:b8:02:xx:xx:xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
12 Thu Dec 17 17:22:50 2020 AP's Interface:1(802.11a) Operation State Up: Base Radio MAC:f4:db:e6:xx:xx:xx Cause=RADIO_RC_CODE_UNDEF: Radio reset due to (0) Unknown Reset Status:NA
13 Thu Dec 17 17:22:50 2020 AP's Interface:0(802.11abgn) Operation State Up: Base Radio MAC:f4:db:e6:xx:xx:xx Cause=RADIO_RC_CODE_UNDEF: Radio reset due to (0) Unknown Reset Status:NA
14 Thu Dec 17 17:22:49 2020 AP Disassociated. Base Radio MAC:58:ac:78:xx:xx:xx ApName - XX_50_Lon_L32_AP1
15 Thu Dec 17 17:22:49 2020 AP's Interface:1(802.11a) Operation State Down: Base Radio MAC:58:ac:78:xx:xx:xx Cause=AP_IF_TRAP_ECHO_TIMEOUT: Radio reset due to (121) Heartbeat Timeout Status:NA
16 Thu Dec 17 17:22:49 2020 AP's Interface:0(802.11b) Operation State Down: Base Radio MAC:58:ac:78:xx:xx:xx Cause=AP_IF_TRAP_ECHO_TIMEOUT: Radio reset due to (121) Heartbeat Timeout Status:NA
17 Thu Dec 17 17:22:49 2020 AP's Interface:2(802.11a) Operation State Up: Base Radio MAC:e8:ed:f3:xx:xx:xx Cause=RADIO_RC_CODE_UNDEF: Radio reset due to (0) Unknown Reset Status:NA
18 Thu Dec 17 17:22:49 2020 AP's Interface:1(802.11a) Operation State Up: Base Radio MAC:e8:ed:f3:xx:xx:xx Cause=RADIO_RC_CODE_UNDEF: Radio reset due to (0) Unknown Reset Status:NA
19 Thu Dec 17 17:22:49 2020 AP's Interface:0(802.11b) Operation State Up: Base Radio MAC:e8:ed:f3:xx:xx:xx Cause=RADIO_RC_CODE_UNDEF: Radio reset due to (0) Unknown Reset Status:NA
20 Thu Dec 17 17:22:49 2020 AP 'XX__Geo_345_L16_AP2', MAC: f4:db:e6:xx:Xx:Xx disassociated previously due to Link Failure. Uptime: 182 days, 19 h 00 m 26 s . Reason: Capwap Discovery Request.
21 Thu Dec 17 17:22:48 2020 AP 'XX_50_Lon_L33_AP2', MAC: e8:ed:f3:xx:xx:xx disassociated previously due to Link Failure. Uptime: 60 days, 17 h 56 m 43 s . Reason: Capwap Echo request.
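With dozens of APs flapping, it can help to tally what the log lines actually say rather than reading them one by one. A minimal sketch, assuming you have copied the trap log lines (like the ones above) into a text file or list; the regular expressions are written against the line formats shown in this post only and may need adjusting for other message types:

```python
import re
from collections import Counter

# Matches the reset cause on radio Up/Down lines, e.g. "Cause=RADIO_RC_IDB_RESET:".
CAUSE_RE = re.compile(r"Cause=(\w+)")
# Matches "... disassociated previously due to <why>. ... Reason: <trigger>."
REASON_RE = re.compile(r"disassociated previously due to (.+?)\..*Reason: (.+?)\.?\s*$")


def summarize_traplog(lines):
    """Count radio reset causes, AP disassociations and (why, trigger) rejoin reasons."""
    causes = Counter()
    rejoin_reasons = Counter()
    disassociations = 0
    for line in lines:
        if "AP Disassociated." in line:
            disassociations += 1
        m = CAUSE_RE.search(line)
        if m:
            causes[m.group(1)] += 1
        m = REASON_RE.search(line)
        if m:
            rejoin_reasons[(m.group(1), m.group(2))] += 1
    return causes, disassociations, rejoin_reasons


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = [
        "15 Thu Dec 17 17:22:49 2020 AP's Interface:1(802.11a) Operation State Down: "
        "Base Radio MAC:58:ac:78:xx:xx:xx Cause=AP_IF_TRAP_ECHO_TIMEOUT: "
        "Radio reset due to (121) Heartbeat Timeout Status:NA",
        "14 Thu Dec 17 17:22:49 2020 AP Disassociated. Base Radio MAC:58:ac:78:xx:xx:xx "
        "ApName - XX_50_Lon_L32_AP1",
    ]
    causes, disassoc, reasons = summarize_traplog(sample)
    print("Disassociations:", disassoc)
    for cause, n in causes.most_common():
        print(f"{n:5d}  {cause}")
    for (why, trigger), n in reasons.most_common():
        print(f"{n:5d}  {why} / {trigger}")
```

A large share of heartbeat (echo) timeouts and "Link Failure" rejoins points towards a path problem between the APs and the WLC rather than a radio or hardware fault on individual APs.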

 

 

Any ideas on what I can do next to troubleshoot this?


Scott Fella
Hall of Fame
Has anything changed anywhere on the network? I'm assuming this had been working for a while with no issues? What I would do is fail over to the other controller and see if that helps. I have run into issues where the primary had problems and the fix was to reboot it. When you force a failover, that will reboot the primary. Again, that is from my experience, and it seems like your environment is down anyway if the APs keep disappearing every minute or so, so bounce it.
-Scott
*** Please rate helpful posts ***

Leo Laohoo
Hall of Fame

@colossus1611 wrote:

The redundancy mode on this is a HA setup


Look at the APs' logs and see if they disassociate due to DTLS.  
What happens if you force a fail-over?  Does this "stabilize" the issue?

If the APs are at branches and connect to the central WLC through the WAN, check the WAN connection.

See if there is a log of a WAN failure at the same time the APs disconnect.
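A rough sketch of that correlation, assuming you already have the AP disconnect timestamps (for example, from the WLC trap log) and the WAN outage windows (from whatever monitors your WAN links) as Python `datetime` values. The function name and the 30-second slack used to absorb clock skew between devices are arbitrary, hypothetical choices:

```python
from datetime import datetime, timedelta


def disconnects_during_outages(disconnects, outages, slack=timedelta(seconds=30)):
    """Return the disconnect times that fall inside any WAN outage window.

    disconnects: list of datetime objects (AP disassociation times).
    outages: list of (start, end) datetime tuples (WAN down windows).
    slack: tolerance added on both sides of each window for clock skew.
    """
    hits = []
    for t in disconnects:
        if any(start - slack <= t <= end + slack for start, end in outages):
            hits.append(t)
    return hits


if __name__ == "__main__":
    # Hypothetical example data, not taken from this thread.
    outages = [(datetime(2020, 12, 17, 17, 20, 0), datetime(2020, 12, 17, 17, 21, 0))]
    disconnects = [datetime(2020, 12, 17, 17, 21, 20), datetime(2020, 12, 17, 17, 22, 49)]
    print(disconnects_during_outages(disconnects, outages))
```

If most disconnects line up with WAN drops, the WAN is the likely culprit; if almost none do, the problem is probably somewhere else on the path.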

colossus1611
Level 1

I have rebooted the primary WLC, and redundancy failed over to the secondary as expected, but unfortunately this has not fixed the issue. We are still seeing only a limited number of WAPs connected, along with disassociations and radio resets.

Can we see the debug?

Hope you have a TAC case open. It seems that if nothing changed on the controller, something changed elsewhere.
-Scott

colossus1611
Level 1

We have a TAC case open but are not having much luck. TAC is requesting debugs from the APs, which we don't have direct CLI access to due to silly NAT rules. Some debug output from the WLC is attached.

 

 

 

 - Make sure the APs are not subject to the Internal Authorization List feature on the WLC, and if it is used or needed, make sure the MAC addresses of all the APs are allowed.

 M.



-- ' 'Good body every evening' ' this sentence was once spotted on a logo at the entrance of a Weight Watchers Club !

Hi Marce,

 

No Internal Authorization List is configured on the WLC, so that is not the issue.

 

 

You can’t send a tech onsite to troubleshoot? Also, what do you mean that silly NAT rules prevent you from doing that? If all the APs are behind a NAT, then you also need to make sure that UDP ports 5246/5247 are allowed and can reach the controller. What you probably need to do is draw out an example of how your network is laid out to give others an idea. I still don’t see how your APs, which had been working well, all suddenly started to fail with no change in the network.
Also, when you failed over, you didn’t fail it back to the primary? You kept it up on the secondary?
-Scott
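The UDP 5246/5247 point can be made concrete. CAPWAP (RFC 5415) uses UDP 5246 for the control channel and UDP 5247 for the data channel, so both have to be permitted on every firewall between the AP subnets and the WLC. The sketch below checks a simplified, first-match-wins rule list for that; the `Rule` model and function names are hypothetical and do not correspond to any particular firewall's syntax:

```python
from dataclasses import dataclass

# CAPWAP well-known UDP ports (RFC 5415).
CAPWAP_PORTS = {5246: "control", 5247: "data"}


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified firewall rule; rules are evaluated in order."""
    proto: str    # "udp", "tcp" or "any"
    port_lo: int
    port_hi: int
    action: str   # "permit" or "deny"


def first_match(rules, proto, port):
    """Return the action of the first rule matching proto/port, else implicit deny."""
    for r in rules:
        if r.proto in (proto, "any") and r.port_lo <= port <= r.port_hi:
            return r.action
    return "deny"


def blocked_capwap_ports(rules):
    """Return {port: channel} for every CAPWAP port the rule list does not permit."""
    return {port: name for port, name in CAPWAP_PORTS.items()
            if first_match(rules, "udp", port) != "permit"}


if __name__ == "__main__":
    # Hypothetical rule set: control permitted, data falls through to implicit deny.
    rules = [Rule("tcp", 22, 22, "permit"), Rule("udp", 5246, 5246, "permit")]
    print(blocked_capwap_ports(rules))  # {5247: 'data'}
```

Real firewalls also match on source and destination addresses and direction; this sketch deliberately looks at ports only.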

Hi Scott, we did send a tech out to troubleshoot, which basically meant doing SPAN captures on the core Nexus switch that the WLC connects to. We saw DTLS packets going from the controllers to the APs, but no packets coming back. The capture was taken in both directions, and we could see ping packets towards the WLC but no DTLS packets.
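Traffic seen in one direction only can be spotted mechanically once a capture is exported as (source, source port, destination, destination port) records. A minimal sketch, assuming that export already exists; the `Pkt` record, the IP addresses and the AP-side ports below are hypothetical. It groups packets into bidirectional conversations and reports the ones observed in a single direction:

```python
from collections import defaultdict
from typing import NamedTuple


class Pkt(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int


def one_way_flows(packets):
    """Return sorted (sender, receiver) endpoint pairs seen in one direction only."""
    directions = defaultdict(set)  # conversation key -> directions observed
    for p in packets:
        a, b = (p.src, p.sport), (p.dst, p.dport)
        directions[frozenset((a, b))].add((a, b))
    return sorted(next(iter(d)) for d in directions.values() if len(d) == 1)


if __name__ == "__main__":
    wlc, ap = "10.0.0.10", "10.20.1.15"  # hypothetical addresses
    capture = [
        Pkt(ap, 5264, wlc, 5246), Pkt(wlc, 5246, ap, 5264),  # control: both ways
        Pkt(wlc, 5247, ap, 5265), Pkt(wlc, 5247, ap, 5265),  # data: one way only
    ]
    for sender, receiver in one_way_flows(capture):
        print(f"only seen {sender} -> {receiver}")
```

A conversation that shows up only one way on a SPAN port usually means the return traffic is being dropped upstream or is taking a different path that the capture does not cover.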

 

What I meant by the NAT rule is how we manage these devices in the customer environment from our environment. The management IPs of all network devices have been NATted so that we can manage them, but this does not include the APs. Hence I cannot connect to the APs by SSH at all to run any debugs.

 

The change we had over the past weekend was a code upgrade on the core Nexus switches; however, a comparison of the pre- and post-upgrade configuration shows no difference at all. The WLC is directly connected to and reachable from the Nexus switches, so I can't see what else on the Nexus side could cause the APs to disassociate and reassociate, if it is a Nexus issue at all.

 

I did fail back to the primary WLC after testing a failover to the secondary. The primary WLC is active and the secondary is standby, apart from the 15 minutes when I tested it.

 

At a high level, the topology for this network is as per the attachment. I have shown one DC site only; however, a replica exists at the redundant DC with all the same devices, and the Nexus switches are in vPC across both DCs.

 

 

 

 

Well... you had a change, and that change broke the wireless. That is how I see it. If everything was working prior to the core upgrade and the wireless stopped working after the upgrade, then you need to investigate the core. Maybe it’s time to revert to the original code or look at upgrading. Did you open a TAC case with the Nexus team and let them know it broke after the upgrade? Just because the config is identical doesn’t mean you are not hitting a bug.
-Scott

As I see it, there are two Nexus switches. Are they configured with vPC?

Did you check for the return packets on one Nexus and the outgoing packets on the other Nexus?

I think there is an asymmetric path:

from AP to WLC, I see traffic passing through one Nexus,

and from WLC to AP, through the other Nexus.

To check that, match the gateway on the WLC against the Nexus-to-ASA next hop. If both directions pass through the same Nexus, then we will look at other causes.

Could I please see the config of both Nexus switches?

colossus1611
Level 1

Hi All,

 

Update on this: all resolved now, and it turned out to be a pretty silly thing. The firewall team had unintentionally blocked UDP port 5247 (CAPWAP data) but left 5246 (CAPWAP control) open, so the APs would connect but not stay connected beyond a minute or so once they could not establish the data connection! I never suspected that what I was seeing had anything to do with the firewall, but I reached out to the firewall team as a last resort, and voila!
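Firewall policies treat ICMP and UDP separately, so a clean continuous ping says nothing about whether UDP 5246/5247 get through. One way to test a UDP path end to end is a simple echo probe, assuming you can place a test host on each side of the firewall and that the policy treats those hosts the same way as the APs and the WLC. This is a hypothetical sketch: the WLC will not echo arbitrary datagrams, so the responder has to run on your own test host, bound to the port under test (run it once for 5246 and once for 5247).

```python
import socket
import threading


def echo_once(sock):
    """Responder side: wait for one datagram and send it straight back."""
    data, addr = sock.recvfrom(2048)
    sock.sendto(data, addr)


def udp_probe(host, port, payload=b"capwap-path-test", timeout=2.0):
    """Sender side: return True if the echoed payload arrives before the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(payload, (host, port))
        try:
            data, _ = s.recvfrom(2048)
        except OSError:  # timeout, or an ICMP error reported by the OS
            return False
        return data == payload


if __name__ == "__main__":
    # Loopback demo only. In practice, run echo_once on a test host near the WLC
    # bound to the port under test, and udp_probe from a host on an AP subnet.
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(("127.0.0.1", 0))
    port = srv.getsockname()[1]
    threading.Thread(target=echo_once, args=(srv,), daemon=True).start()
    print("echo received:", udp_probe("127.0.0.1", port))
    srv.close()
```

Because the reply travels back through the same firewall, a `False` result does not say which direction is blocked; capturing on the responder side narrows that down.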

 

Thank you for all the inputs.
