12-16-2020 10:27 PM - edited 07-05-2021 12:55 PM
Hi All,
Over the last couple of days, an issue has come up that is causing grief: all APs disassociate and reassociate every minute or so. The WLC is on code 8.5.161.0. It runs in an HA setup, and the current redundancy status is Active and Standby. Nothing has changed at our end configuration-wise, and the latency to the WLC from each of these APs, which are spread across multiple sites, is 11 seconds or thereabouts as per the WLC AP association latency stats.
The APs themselves are on PoE and staying up, and the uptime shown on the WLC validates that. I tried shutting down and re-enabling PoE on some of these APs, but that has not helped. I have also tried changing the AP retransmit configuration parameters, but it did not make a difference.
A continuous ping from the local AP switch to the WLC, which is in a data centre, does not show any latency issues.
AP hardware varies and we have 3602s, 3702s and 3802s mostly.
Error messages on WLC we have been seeing are as below:
Thu Dec 17 17:22:52 2020 AP's Interface:1(802.11a) Operation State Up: Base Radio MAC:84:b8:02:xx:xx:xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
1 Thu Dec 17 17:22:52 2020 AP's Interface:1(802.11a) Operation State Down: Base Radio MAC:84:b8:02:xx:xx:xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
2 Thu Dec 17 17:22:52 2020 AP's Interface:0(802.11b) Operation State Up: Base Radio MAC:84:b8:02:xx:xx:xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
3 Thu Dec 17 17:22:52 2020 AP's Interface:0(802.11b) Operation State Down: Base Radio MAC:84:b8:02:xx:xx:xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
4 Thu Dec 17 17:22:52 2020 AP's Interface:1(802.11a) Operation State Up: Base Radio MAC:84:b8:02:xx:xx:xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
5 Thu Dec 17 17:22:51 2020 AP's Interface:1(802.11a) Operation State Up: Base Radio MAC:00:a2:ee:xx:Xx:Xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
6 Thu Dec 17 17:22:51 2020 AP's Interface:1(802.11a) Operation State Down: Base Radio MAC:00:a2:ee:xx:Xx:Xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
7 Thu Dec 17 17:22:51 2020 AP's Interface:0(802.11b) Operation State Up: Base Radio MAC:00:a2:ee:xx:Xx:Xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
8 Thu Dec 17 17:22:51 2020 AP's Interface:0(802.11b) Operation State Down: Base Radio MAC:00:a2:ee:xx:xx:Xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
9 Thu Dec 17 17:22:51 2020 AP's Interface:1(802.11a) Operation State Up: Base Radio MAC:00:a2:ee:xx:xx:xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
10 Thu Dec 17 17:22:51 2020 AP's Interface:1(802.11a) Operation State Down: Base Radio MAC:00:a2:ee:xx:xx:xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
11 Thu Dec 17 17:22:50 2020 AP's Interface:1(802.11a) Operation State Down: Base Radio MAC:84:b8:02:xx:xx:xx Cause=RADIO_RC_IDB_RESET: Radio reset due to (10) Radio interface reset Status:NA
12 Thu Dec 17 17:22:50 2020 AP's Interface:1(802.11a) Operation State Up: Base Radio MAC:f4:db:e6:xx:xx:xx Cause=RADIO_RC_CODE_UNDEF: Radio reset due to (0) Unknown Reset Status:NA
13 Thu Dec 17 17:22:50 2020 AP's Interface:0(802.11abgn) Operation State Up: Base Radio MAC:f4:db:e6:xx:xx:xx Cause=RADIO_RC_CODE_UNDEF: Radio reset due to (0) Unknown Reset Status:NA
14 Thu Dec 17 17:22:49 2020 AP Disassociated. Base Radio MAC:58:ac:78:xx:xx:xx ApName - XX_50_Lon_L32_AP1
15 Thu Dec 17 17:22:49 2020 AP's Interface:1(802.11a) Operation State Down: Base Radio MAC:58:ac:78:xx:xx:xx Cause=AP_IF_TRAP_ECHO_TIMEOUT: Radio reset due to (121) Heartbeat Timeout Status:NA
16 Thu Dec 17 17:22:49 2020 AP's Interface:0(802.11b) Operation State Down: Base Radio MAC:58:ac:78:xx:xx:xx Cause=AP_IF_TRAP_ECHO_TIMEOUT: Radio reset due to (121) Heartbeat Timeout Status:NA
17 Thu Dec 17 17:22:49 2020 AP's Interface:2(802.11a) Operation State Up: Base Radio MAC:e8:ed:f3:xx:xx:xx Cause=RADIO_RC_CODE_UNDEF: Radio reset due to (0) Unknown Reset Status:NA
18 Thu Dec 17 17:22:49 2020 AP's Interface:1(802.11a) Operation State Up: Base Radio MAC:e8:ed:f3:xx:xx:xx Cause=RADIO_RC_CODE_UNDEF: Radio reset due to (0) Unknown Reset Status:NA
19 Thu Dec 17 17:22:49 2020 AP's Interface:0(802.11b) Operation State Up: Base Radio MAC:e8:ed:f3:xx:xx:xx Cause=RADIO_RC_CODE_UNDEF: Radio reset due to (0) Unknown Reset Status:NA
20 Thu Dec 17 17:22:49 2020 AP 'XX__Geo_345_L16_AP2', MAC: f4:db:e6:xx:Xx:Xx disassociated previously due to Link Failure. Uptime: 182 days, 19 h 00 m 26 s . Reason: Capwap Discovery Request.
21 Thu Dec 17 17:22:48 2020 AP 'XX_50_Lon_L33_AP2', MAC: e8:ed:f3:xx:xx:xx disassociated previously due to Link Failure. Uptime: 60 days, 17 h 56 m 43 s . Reason: Capwap Echo request.
Any idea on what I can do to troubleshoot this next?
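As a first triage step, the trap log above can be summarised by reset cause and by disassociation reason, which makes it easier to see whether heartbeat or echo timeouts dominate. The following is a minimal sketch only: the function name `tally_traplog` and the regular expressions are assumptions based on the log lines pasted in this post, not a documented WLC log format.

```python
import re
from collections import Counter

# Patterns inferred from the trap log lines in this post (an assumption,
# not an official format description).
CAUSE_RE = re.compile(r"Cause=(\w+):")
REASON_RE = re.compile(r"disassociated previously due to ([^.]+)\..*Reason: ([^.]+)\.")

def tally_traplog(lines):
    """Return (radio reset causes, disassociation reasons) as Counters."""
    causes, reasons = Counter(), Counter()
    for line in lines:
        cause = CAUSE_RE.search(line)
        if cause:
            causes[cause.group(1)] += 1
        reason = REASON_RE.search(line)
        if reason:
            # group(1) is the failure type, group(2) the CAPWAP message involved
            reasons[(reason.group(1).strip(), reason.group(2).strip())] += 1
    return causes, reasons

# Three lines copied from the log above.
sample = [
    "Thu Dec 17 17:22:49 2020 AP's Interface:1(802.11a) Operation State Down: Base Radio MAC:58:ac:78:xx:xx:xx Cause=AP_IF_TRAP_ECHO_TIMEOUT: Radio reset due to (121) Heartbeat Timeout Status:NA",
    "Thu Dec 17 17:22:50 2020 AP's Interface:1(802.11a) Operation State Up: Base Radio MAC:f4:db:e6:xx:xx:xx Cause=RADIO_RC_CODE_UNDEF: Radio reset due to (0) Unknown Reset Status:NA",
    "Thu Dec 17 17:22:48 2020 AP 'XX_50_Lon_L33_AP2', MAC: e8:ed:f3:xx:xx:xx disassociated previously due to Link Failure. Uptime: 60 days, 17 h 56 m 43 s . Reason: Capwap Echo request.",
]

causes, reasons = tally_traplog(sample)
for name, count in causes.most_common():
    print(name, count)
for key, count in reasons.most_common():
    print(key, count)
```

Run against a full export of the trap log, a high count of echo or heartbeat timeouts alongside long AP uptimes would point at the path between AP and WLC rather than at the APs themselves.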
12-16-2020 10:51 PM
12-16-2020 10:58 PM - edited 12-16-2020 10:58 PM
@colossus1611 wrote:
The redundancy mode on this is a HA setup
Look at the APs' logs and see if they disassociate due to DTLS.
What happens if you force a fail-over? Does this "stabilize" the issue?
12-17-2020 07:31 AM
If the APs are at branch sites and connect to a central WLC over the WAN, check the WAN connection.
See if there are any logs of a WAN failure at the same time the APs disconnect.
12-17-2020 04:44 PM
I have rebooted the primary WLC, and the redundancy did failover to secondary as expected, but this has not fixed the issue unfortunately. Still only seeing limited WAPs connected and seeing disassociations and radio resets.
12-17-2020 05:20 PM
Can we see the debug output?
12-17-2020 06:21 PM
12-17-2020 11:09 PM
12-17-2020 11:51 PM
- Make sure the APs are not subject to the Internal Authorization List feature on the WLC, and if it is used or needed, make sure the MAC addresses of all the APs are allowed.
M.
12-18-2020 03:18 AM
12-18-2020 01:51 AM
12-18-2020 03:40 AM
Hi Scott, we did send out a tech to troubleshoot, which basically meant doing SPAN captures on the Core Nexus switch that the WLC connects to. We saw DTLS packets going from the controllers to the APs, but no packets coming back. The capture was taken in both directions, and we could see ping packets towards the WLC but no DTLS packets.
What I meant with the NAT rule was how we manage these devices in the customer environment from our environment. Basically, all network device management IPs have been NATted for us to manage them, but this does not include the APs. Hence I cannot connect to the APs by SSH at all to run any debugs.
The change we had over the past weekend was the Core Nexus switch code upgrade; however, the pre- and post-upgrade configuration comparison shows no difference at all. The WLC is of course directly connected to and reachable from the Nexus switches, so I can't see what else, from a Nexus point of view, could be causing the APs to disassociate and reassociate, if it is a Nexus issue at all.
I did fail back to primary WLC after testing with a failover to secondary. The primary WLC is active and secondary is standby apart from those 15 minutes when I tested it.
The high-level topology for this network is as per the attached. I have shown one DC site only; however, a replica exists at the redundant DC with all the same devices, and the Nexus switches are in vPC across both DCs.
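The SPAN observation in this post (traffic from the controllers to the APs with nothing coming back) can be checked mechanically once a capture has been reduced to a list of UDP flows. This is a rough sketch under assumptions: the function name `one_way_flows`, the tuple layout, and the addresses and AP-side ports are all hypothetical placeholders, not values from this network.

```python
def one_way_flows(packets):
    """Return flows (src, sport, dst, dport) for which no reverse packet was seen."""
    seen = set(packets)
    # A flow is one-way if the mirrored tuple never appears in the capture.
    return sorted(
        flow for flow in seen
        if (flow[2], flow[3], flow[0], flow[1]) not in seen
    )

# Hypothetical capture summary: 192.0.2.10 stands in for the WLC and
# 198.51.100.20 for an AP (documentation address ranges, not real hosts).
packets = [
    ("192.0.2.10", 5246, "198.51.100.20", 40000),   # WLC -> AP
    ("198.51.100.20", 40000, "192.0.2.10", 5246),   # AP -> WLC reply
    ("192.0.2.10", 5247, "198.51.100.20", 40001),   # WLC -> AP, no reply seen
]

for flow in one_way_flows(packets):
    print("no return traffic for", flow)
```

Grouping the one-way flows by port is the useful part: if every one-way flow shares a single UDP port, a filter somewhere on the path is a more likely explanation than a routing or vPC problem.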
12-18-2020 08:19 AM
12-18-2020 10:57 AM
I see there are two Nexus switches. Are they configured with vPC?
When you checked for the return packets on one Nexus, did you also check for them on the other Nexus?
I think there is an asymmetric path:
from AP to WLC, the traffic goes through one Nexus;
from WLC to AP, it goes through the other Nexus.
To check that, compare the gateway configured on the WLC with the Nexus-to-ASA next hop. If both pass through the same Nexus, then we can look at other causes.
Could I please see the config of both Nexus switches?
12-28-2020 04:43 PM
Hi All,
Update to this: all resolved now, and it turned out to be a pretty silly thing. The firewall team had unintentionally blocked UDP port 5247 (CAPWAP data) but left 5246 (CAPWAP control) open, so the APs were connecting but not staying connected beyond a minute or so, because they could not establish the data connection! I never suspected that what I was seeing would have anything to do with the firewall, but I reached out to them as a last resort, and voila!
Thank you for all the inputs.
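For anyone hitting the same symptom, a quick sanity check is to confirm that the firewall permits both CAPWAP UDP ports, which are registered as 5246 (control) and 5247 (data) in RFC 5415. The sketch below is illustrative only: `missing_capwap_ports` and the set of permitted ports are hypothetical, and the permitted set would have to be extracted from your own firewall policy.

```python
# CAPWAP UDP ports as registered in RFC 5415.
CAPWAP_PORTS = {"control": 5246, "data": 5247}

def missing_capwap_ports(allowed_udp_ports):
    """Return the CAPWAP channels whose UDP port is not in the permitted set."""
    return {
        name: port
        for name, port in CAPWAP_PORTS.items()
        if port not in allowed_udp_ports
    }

# Hypothetical policy where only the control port is permitted.
print(missing_capwap_ports({5246}))
```

An AP that can reach the control port but not the data port will join and then drop again, which is consistent with the behaviour described in this thread.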