05-08-2024 06:39 AM - edited 05-08-2024 07:22 AM
Hi, we have had an issue with a rollout of new APs on site. The AP model is 9120, the WLC is a 9800 on 17.9.5, and the switch infrastructure is 9300L on 17.09.04a.
We had rolled out 109 APs across our site with no issues; however, the last 3 we activated had problems with connections to our corporate SSID, where the client was not being issued a DHCP address. Debug logs showed the client DHCP request timing out. We looked at all aspects of the setup: the configuration of the switch interface and AP is all templated through DNA Center, and there was no apparent interference or noise disrupting the connection. Clients connecting to our Guest SSIDs were fine.
We were about to log a call with Cisco so the logs could be checked, but we first moved the APs onto a different switch. Same interface configuration, but the APs started working correctly and clients were getting IP addresses. The switch they were moved from had 6 other APs connected to it that were all fully functional.
There are no logs on the Switch or in ISE to indicate a connectivity issue. PoE on the original switch was also well within limits.
Has anyone else seen this issue before, or have any ideas on why moving the APs to a different switch would resolve the issue?
Thanks.
05-08-2024 07:18 AM
Are the corporate and guest networks centrally switched (tunneled through WLC) or locally switched (Flex configuration terminating traffic at the AP switch port)?
Is the good switch the same model/code/configuration as the bad switch? Please share the AP port configuration from each.
05-08-2024 08:39 AM
>... showed the client DHCP request timing out.
- Are the client DHCP requests arriving at the DHCP server (check the server logs)?
M.
05-08-2024 04:00 PM
Sounds like the upstream VLANs are not trunked, if it's FlexConnect.
05-09-2024 09:03 AM
Agreed with Haydn - if you're doing FlexConnect local switching then the most likely thing affecting a specific switch is going to be VLAN configuration on the switch:
1. Is the VLAN configured (either statically or dynamically) on the switch?
2. Is the VLAN allowed on the trunk to the upstream switch(es)?
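The two checks above can be sketched in a few lines of Python. This is only an illustration: it assumes you have copied the allowed-VLAN list for the uplink trunk (for example from `show interfaces trunk`) into a string by hand, and the VLAN IDs and the list shown are made-up values, not taken from this thread. It also does not handle keywords such as `all` or `none`.

```python
from __future__ import annotations


def parse_vlan_list(spec: str) -> set[int]:
    """Expand a VLAN list such as "1,10,20-30" into a set of VLAN IDs."""
    vlans: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like "20-30" is inclusive at both ends.
            start, end = part.split("-", 1)
            vlans.update(range(int(start), int(end) + 1))
        else:
            vlans.add(int(part))
    return vlans


def missing_vlans(required: set[int], allowed_spec: str) -> set[int]:
    """Return the required VLANs that are absent from the allowed list."""
    return required - parse_vlan_list(allowed_spec)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    corporate_vlan = 110
    guest_vlan = 120
    uplink_allowed = "1,100-105,120"
    print(sorted(missing_vlans({corporate_vlan, guest_vlan}, uplink_allowed)))
```

Any VLAN it prints is one that clients on a locally switched SSID could not reach through that uplink, which would match a DHCP timeout on one SSID while another keeps working.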
There are also some issues with stale CAPWAP connections on the WLC load balancer affecting APs which can be resolved by power cycling the AP. So moving the APs might simply have resolved the issue by resetting the APs. We have seen a similar issue (for centrally switched WLANs) with DHCP getting dropped between AP and WLC (apparently by the WLC). Resetting the AP will normally cure it when it establishes a new CAPWAP connection on a new port. We don't seem to have seen it since upgrading to 17.9.4 APSP6.
05-22-2024 07:41 AM - edited 05-22-2024 07:42 AM
Thanks for the replies to my issue, your input is really appreciated. Just to confirm, the APs were not working when connected to a switch which already had 6 APs connected and working correctly. The interface configuration was exactly the same, as was the model of the AP. All APs were connecting back to the same backend WLC.
The issue started with one AP which had been installed along with the original batch, and we saw problems when testing. We then added 2 additional APs to provide coverage to some meeting rooms so that we could troubleshoot the AP. However, these 2 APs exhibited the same issue. The APs were power cycled multiple times and eventually we tried them on a different switch, which resolved the issue.
I haven't yet tried to replicate the issue with a test AP to see if I can create the connectivity problem again on the same switch.
05-24-2024 02:54 AM - edited 05-24-2024 02:57 AM
If you can reproduce the issue again with a test AP then you'll need to troubleshoot empirically with packet captures at different points - on the AP, on the switch port and on the WLC to isolate where the problem is.
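As a rough sketch of that isolation logic: once you know at which capture points the client's DHCP Discover was seen (directly, or inside the CAPWAP tunnel for a centrally switched WLAN), the segment to focus on is the one between the last point that saw it and the first point that did not. The point names and the `seen` results below are hypothetical, and the helper is just an illustration, not a Cisco tool.

```python
from __future__ import annotations

from typing import Optional

# Capture points in the order the client's packet travels (names are illustrative).
PATH = ["AP", "switch port", "WLC"]


def first_loss_segment(seen: dict[str, bool], path: list[str]) -> Optional[tuple[str, str]]:
    """Return (last point that saw the packet, first point that did not).

    Returns None if the packet was seen at every capture point.
    A capture point missing from `seen` is treated as "not seen".
    """
    previous = "client"
    for point in path:
        if not seen.get(point, False):
            return (previous, point)
        previous = point
    return None


if __name__ == "__main__":
    # Hypothetical result: Discover seen at the AP and switch port, not at the WLC.
    observations = {"AP": True, "switch port": True, "WLC": False}
    print(first_loss_segment(observations, PATH))
```

If it returns `None`, the request is reaching the far end and the next thing to look at is the reply path back to the client.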
Of course, it's also possible that you have faulty switch ports or a faulty port ASIC on the switch.
We've also previously encountered https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvt00292 which causes random packet flows to be dropped from some ports (and you can see 100 customer cases attached to this bug at the moment). Apparently https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvx31753 (which is hidden) was supposed to introduce an automatic correction mechanism (otherwise the switch must be manually reloaded after detecting the problem). TAC told us at the time that it was supposed to go into 17.7.1, but as we can't see the bug details it's hard to tell if it did.
So if you see the problem, reload the switch, and the problem goes away, then you've probably hit CSCvt00292. You should be able to see the ASIC drops before the reload (as per the bug notes) if that is the case.