Re: Clients not connecting due to DHCP timeout

NISNetworks · 05-08-2024

Hi, we have had an issue with a rollout of new APs on site. The AP model is 9120 and the WLC is a 9800 on 17.9.5; the switch infrastructure is 9300L on 17.09.04a.

We had rolled out 109 APs across our site with no issues; however, the last 3 we activated had problems with connections to our corporate SSID, where the client was not being issued a DHCP address. Debug logs showed the client DHCP request timing out. We looked at all aspects of the setup: the switch interface and AP configuration are all templated through DNA Center, and there was no apparent interference or noise disrupting the connection. Clients connecting to our Guest SSIDs were fine.

We were about to log a call with Cisco so the logs could be checked, but we first moved the APs onto a different switch. With the same interface configuration, the APs started working correctly and clients were getting IP addresses. The switch they were moved from had 6 other APs connected to it that were all fully functional.

There are no logs on the switch or in ISE to indicate a connectivity issue. PoE on the original switch was also well within limits.

Has anyone else seen this issue before, or have any ideas on why moving the APs to a different switch would resolve the issue?

Thanks.

eglinsky2012 · 05-08-2024

Are the corporate and guest networks centrally switched (tunneled through WLC) or locally switched (Flex configuration terminating traffic at the AP switch port)?

Is the good switch the same model/code/configuration as the bad switch? Please share the AP port configuration from each.

marce1000 · 05-08-2024

>... showed the client DHCP request timing out.
- Are the client DHCP request(s) arriving at the DHCP server (check logs)?

M.


Haydn Andrews · 05-08-2024

Sounds like the upstream VLANs are not trunked, if it's FlexConnect.


Rich R · 05-09-2024

Agreed with Haydn. If you're doing FlexConnect local switching, then the most likely thing affecting a specific switch is the VLAN configuration on that switch:
1. Is the VLAN configured (either statically or dynamically) on the switch?
2. Is the VLAN allowed on the trunk to the upstream switch(es)?
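
For anyone who wants to script those two checks, here is a rough Python sketch (not anything Cisco ships) that reads pasted `show interfaces trunk` output and reports whether a VLAN appears in each "Vlans ..." section for a given uplink. The sample output, the port name `Te1/1/1` and VLAN 25 are made up for illustration, and the parser assumes each port's VLAN list fits on one line, so very long wrapped lists would need extra handling.

```python
def expand_vlans(spec):
    """Expand an IOS-style VLAN list such as '1,10,20-29' into a set of ints."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part or part.lower() == "none":
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            vlans.update(range(int(lo), int(hi) + 1))
        else:
            vlans.add(int(part))
    return vlans


def trunk_sections(show_trunk_output):
    """Map each 'Vlans ...' section heading to {port: set of VLAN IDs}."""
    sections = {}
    current = None
    for line in show_trunk_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) < 2:
            continue  # blank or single-token line
        if fields[0] == "Port":
            heading = fields[1].strip()
            # Only the sections that list VLANs are of interest here.
            current = sections.setdefault(heading, {}) if heading.startswith("Vlans") else None
        elif current is not None:
            current.setdefault(fields[0], set()).update(expand_vlans(fields[1]))
    return sections


def check_vlan(show_trunk_output, port, vlan):
    """Return {section heading: True/False} for one VLAN on one trunk port."""
    sections = trunk_sections(show_trunk_output)
    return {heading: vlan in ports.get(port, set()) for heading, ports in sections.items()}


# Hypothetical output for illustration only; paste your own switch output here.
SAMPLE = """\
Port        Mode             Encapsulation  Status        Native vlan
Te1/1/1     on               802.1q         trunking      1

Port        Vlans allowed on trunk
Te1/1/1     1,10,20-29

Port        Vlans allowed and active in management domain
Te1/1/1     1,10,20

Port        Vlans in spanning tree forwarding state and not pruned
Te1/1/1     1,10,20
"""

if __name__ == "__main__":
    for heading, present in check_vlan(SAMPLE, "Te1/1/1", 25).items():
        print(f"{heading}: {'yes' if present else 'NO'}")
```

In the made-up sample, VLAN 25 is allowed on the trunk but not active, which would point at question 1 (the VLAN not existing on the switch) rather than question 2.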

There are also some issues with stale CAPWAP connections on the WLC load balancer affecting APs, which can be resolved by power cycling the AP, so moving the APs might simply have resolved the issue by resetting them. We have seen a similar issue (for centrally switched WLANs) with DHCP getting dropped between AP and WLC (apparently by the WLC). Resetting the AP will normally cure it once it establishes a new CAPWAP connection on a new port. We don't seem to have seen it since upgrading to 17.9.4 APSP6.

------------------------------
Please click Helpful if this post helped you and Select as Solution (drop down menu at top right of this reply) if this answered your query.
------------------------------
TAC recommended codes for AireOS WLC's and TAC recommended codes for 9800 WLC's
Best Practices for AireOS WLC's, Best Practices for 9800 WLC's and Cisco Wireless compatibility matrix
Check your 9800 WLC config with Wireless Config Analyzer using "show tech wireless" output or "config paging disable" then "show run-config" output on AireOS and use Wireless Debug Analyzer to analyze your WLC client debugs
Field Notice: FN63942 APs and WLCs Fail to Create CAPWAP Connections Due to Certificate Expiration
Field Notice: FN72424 Later Versions of WiFi 6 APs Fail to Join WLC - Software Upgrade Required
Field Notice: FN72524 IOS APs stuck in downloading state after 4 Dec 2022 due to Certificate Expired
- Fixed in 8.10.196.0, latest 9800 releases, 8.5.182.12 (8.5.182.13 for 3504) and 8.5.182.109 (IRCM, 8.5.182.111 for 3504)
Field Notice: FN70479 AP Fails to Join or Joins with 1 Radio due to Country Mismatch, RMA needed
How to avoid boot loop due to corrupted image on Wave 2 and Catalyst 11ax Access Points (CSCvx32806)
Field Notice: FN74035 - Wave2 APs DFS May Not Detect Radar After Channel Availability Check Time
Leo's list of bugs affecting 2800/3800/4800/1560 APs

NISNetworks · 05-22-2024

Thanks for the replies to my issue; your input is really appreciated. Just to confirm, the APs were not working when connected to a switch that already had 6 APs connected and working correctly. The interface configuration was exactly the same, as was the AP model. All APs were connecting back to the same backend WLC.

The issue started with one AP which had been installed along with the original batch, and we saw problems when testing. We then added 2 additional APs to provide coverage to some meeting rooms so that we could troubleshoot the first AP. However, these 2 APs exhibited the same issue. The APs were power cycled multiple times, and eventually we tried them on a different switch, which resolved the issue.

I haven't yet tried to replicate the issue with a test AP to see if I can create the connectivity problem again on the same switch.

Rich R · 05-24-2024

If you can reproduce the issue again with a test AP, then you'll need to troubleshoot empirically with packet captures at different points (on the AP, on the switch port and on the WLC) to isolate where the problem is.
Of course it's also possible that you have faulty switch ports or port ASIC on the switch.
We've also previously encountered https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvt00292 which causes random packet flows to be dropped from some ports (and you can see 100 customer cases attached to this bug at the moment). Apparently https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvx31753 (which is hidden) was supposed to introduce an automatic correction mechanism (otherwise the switch must be manually reloaded after detecting the problem). TAC told us at the time that it was supposed to go into 17.7.1, but as we can't see the bug details it's hard to tell if it did.

So if you see the problem, reload the switch, and the problem goes away, then you've probably hit CSCvt00292. You should be able to see the ASIC drops before the reload (as per the bug notes) if that is the case.
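
To make the multi-point capture comparison a bit more concrete, here is a minimal Python sketch of the bookkeeping. It assumes you have already pulled the DHCP transaction IDs (xid) out of each capture with whatever tool you prefer; the capture point names, their ordering and the sample xids are all hypothetical. The ordering shown suits a centrally switched WLAN; for FlexConnect local switching the WLC would not be in the data path, so you would substitute something like the upstream switch.

```python
# Capture points in the order a client's DHCP request is assumed to travel
# (centrally switched WLAN). Adjust for your own topology.
CAPTURE_POINTS = ["ap", "switch_port", "wlc"]


def first_missing_point(xid, seen_by_point, order=CAPTURE_POINTS):
    """Return the first capture point that did not see this xid, or None if all did."""
    for point in order:
        if xid not in seen_by_point.get(point, set()):
            return point
    return None


def localize_drops(client_xids, seen_by_point, order=CAPTURE_POINTS):
    """For each xid the client sent, report where it first went missing."""
    return {xid: first_missing_point(xid, seen_by_point, order) for xid in client_xids}


if __name__ == "__main__":
    # Made-up xids for illustration only.
    client_xids = ["0x1a2b3c4d", "0x1a2b3c4e"]
    seen_by_point = {
        "ap": {"0x1a2b3c4d", "0x1a2b3c4e"},
        "switch_port": {"0x1a2b3c4d"},
        "wlc": {"0x1a2b3c4d"},
    }
    for xid, point in localize_drops(client_xids, seen_by_point).items():
        print(xid, "seen everywhere" if point is None else f"first missing at {point}")
```

A request that is seen on the AP but first missing at the switch port capture would point at the switch side (ports, ASIC or the bug above) rather than the WLC.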
