06-12-2025 01:35 AM
We have a Cisco 3500 WLC with 2800 APs (8.10.196.0). The APs operate in Local mode, so SSID traffic is centrally switched. We have a roaming problem on an SSID that uses 802.1X with FT. After each roaming event completes, the client ARPs for the gateway. Most of the time the first ARP packet is not seen in the AP debugs or in the AP SPAN packet capture, so the client waits 300-500 ms and sends a second ARP. That one makes its way to the WLC via CAPWAP, but the WLC drops it and does not forward it to the core (GW). The client then waits another 300-500 ms and sends a third ARP, which makes it all the way to the core and gets a reply. The client will not send traffic until the GW ARP is resolved, which in this situation causes a video call freeze of about 0.5-1 second (this can be seen in the packet capture on the laptop, where the client only receives video traffic and does not send any until ARP is resolved). If we assume the first ARP packet is lost in the air, I find it strange that only the first ARP is lost. What could be the issue with the first ARP packet? Could it be a bug (and if so, what is the bug ID)? We also tested on a PSK SSID and the ARP behavior was the same.
To work around the WLC dropping the second ARP, I converted the APs to FlexConnect and set the SSID to local switching so that ARP goes client -> AP -> core (bypassing the WLC). This improved the video call experience, but some freezes still happen because the first ARP packet is still being lost. I also tested adding a static ARP entry for the GW on the client (to avoid sending a GW ARP after each roaming event), but that had no effect and the client still ARPs for the GW after each roam. On a separate floor where we have only 9120 APs (no 2800s), the same ARP behavior was seen.
Another side of the story is that after a successful FT roam, the client sends an EAPOL-Start message to begin full reauthentication. Full reauthentication then happens, followed by DHCP and ARP (with the ARP issue described above). From the laptop packet capture we can confirm that the full reauthentication does not interrupt two-way video traffic while it is in progress; only the ARP at the end interrupts traffic until it is resolved. Why does this EAPOL-Start behavior happen? We noticed that it happens on laptops that use a client certificate. Clients with a machine certificate do not send EAPOL-Start and do not do a full reauthentication (just fast FT roaming). All laptops are managed by Intune.
At this stage I'm quite stuck troubleshooting this issue. The next step would probably be to create an open SSID and do an over-the-air (OTA) capture together with packet captures on the laptop and the AP port, to see what happens to the first ARP packet. But as mentioned, it doesn't make sense that an RF issue would cause the loss most of the time and yet affect only the first ARP packet and none of the other traffic.
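For the laptop vs. AP-port comparison, here is a rough sketch of how the ARP request timestamps from the two captures could be lined up. It assumes the timestamps have already been exported from each capture as plain lists of seconds, and that both capture clocks are reasonably in sync. The function names, the 50 ms tolerance, and the sample numbers are made up for illustration and are not taken from any real capture.

```python
from bisect import bisect_left


def unmatched_requests(client_times, ap_times, tolerance=0.05):
    """Return client-side ARP request times that have no AP-side frame
    within +/- tolerance seconds (assumes synchronized capture clocks)."""
    ap_sorted = sorted(ap_times)
    missing = []
    for t in sorted(client_times):
        # Index of the first AP-side timestamp that is >= t - tolerance.
        i = bisect_left(ap_sorted, t - tolerance)
        if i == len(ap_sorted) or ap_sorted[i] > t + tolerance:
            missing.append(t)
    return missing


def arp_resolution_delay(request_times, reply_time):
    """Seconds between the first ARP request and the ARP reply."""
    return reply_time - min(request_times)


if __name__ == "__main__":
    # Hypothetical timestamps (seconds) for one roam: three ARP requests
    # seen on the laptop, only the last two seen on the AP port.
    laptop = [10.00, 10.40, 10.80]
    ap_port = [10.41, 10.81]
    print(unmatched_requests(laptop, ap_port))
    print(arp_resolution_delay(laptop, reply_time=10.82))
```

Running this per roaming event would show whether it is consistently the first request that never reaches the AP port, and how long the client was blocked before the gateway ARP resolved.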
06-12-2025 02:42 AM
Does rebooting the APs help at all?
06-12-2025 02:44 AM
Unfortunately not, and neither does rebooting the WLC. This was tested when upgrading to 8.10.196.0 (as part of troubleshooting), and the APs also rebooted when they were converted from Local to FlexConnect.
06-15-2025 05:28 AM - edited 06-15-2025 05:28 AM
Good luck with that <smile>
There were a lot of similar problems on those APs (which, like the 9120, have a Broadcom chipset).
The problems were mostly fixed in https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwa73245 but it wouldn't surprise me if they missed some corner cases. Also see Leo's list of bugs affecting 2800/3800/4800/1560 APs.
As AireOS is now end of life, there is zero chance of getting that fixed. You can look through all those bugs and try some of the suggested workarounds to see if they make any difference. If not, your only option is to try upgrading to a 9800 WLC. The AP code is largely similar (but with additional fixes) and the WLC code is mostly new (and yes, it has new bugs too), but if you still find the same problem in the latest code you can at least open a TAC case and hopefully get it fixed.
Also, and this might solve your problem with the 2nd lost ARP on 9800 regardless of other bugs, at least for central switching: ARP proxy:
https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/technical-reference/c9800-best-practices.html#AddressResolutionProtocolARPproxy
06-15-2025 06:45 AM
If roaming is working properly, the client should not need to request an IP address again.
I will check my points and update you
MHM