ARP not resolved while roaming
06-12-2025 01:35 AM
We have a Cisco 3500 WLC with 2800 APs (8.10.196.0). The APs operate in Local mode, so SSID traffic is centrally switched. We use 802.1X-FT and have a roaming problem. After each roaming event completes, the client ARPs for the gateway. Most of the time, the first ARP packet is not seen in the AP debugs or in the AP SPAN packet capture, so the client waits 300-500 ms and sends a second ARP. That one makes its way to the WLC via CAPWAP, but the WLC drops it and doesn't forward it to the core (GW). The client then waits another 300-500 ms and sends a third ARP packet, which reaches the core and gets a reply. The client will not pass traffic until the GW ARP is resolved, which causes video calls to freeze for about 0.5-1 second (this can be seen in the packet capture on the laptop, where the client only receives video traffic and does not send any until ARP is resolved). If we assume the first ARP packet is lost in the air, I find it strange that only the first ARP is lost. What could be the issue with the first ARP packet? Could it be a bug (and if so, what is the bug ID)? We tested on a PSK SSID and the ARP behavior was the same.
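To put numbers on the sequence described above, the laptop capture can be reduced to a list of timestamped events and summarized per roam. The following is only a minimal sketch: the `Event` type, the event names (`roam_done`, `arp_request`, `arp_reply`) and the function `arp_stats_per_roam` are hypothetical, and it assumes the events have already been extracted from the capture by some other means.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    t: float   # seconds since capture start (assumed already extracted)
    kind: str  # hypothetical labels: "roam_done", "arp_request", "arp_reply"


def arp_stats_per_roam(events: List[Event]) -> List[Dict]:
    """For each roam, count gateway ARP requests sent before the first
    reply and measure the delay from roam completion to that reply."""
    results: List[Dict] = []
    current = None
    for ev in sorted(events, key=lambda e: e.t):
        if ev.kind == "roam_done":
            # Start a new measurement window at every completed roam.
            current = {"roam_t": ev.t, "requests": 0, "delay": None}
            results.append(current)
        elif current is not None and current["delay"] is None:
            if ev.kind == "arp_request":
                current["requests"] += 1
            elif ev.kind == "arp_reply":
                current["delay"] = ev.t - current["roam_t"]
    return results


# Illustrative values only, shaped like the three-attempt pattern described.
sample = [
    Event(10.00, "roam_done"),
    Event(10.01, "arp_request"),
    Event(10.41, "arp_request"),
    Event(10.81, "arp_request"),
    Event(10.82, "arp_reply"),
]
stats = arp_stats_per_roam(sample)
```

A roam with `requests` greater than 1 and a `delay` in the 0.5-1 second range would match the freeze seen on video calls; a roam where `delay` stays `None` never resolved within the capture.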
To work around the WLC dropping the second ARP, I converted the APs to FlexConnect and set the SSID to local switching so that ARP goes client -> AP -> core (bypassing the WLC). This improved the video call experience, but some freezes still happen because the first ARP packet is still lost. I also tested adding a static ARP entry for the GW on the client (to avoid a GW ARP after each roaming event), but that had no effect and the client still ARPs for the GW after each roaming event. On a separate floor where we have only 9120 APs (no 2800s), the same ARP behavior was seen.
Another side of the story is that, after the client completes a successful FT roam, it sends an EAPOL START message to begin a full reauthentication. The full reauthentication then happens, followed by DHCP and ARP (with the ARP issue as described above). From the laptop packet capture, we can confirm that the full reauthentication itself doesn't interrupt 2-way video traffic... it's only the ARP at the end that interrupts traffic until it's resolved. Why does this EAPOL START behavior happen? What we have noticed is that EAPOL START happens on laptops that use a client certificate. Clients with a machine certificate do not send EAPOL START nor do a full reauthentication (just the FT fast roam). All laptops are managed by Intune.
At this stage, I'm quite stuck troubleshooting this issue. The next step would probably be to create an open SSID and take an OTA capture together with packet captures on the laptop and the AP port to see what happens to the first ARP packet... but as mentioned, it doesn't make sense that an RF issue would, most of the time, lose only the first ARP packet over the air and none of the rest of the traffic.
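When comparing the laptop capture against the AP port capture, one simple way to spot which ARP requests never made it upstream is to match timestamps between the two. This is a rough sketch under stated assumptions: the function `missing_upstream` and the `tolerance` parameter are hypothetical, the ARP request timestamps are assumed to have been extracted from each capture already, and the two capture clocks are assumed to be synchronized to within the tolerance.

```python
from typing import List


def missing_upstream(client_times: List[float],
                     ap_times: List[float],
                     tolerance: float = 0.05) -> List[float]:
    """Return client-side ARP request timestamps that have no matching
    frame on the AP port within `tolerance` seconds."""
    missing: List[float] = []
    remaining = sorted(ap_times)
    for t in sorted(client_times):
        # Pair each client frame with the earliest unused AP-side frame
        # that falls inside the tolerance window.
        match = next((a for a in remaining if abs(a - t) <= tolerance), None)
        if match is None:
            missing.append(t)
        else:
            remaining.remove(match)
    return missing


# Illustrative values only: three requests sent, two seen at the AP port.
lost = missing_upstream([10.01, 10.41, 10.81], [10.412, 10.813])
```

If the first request after each roam consistently shows up in the result while later ones do not, that points to something specific to the first frame after roaming rather than to general RF loss.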
Labels: Wireless LAN Controller
06-12-2025 02:42 AM
Does rebooting the APs help at all?
06-12-2025 02:44 AM
No, unfortunately. Neither does rebooting the WLC. Both were tested: the WLC rebooted during the upgrade to 8.10.196.0 (as part of troubleshooting), and the APs rebooted when converting from Local to FlexConnect.
06-15-2025 05:28 AM - edited 06-15-2025 05:28 AM
Good luck with that <smile>
There were a lot of similar problems on those APs (and, like the 9120, they have a Broadcom chipset).
The problems were mostly fixed in https://bst.cloudapps.cisco.com/bugsearch/bug/CSCwa73245 but it wouldn't surprise me if they missed some corner cases. Also see Leo's list of bugs affecting 2800/3800/4800/1560 APs.
As AireOS is now end of life, there is zero chance of getting that fixed. You can look through all those bugs and try some of the suggested workarounds to see if they make any difference. If not, your only option is to try upgrading to a 9800 WLC. The AP code is largely similar (but with additional fixes) and the WLC code is mostly new (and yes, new bugs too), but if you still find the same problem in the latest code you can at least open a TAC case and hopefully get it fixed.
Also, and this might solve your problem with the 2nd lost ARP on the 9800 regardless of other bugs, at least for central switching: ARP proxy:
https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/technical-reference/c9800-best-practices.html#AddressResolutionProtocolARPproxy
06-15-2025 06:45 AM
If the roam is successful, the client should not need to request an IP address again.
I will check my points and update you.
MHM
