06-12-2025 01:35 AM
We have a Cisco 3500 WLC with 2800 APs (8.10.196.0). The APs operate in Local mode, so SSID traffic is centrally switched. We have a roaming problem. We use 802.1x-FT. After each roaming event completes, the client ARPs for the gateway. The problem is that, most of the time, the first ARP packet is seen neither in the AP debugs nor in the AP SPAN packet capture, so the client waits 300-500 ms and sends a second ARP. That one makes its way to the WLC via CAPWAP, but the WLC drops it and doesn't forward it to the core (GW). The client then waits another 300-500 ms and sends a third ARP packet, which reaches the core and gets a reply. The client will not pass traffic until the GW ARP is resolved, which in this situation freezes the video call for about 0.5-1 second (this can be seen in the packet capture on the laptop, where the client only receives video traffic and does not send any until ARP is resolved). If we assume that the first ARP packet is lost in the air, I find it strange that only the first ARP is lost. What could be the issue with the first ARP packet? Could it be a bug (and if so, what is the bug ID)? We tested on a PSK SSID and the ARP behavior was the same.
To work around the WLC dropping the second ARP, I converted the APs to FlexConnect and made the SSID locally switched so that ARP is sent client -> AP -> core (bypassing the WLC). This improved the video call experience, but some freezes still happen because the first ARP packet is still being lost. I also tested adding a static ARP entry for the GW on the client (to avoid sending a GW ARP after each roaming event), but that didn't seem to have any effect and the client still ARPs for the GW after each roaming event. On a separate floor where we have only 9120 APs (no 2800s), the same ARP behavior was seen.
Another side of the story is that, after the client completes a successful FT roam, it sends an EAPOL-Start message to start a full reauthentication. The full reauthentication then happens, followed by DHCP and ARP (the ARP issue is as described above). From the laptop packet capture, we can confirm that the full reauthentication doesn't interrupt two-way video traffic forwarding while it runs; it's only the ARP at the end that interrupts traffic until it's resolved. Why does this EAPOL-Start behavior happen? What we noticed is that it happens on laptops that use a client certificate. Clients with a machine certificate don't send EAPOL-Start and don't do a full reauthentication (just FT fast roaming). All laptops are managed by Intune.
At this stage, I'm quite stuck troubleshooting this issue. The next action would probably be to create an open SSID and do an OTA capture together with packet captures on the laptop and the AP port to see what happens to the first ARP packet. Still, as mentioned, it doesn't make sense that most of the time it's lost over the air (an RF issue) when that affects only the first ARP packet and none of the rest of the traffic.
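For the laptop versus AP-port comparison described above, a minimal Python sketch of how the two captures could be lined up is shown here. It assumes the ARP request timestamps have already been exported from each capture as seconds, and that the two capture clocks are synchronized to within the chosen tolerance. The function names, the tolerance value, and the sample timestamps are all hypothetical, picked only to mirror the 300-500 ms retry pattern reported in this post.

```python
from bisect import bisect_left


def seen_at_ap(client_ts, ap_ts, tolerance=0.05):
    """For each ARP request time seen on the laptop, report whether the
    AP-port capture contains an ARP request within +/- tolerance seconds.

    Assumes both captures use clocks synchronized better than `tolerance`.
    """
    ap_sorted = sorted(ap_ts)
    results = []
    for t in client_ts:
        # First AP-side timestamp that is not earlier than (t - tolerance).
        i = bisect_left(ap_sorted, t - tolerance)
        found = i < len(ap_sorted) and ap_sorted[i] <= t + tolerance
        results.append((t, found))
    return results


def resolution_delay(client_ts, reply_ts):
    """Seconds between the first ARP request and the ARP reply."""
    return reply_ts - min(client_ts)


# Hypothetical timestamps: three requests on the laptop, two at the AP port.
laptop_requests = [10.000, 10.400, 10.800]
ap_port_requests = [10.402, 10.801]
arp_reply = 10.805

for ts, found in seen_at_ap(laptop_requests, ap_port_requests):
    print(f"request at {ts:.3f}s seen at AP port: {found}")
print(f"time to resolve: {resolution_delay(laptop_requests, arp_reply):.3f}s")
```

A request that appears in the laptop capture but has no match at the AP port was lost before or at the AP. Whether that loss is over the air or inside the AP would still need the OTA capture to tell apart.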
08-05-2025 05:26 AM
Check point 9 dhcp require
MHM
08-05-2025 05:28 AM
I already did. That's not the case.
08-05-2025 05:37 AM
There are two DHCP processes:
The full DHCP process, which exchanges four DHCP messages.
The other DHCP process, in which the client sends only a request to inform the DHCP server that it is still using the same IP.
The client does the second DHCP process after roaming, not the full DHCP process.
Am I right?
MHM
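To tell the two DHCP processes apart in a capture, one option is to look at the sequence of DHCP message types (option 53) in each exchange. The sketch below is only an illustration under that assumption: the type codes are the standard DHCP values, while the function name and the "full" / "short" / "other" labels are made up for this example.

```python
# Standard DHCP message type codes (option 53).
DHCP_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 5: "ACK", 6: "NAK"}


def classify_exchange(type_codes):
    """Classify one client's DHCP exchange from its ordered message types.

    Returns "full" for Discover/Offer/Request/Ack, "short" for a lone
    Request answered by an Ack (client re-confirming its current IP),
    and "other" for anything else (for example a NAK or a partial capture).
    """
    names = [DHCP_TYPES.get(code, "OTHER") for code in type_codes]
    if names == ["DISCOVER", "OFFER", "REQUEST", "ACK"]:
        return "full"
    if names == ["REQUEST", "ACK"]:
        return "short"
    return "other"


print(classify_exchange([1, 2, 3, 5]))  # full four-message exchange
print(classify_exchange([3, 5]))        # request-only exchange after a roam
```

If the exchanges after each roam come out as "short", the client is only re-confirming its existing lease rather than running the full four-message process.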
08-06-2025 09:22 AM - edited 08-06-2025 09:26 AM
Last update:
1- DHCP: I think this is clear now.
2- GARP <<- how did you determine whether or not the WLC sends GARP to the core SW?
3- Authentication Key Management.................... FT-802.1x <<- from the info you shared, it is clear that the client supports FT.
4- Check the bug below:
https://quickview.cloudapps.cisco.com/quickview/bug/CSCvn05881
08-07-2025 12:14 AM - edited 08-07-2025 12:15 AM
Thanks for sharing your ideas and your collaboration!
2- Where does GARP fit into the picture here? I didn't see that in the link you shared.
4- Thanks for sharing the bug, but it is not relevant in my case. Let's split the conversation for clarity. As per my troubleshooting, the roaming issue I have is caused by the first ARP packet from the client being dropped, which causes a 500 ms delay. We decided to stop troubleshooting that part, as we plan to migrate to the 9800 this year, as I stated earlier. The second part is why clients do DHCP on each roaming event. That's a different point, and that's what we were trying to figure out in my conversation with you.
08-07-2025 07:41 AM - edited 08-07-2025 07:41 AM
I found an interesting article, but it is for the 9800 WLC, not for AireOS.
Still, I think it applies to any roaming on a Cisco WLC:
"In the Catalyst 9800 Series controllers, a group of access points belonging to a floor, building, or location can be tagged with different policy, site, or RF tags, but can have the same WLAN profile. That is, the same SSID name can be advertised across all of the APs in the controller. A client could roam between two APs in the same controller that are configured with the same WLAN profile but associated with different policies. In such a scenario, the client is forced to go through a full authentication and DHCP process and renew its IP address."
MHM