
ARP not resolved while roaming

O_A_H
Level 1

We have a Cisco 3500 WLC with 2800 APs (8.10.196.0). The APs operate in Local mode, so SSID traffic is centrally switched. We have a roaming problem. We use 802.1X with FT. After each roaming event completes, the client ARPs for the gateway. The problem is that, most of the time, the first ARP packet is not seen in the AP debugs or in the AP SPAN packet capture, so the client waits 300-500 ms and sends a second ARP. This one makes its way to the WLC via CAPWAP, but the WLC drops it and does not forward it to the core (GW). The client then waits another 300-500 ms and sends a third ARP packet, which reaches the core and gets a reply. The client will not pass traffic until the GW ARP is resolved, which in this situation causes the video call to freeze for about 0.5-1 second (this can be seen in the packet capture on the laptop, where the client only receives video traffic and sends none until ARP is resolved). If we assume that the first ARP packet is lost in the air, I find it strange that only the first ARP is lost. What could be the issue with the first ARP packet? Could it be a bug (and if so, what is the bug ID)? We tested on a PSK SSID and the ARP behavior was the same.
To work around the WLC dropping the second ARP, I converted the APs to FlexConnect and set the SSID to local switching so that ARP goes client -> AP -> core (bypassing the WLC). This improved the video call experience, but some freezes still happen because the first ARP packet is still being lost. I also tested adding a static ARP entry for the GW on the client (to avoid a GW ARP after each roaming event), but that had no effect: the client still ARPs for the GW after each roaming event. On a separate floor where we have only 9120 APs (no 2800s), the same ARP behavior was seen.
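
To put numbers on the ARP retry pattern described above, here is a minimal Python sketch that summarizes one roam from the laptop capture. The `ArpEvent` type, the `arp_resolution_delay` function, and the timestamps are hypothetical and only for illustration; in practice the times would be exported from the capture tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ArpEvent:
    time_s: float  # capture timestamp in seconds
    kind: str      # "request" (client -> GW) or "reply" (GW -> client)


def arp_resolution_delay(roam_done_s: float,
                         events: List[ArpEvent]) -> Optional[dict]:
    """Summarize gateway ARP behaviour after a single roam.

    Returns None when no request/reply pair is found after the roam.
    """
    after = sorted((e for e in events if e.time_s >= roam_done_s),
                   key=lambda e: e.time_s)
    requests = [e.time_s for e in after if e.kind == "request"]
    if not requests:
        return None
    # First reply seen at or after the first request.
    reply = next((e.time_s for e in after
                  if e.kind == "reply" and e.time_s >= requests[0]), None)
    if reply is None:
        return None
    sent = [t for t in requests if t <= reply]
    return {
        "requests_before_reply": len(sent),
        # Gap between consecutive ARP requests, in milliseconds.
        "retry_gaps_ms": [round((b - a) * 1000) for a, b in zip(sent, sent[1:])],
        # Time from roam completion until the gateway answered.
        "stall_ms": round((reply - roam_done_s) * 1000),
    }


# Hypothetical timeline: three requests about 400 ms apart, then a reply.
timeline = [ArpEvent(10.010, "request"), ArpEvent(10.410, "request"),
            ArpEvent(10.810, "request"), ArpEvent(10.812, "reply")]
print(arp_resolution_delay(10.000, timeline))
```

Running this over every roam in a capture would show whether the stall is consistently two retry intervals long, which is what the three-ARP pattern predicts.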

Another side of the story: after the client completes a successful FT roam, it sends an EAPOL-Start message to begin a full reauthentication. The full reauthentication is then followed by DHCP and ARP (with the ARP issue described above). From the laptop packet capture, we can confirm that the full reauthentication does not interrupt two-way video traffic while it runs; only the ARP at the end interrupts traffic until it is resolved. Why does this EAPOL-Start behavior happen? We noticed that it happens on laptops that use a client certificate. Clients with a machine certificate neither send EAPOL-Start nor do a full reauthentication (just the fast FT roam). All laptops are managed by Intune.
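
To spot the EAPOL-Start frames mentioned above in raw capture bytes, a small Python sketch like the following could be used. It assumes untagged Ethernet II frames as seen in a laptop capture (no VLAN tag, no 802.11 header); `classify_eapol` is a hypothetical helper name. The EtherType 0x888E and the packet type values come from the 802.1X EAPOL header.

```python
import struct
from typing import Optional

EAPOL_ETHERTYPE = 0x888E
# EAPOL packet type field (second byte of the EAPOL header).
EAPOL_TYPES = {0: "EAP-Packet", 1: "EAPOL-Start", 2: "EAPOL-Logoff", 3: "EAPOL-Key"}


def classify_eapol(frame: bytes) -> Optional[str]:
    """Return the EAPOL packet type name, or None if the frame is not EAPOL.

    Assumes an untagged Ethernet II frame: 6-byte destination MAC,
    6-byte source MAC, 2-byte EtherType, then the EAPOL header
    (version, type, 2-byte body length).
    """
    if len(frame) < 18:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != EAPOL_ETHERTYPE:
        return None
    _version, ptype, _length = struct.unpack("!BBH", frame[14:18])
    return EAPOL_TYPES.get(ptype, f"unknown({ptype})")


# Hypothetical EAPOL-Start frame: made-up MACs, version 1, type 1, length 0.
start = bytes.fromhex("0180c2000003" "021122334455" "888e" "01010000")
print(classify_eapol(start))
```

Counting `EAPOL-Start` frames per roam on the client-certificate laptops versus the machine-certificate ones would confirm whether the difference is as consistent as it appears.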

At this stage, I'm quite stuck troubleshooting this issue. The next step would probably be to create an open SSID and take an OTA capture together with packet captures on the laptop and the AP port to see what happens to the first ARP packet. Still, as mentioned, it doesn't make sense that the loss would be over the air (an RF issue) when it affects only the first ARP packet and none of the other traffic.


Check point 9, DHCP Required.

MHM

I already did. That's not the case.

There are two DHCP processes:

The full DHCP process, which exchanges four DHCP messages.


And the other DHCP process, in which the client sends only a Request to inform the DHCP server that it is still using the same IP.


The client does the second DHCP process after roaming, not the full DHCP process.

Am I right?
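
The two DHCP processes above can be told apart in a capture by the sequence of DHCP message types (option 53). The following is a minimal Python sketch under that assumption; `classify_dhcp_exchange` is a hypothetical helper, and the input is simply the list of option 53 values seen for one client after a roam.

```python
from typing import List

# DHCP message type values carried in option 53 (RFC 2132).
DISCOVER, OFFER, REQUEST, DECLINE, ACK, NAK, RELEASE, INFORM = range(1, 9)


def classify_dhcp_exchange(msg_types: List[int]) -> str:
    """Classify one client's DHCP exchange by its message-type sequence.

    "full"    : Discover, Offer, Request, Ack (four messages)
    "renewal" : Request, Ack only (client confirms the address it already has)
    "other"   : anything else, e.g. a Nak or an incomplete exchange
    """
    if msg_types[:4] == [DISCOVER, OFFER, REQUEST, ACK]:
        return "full"
    if msg_types[:2] == [REQUEST, ACK]:
        return "renewal"
    return "other"


print(classify_dhcp_exchange([REQUEST, ACK]))                   # renewal
print(classify_dhcp_exchange([DISCOVER, OFFER, REQUEST, ACK]))  # full
```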

MHM

Last update:
1- DHCP: I think this is clear now.
2- GARP <<- how did you determine whether or not the WLC sends GARP to the core switch?

3- Authentication Key Management.................... FT-802.1x <<- from the info you shared, it is clear that the client supports FT.
4- Check the bug below:

https://quickview.cloudapps.cisco.com/quickview/bug/CSCvn05881

Thanks for sharing your ideas and your collaboration!
2- Where does GARP fit into the picture here? I didn't see that in the link you shared.

4- Thanks for sharing the bug. It is not relevant in my case. Let's split the conversation for clarity. Per my troubleshooting, the roaming issue I have is caused by the first ARP packet from the client being dropped, which causes a 500 ms delay. We decided to stop troubleshooting this part because we plan to migrate to the 9800 this year, as I stated earlier. The second part, why clients do DHCP at each roaming event, is a different point, and that is what we were trying to figure out in my conversation with you.

I found an interesting article, but it is for the 9800 WLC, not AireOS.

But I think it applies to any roaming on a Cisco WLC.

"In the Catalyst 9800 Series controllers, a group of access points belonging to a floor, building, or location can be tagged with different policy, site, or RF tags, but can have the same WLAN profile. That is, the same SSID name can be advertised across all of the APs in the controller. A client could roam between two APs in the same controller that are configured with the same WLAN profile but associated with different policies. In such a scenario, the client is forced to go through a full authentication and DHCP process and renew its IP address."

https://www.cisco.com/c/en/us/products/collateral/wireless/catalyst-9800-series-wireless-controllers/cat9800-ser-primer-enterprise-wlan-guide.html

MHM
