ARP not resolved while roaming

O_A_H
Level 1

We have a Cisco 3500 WLC with 2800 APs (8.10.196.0). The APs operate in Local mode, so SSID traffic is centrally switched. We use 802.1X with FT and have a roaming problem. After each roaming event completes, the client sends an ARP request for the gateway. Most of the time, the first ARP packet is seen neither in the AP debugs nor in the AP SPAN packet capture, so the client waits 300-500 ms and sends a second ARP request. That one makes its way to the WLC via CAPWAP, but the WLC drops it and does not forward it to the core (GW). The client then waits another 300-500 ms and sends a third ARP request, which reaches the core and gets a reply. The client does not pass traffic until the GW ARP is resolved, which causes video calls to freeze for about 0.5-1 second (the packet capture on the laptop shows the client only receiving video traffic, not sending, until ARP is resolved). If we assume the first ARP packet is lost in the air, I find it strange that only the first ARP is lost. What could be the issue with the first ARP packet? Could it be a bug (and if so, what is the bug ID)? We tested on a PSK SSID and the ARP behavior was the same.
To work around the WLC dropping the second ARP, I converted the APs to FlexConnect and made the SSID locally switched, so that ARP goes client->AP->core (bypassing the WLC). This improved the video call experience, but some freezes still happen because the 1st ARP packet is still being lost. I also tested adding a static ARP entry for the GW on the client (to avoid a GW ARP after each roaming event), but that did not seem to have any effect: the client still sends a GW ARP after each roaming event. On a separate floor where we have only 9120 APs (no 2800s), the same ARP behavior was seen.

Another side of the story is that after the client completes a successful FT roam, it sends an EAPOL START message to begin a full reauthentication. The full reauthentication then happens, followed by DHCP and ARP (the ARP issue is as described above). From the laptop packet capture, we can confirm that the full reauthentication does not interrupt 2-way video traffic forwarding. It is only the ARP at the end that interrupts traffic until it is resolved. Why does this EAPOL START behavior happen? What we noticed is that EAPOL START happens on laptops that use a client certificate. Clients with a machine certificate do not send EAPOL START and do not do a full reauthentication (just the FT fast roam). All laptops are managed by Intune.

At this stage, I'm quite stuck troubleshooting this issue. The next action point would probably be to create an open SSID and take an over-the-air (OTA) capture together with packet captures on the laptop and on the AP port, to see what happens to the first ARP packet. But as mentioned, it doesn't make sense that it is lost in the air most of the time (an RF issue) when that only impacts the 1st ARP packet and none of the rest of the traffic.
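The comparison across capture points described above could be sketched roughly as follows. This is only an illustration, not a tested tool: the function name `locate_loss`, the capture point names, and the 50 ms matching tolerance are all hypothetical, and it assumes the ARP request timestamps have already been exported from each capture and that the capture clocks are reasonably synchronized.

```python
def locate_loss(client_times, points, tolerance=0.05):
    """For each ARP request seen on the client, find the first capture
    point (ordered from nearest to the client to farthest) where no ARP
    request appears within `tolerance` seconds.

    client_times: timestamps (seconds) of ARP requests in the laptop capture.
    points: ordered list of (name, timestamps) tuples, e.g. OTA capture
            first, then AP switchport. Names are hypothetical labels.
    Returns a list of (client_timestamp, lost_at), where lost_at is None
    if the request was seen at every capture point.
    """
    results = []
    for t in client_times:
        lost_at = None
        for name, times in points:
            # A request counts as "seen" if any timestamp is close enough.
            if not any(abs(t - u) <= tolerance for u in times):
                lost_at = name
                break
        results.append((t, lost_at))
    return results


# Made-up timestamps: the first request shows up over the air but never
# on the AP switchport; the retry 500 ms later is seen everywhere.
example = locate_loss(
    client_times=[10.0, 10.5],
    points=[("ota", [10.001, 10.501]), ("ap_switchport", [10.502])],
)
```

With real data, a first request that is missing already at the OTA point would suggest the frame never left the client or was lost on RF, while one seen OTA but missing on the switchport would point at the AP.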

15-09-2025: Update

We have migrated the APs to a 9800-CL, also in FlexConnect local switching, and the issue persists: the 1st ARP is still being lost.

I see the same behaviour with each roaming event.

1- After roaming, the client sends an ARP request to the GW. At this stage it keeps receiving traffic from the sender but sends nothing.

2- Most of the time (90%), the 1st ARP packet does not get a response. The client then waits about 500 ms before sending another ARP request to the GW. The GW responds to the 2nd ARP, and the client starts sending traffic again.
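To put numbers on the two steps above across many roams, the ARP events from the laptop capture could be summarized with something like the sketch below. It is a minimal illustration under assumptions: `ArpEvent` and `summarize_arp_retries` are hypothetical names, and the events for one roam are assumed to be already extracted from the capture as timestamps plus a request/reply label.

```python
from dataclasses import dataclass


@dataclass
class ArpEvent:
    t: float    # capture timestamp in seconds
    kind: str   # "request" (client -> GW) or "reply" (GW -> client)


def summarize_arp_retries(events):
    """Summarize the gateway ARP exchange after a single roam.

    Returns (unanswered_requests, stall_seconds):
      unanswered_requests: requests sent before the one that got a reply.
      stall_seconds: time from the first request to the first reply, or
                     None if no reply was captured.
    """
    events = sorted(events, key=lambda e: e.t)
    requests = [e.t for e in events if e.kind == "request"]
    replies = [e.t for e in events if e.kind == "reply"]
    if not requests:
        return 0, None
    first_request = requests[0]
    # First reply that arrives at or after the first request.
    first_reply = next((r for r in replies if r >= first_request), None)
    if first_reply is None:
        return len(requests), None
    sent_before_reply = [r for r in requests if r <= first_reply]
    return len(sent_before_reply) - 1, first_reply - first_request


# Made-up timestamps matching the pattern described: the first request
# is unanswered, the retry about 500 ms later gets a reply.
roam = [ArpEvent(0.000, "request"), ArpEvent(0.500, "request"), ArpEvent(0.502, "reply")]
unanswered, stall = summarize_arp_retries(roam)
```

Running this per roam would give the fraction of roams where the first request goes unanswered and the resulting send stall, which can be compared against the observed freeze duration.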

31 Replies

JPavonM
VIP

@O_A_H did you disable ICMP redirects? Did that fix the issue?

We are seeing a lot of temporary disconnections on clients due to missing ARP responses from the default gateways (C9300 with SVIs), and some disconnections on APs for the same reason (as reported by the AP itself; non-Cisco AP).

No, because as per my last troubleshooting session and packet capture, the 1st ARP packet that is lost is not seen on the AP switchport. My OTA capture was disturbed, so I couldn't verify on the air. But it is always the 1st ARP packet that gets lost (it doesn't even reach the switch). I will have to plan another troubleshooting session with a proper OTA capture to find out where this packet is lost. (This was before I moved the setup to FlexConnect as a workaround for the 2nd ARP packet being dropped by the WLC.)

Are you talking about wired clients as well, or only wireless? And what was reported on the AP? Can you give more details?
