Fabric APs VXLAN Tunnel MAC Flap/ARP Issue (Bug CSCwb68720)

O_H
Level 1

I would like to share my experience with wireless issues on 9120 APs in Fabric mode with a 9800 WLC.
We had a poor Wi-Fi experience: users were disconnected, were unable to join, etc.
After a long troubleshooting session I couldn't pin down the cause, but I noticed that some APs occasionally lost their CAPWAP connection to the WLC for a couple of minutes. I didn't think this alone could explain the issue, because our outages lasted an hour or more. The APs can still forward data traffic, and losing the control tunnel for 2 minutes on a few APs is not a big deal in this situation.

I decided to upgrade from 17.3.3 to 17.6.4 (the recommended version at the time). After the upgrade things got better, but we still had issues in one building where clients couldn't join the Wi-Fi at all.

On DNAC, I noticed these MAC flapping logs (lots of them, for multiple hosts, switches and ports):
Layer 2 loop symptoms: MAC_FLAPPING
SW_MATM:MACFLAP_NOTIF
28057: switch1: 032846: Oct 26 14:08:23.717: Host xxxx.xxxx.xxxx in vlan 1058 is flapping between port Ac23 and port Gi2/0/7
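To gauge how widespread the flapping is, the MACFLAP_NOTIF lines can be parsed and grouped. This is a minimal Python sketch that assumes the message text matches the sample line above; the regex and the helper names (parse_macflaps, is_tunnel_vs_port, summarize) are hypothetical, and treating any interface that starts with "Ac" as an AccessTunnel is an assumption based on the Ac23 name in the log.

```python
import re
from collections import Counter

# Assumed message format, based on the SW_MATM:MACFLAP_NOTIF sample above.
MACFLAP_RE = re.compile(
    r"Host (?P<mac>\S+) in vlan (?P<vlan>\d+) "
    r"is flapping between port (?P<a>\S+) and port (?P<b>\S+)"
)


def parse_macflaps(lines):
    """Return (mac, vlan, port_a, port_b) for every MACFLAP line found."""
    events = []
    for line in lines:
        m = MACFLAP_RE.search(line)
        if m:
            events.append((m["mac"], int(m["vlan"]), m["a"], m["b"]))
    return events


def is_tunnel_vs_port(port_a, port_b):
    """True when exactly one side is an AccessTunnel (assumed "Ac" prefix)."""
    return port_a.startswith("Ac") != port_b.startswith("Ac")


def summarize(events):
    """Count flaps per (vlan, port pair), ignoring which port is listed first."""
    return Counter((vlan, tuple(sorted((a, b)))) for _mac, vlan, a, b in events)


sample = [
    "28057: switch1: 032846: Oct 26 14:08:23.717: Host xxxx.xxxx.xxxx in vlan 1058 "
    "is flapping between port Ac23 and port Gi2/0/7",
]
events = parse_macflaps(sample)
print(events)  # [('xxxx.xxxx.xxxx', 1058, 'Ac23', 'Gi2/0/7')]
print(summarize(events))  # Counter({(1058, ('Ac23', 'Gi2/0/7')): 1})
```

Feeding a full day of syslog lines into summarize() would show whether the flaps cluster on tunnel-vs-port pairs, which is the pattern described in this thread.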


When I checked, I realized that Gi2/0/7 is an AP port, and that AccessTunnel Ac23 belongs to the same AP.
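The check described here (is the flap between one AP's own switch port and its own AccessTunnel?) can be sketched in Python as below. The AP names and the port/tunnel inventory in AP_PORTS are made up for illustration; in practice you would fill it in from your own switch output. The event tuples use a (mac, vlan, port_a, port_b) shape taken from the log sample, and the second event is hypothetical.

```python
from collections import Counter

# Hypothetical inventory: AP name -> (switch access port, AccessTunnel interface).
# Values are illustrative only; collect the real mapping from your fabric edge.
AP_PORTS = {
    "AP-BLDG1-01": ("Gi2/0/7", "Ac23"),
    "AP-BLDG1-02": ("Gi2/0/8", "Ac24"),
}


def self_flapping_aps(events, ap_ports):
    """Count flaps whose two ports are one AP's own access port and its own tunnel."""
    # frozenset makes the lookup independent of which port the log lists first.
    by_pair = {frozenset(pair): ap for ap, pair in ap_ports.items()}
    hits = Counter()
    for _mac, _vlan, port_a, port_b in events:
        ap = by_pair.get(frozenset((port_a, port_b)))
        if ap is not None:
            hits[ap] += 1
    return hits


events = [
    ("xxxx.xxxx.xxxx", 1058, "Ac23", "Gi2/0/7"),  # the flap quoted in the post
    ("yyyy.yyyy.yyyy", 1058, "Gi2/0/7", "Ac23"),  # hypothetical second client
]
print(self_flapping_aps(events, AP_PORTS))  # Counter({'AP-BLDG1-01': 2})
```

A flap between, say, Ac23 and Gi2/0/8 would not be counted, because that pair does not belong to a single AP.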



Then, checking the MAC addresses on this port, I found many client MACs learned on the AP port. That seemed illogical, because they should be learned on the Ac23 tunnel, and it is consistent with the MAC flapping messages.
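The same observation can be checked mechanically against the switch MAC address table. This Python sketch assumes the usual IOS-XE "show mac address-table" column layout (Vlan, Mac Address, Type, Ports); the sample output, the MAC addresses and the AP VLAN 2045 are hypothetical. Any MAC learned on the AP port in a VLAN other than the AP's own VLAN is flagged, since client MACs are expected behind the Ac tunnel instead.

```python
import re

# Assumed row layout: "<vlan> <mac xxxx.xxxx.xxxx> <type> <port>".
ENTRY_RE = re.compile(
    r"^\s*(?P<vlan>\d+)\s+(?P<mac>[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}\.[0-9a-fA-F]{4})"
    r"\s+(?P<type>\S+)\s+(?P<port>\S+)\s*$"
)

# Hypothetical output; header, separator and total lines are skipped by the regex.
SAMPLE = """
          Mac Address Table
-------------------------------------------

Vlan    Mac Address       Type        Ports
----    -----------       --------    -----
2045    aaaa.0000.0001    DYNAMIC     Gi2/0/7
1058    aaaa.0000.0002    DYNAMIC     Gi2/0/7
1058    aaaa.0000.0003    DYNAMIC     Gi2/0/7
Total Mac Addresses for this criterion: 3
"""


def parse_mac_table(text):
    """Return (vlan, mac, port) tuples for each table row."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            entries.append((int(m["vlan"]), m["mac"].lower(), m["port"]))
    return entries


def unexpected_on_ap_port(entries, ap_port, ap_vlan):
    """MACs learned on the AP port outside the AP's own VLAN (likely clients)."""
    return [(vlan, mac) for vlan, mac, port in entries
            if port == ap_port and vlan != ap_vlan]


entries = parse_mac_table(SAMPLE)
print(unexpected_on_ap_port(entries, "Gi2/0/7", ap_vlan=2045))
# [(1058, 'aaaa.0000.0002'), (1058, 'aaaa.0000.0003')]
```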


I also noticed that some APs don't build the VXLAN tunnel at all, while others lose it and rebuild it.

I did some research and found this ugly bug, CSCwb68720, which has a high hit count. (Cisco says it is already resolved in the version I'm running now, which is not true... no surprise!)


I then downgraded to 17.3.5b (another MD recommended version at the time). So far I don't see these MAC flapping logs, and I expect the issues are gone.

If you find this useful, like and rate!


marce1000
VIP


- Review the 9800 WLC configuration with the CLI command show tech wireless and have the output analyzed by https://cway.cisco.com/tools/WirelessAnalyzer/. Please note: do not use the classic show tech-support (short version); use show tech wireless for Wireless Analyzer. Check out all advisories!

 M.



-- Each morning when I wake up and look into the mirror I always say 'Why am I so brilliant?'
    And the mirror always responds: 'The only thing that exceeds your brilliance is your beauty!'

O_H
Level 1

Thanks! Indeed, that was one of the initial troubleshooting steps I did before the upgrade, but it didn't highlight anything major related to the issue in my case. It is definitely useful and important, though.

Arshad Safrulla
VIP Alumni

How did you conclude that the AP is causing the disconnections? I would check the AP logs from the WLC side. You may run a Radioactive Trace for the AP MAC or IP to get more visibility.

I would suggest opening 2 cases, one with the wireless team for the APs and another with the DNAC team for the overlay connectivity, and letting TAC collaborate to troubleshoot this issue.

O_H
Level 1

If you read CSCwb68720, you will see that this bug is on the wireless side (AP/WLC), not the switch side (DNAC). As I mentioned, moving the WLC to 17.3.5b fixed the issue, and I no longer see MAC flapping logs on DNAC for AccessTunnel vs. AP port. So it is indeed the bug. Issue resolved, fingers crossed!

Also, of course I checked the logs; I just didn't want to make the post too long by mentioning every single detail, while keeping it as inclusive and informative as possible.

If you find this useful, like and rate!
