
Cisco Wireless - Troubleshooting ARP issues

jh5280
Level 1

Hi All,

 

We recently had some TAC assistance with wireless issues affecting lots of clients. Apparently we have numerous clients that are essentially creating ARP floods. TAC was able to show us some of the MAC addresses causing the issues, but we need a way not only to find all of them, but also to figure out how to really troubleshoot this. Our desktop and end user teams are very different groups, and we need to provide them with detailed lists of these devices so they can figure out what's going on.

 

Essentially, we have some clients that are apparently ARPing continuously. As stated, TAC provided us with some of them, but we need to understand how they were found. We are using 9800 WLCs with numerous APs, including older ones and a lot of newer 9K series APs. Some Cisco products have commands like 'show arp traffic', but from what we can see that's not available here. DNAC also doesn't appear to show us anything useful, nor do radioactive traces. Any help with tracking and documenting the devices exhibiting this behavior (likely hundreds of them, on dozens of APs) would be great!

 

thanks in advance.

6 Replies

Hi

 What I see here is the same story as always: the network guy trying to find some other team's problems. Why does networking always have to prove that the problem is not in the network?

 That being said, I'd like to add that the access point, WLC, or any other device doesn't have an easy way to identify these anomalies, because they are working hard doing their job, which is routing and switching packets back and forth on the network.

 If a device sends a packet (a malformed ARP) whose destination can't be found, the switch will do its job and try to find it at any cost.

 For this situation, unfortunately, you need to start searching on links like this one. It's old, but the same thing is probably still happening in newer versions.

https://nblug.org/pipermail/discuss/2008-January/000345.html 

 

Lastly, try to protect yourself using DAI (Dynamic ARP Inspection).

https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst9300/software/release/16-6/configuration_guide/sec/b_166_sec_9300_cg/configuring_dynamic_arp_inspection.html#concept_21AE207C674C4B5992199CDCC96F202F 

 

Rich R
VIP

As to your original question - how to identify those clients: your best option is a packet capture. Where you do that depends on where your traffic is switched. If it's locally switched, then capture on the VLAN on the local switch. If it's centrally switched, then capture on the central switch VLAN or on the 9800 itself. IOS-XE makes packet captures really easy - "mon cap ?"
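Once you have a capture, counting ARP requests per source MAC gives you a first cut of the list. Below is a minimal sketch in Python (standard library only), not anything Cisco provides. It assumes the capture was exported in the classic pcap format (not pcapng) with Ethernet link type; the function name `count_arp_requests` and the file name `arp.pcap` are made up for illustration.

```python
import struct
from collections import Counter

ETHERTYPE_ARP = 0x0806
ETHERTYPE_VLAN = 0x8100   # single 802.1Q tag
ARP_REQUEST = 1


def count_arp_requests(pcap_bytes):
    """Count ARP requests per sender MAC in a classic pcap (Ethernet) capture."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"          # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"          # file written big-endian
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")

    counts = Counter()
    offset = 24               # skip the 24-byte global header
    while offset + 16 <= len(pcap_bytes):
        # Per-record header: ts_sec, ts_usec, captured length, original length
        _sec, _usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset)
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len

        if len(frame) < 14:
            continue
        ethertype = struct.unpack_from("!H", frame, 12)[0]
        l3 = 14
        if ethertype == ETHERTYPE_VLAN and len(frame) >= 18:
            ethertype = struct.unpack_from("!H", frame, 16)[0]
            l3 = 18
        # An IPv4-over-Ethernet ARP payload is 28 bytes
        if ethertype != ETHERTYPE_ARP or len(frame) < l3 + 28:
            continue
        opcode = struct.unpack_from("!H", frame, l3 + 6)[0]
        if opcode != ARP_REQUEST:
            continue
        sender_mac = frame[l3 + 8:l3 + 14]   # sender hardware address field
        counts[sender_mac.hex(":")] += 1
    return counts
```

Usage would be something like `count_arp_requests(open("arp.pcap", "rb").read()).most_common(20)` to list the 20 busiest senders. It keys on the sender hardware address inside the ARP payload rather than the Ethernet source address, so the count follows the client even if an intermediate device rewrites the frame header.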

Yeah, we were hoping that, given the controller-based architecture, there was a way to do two things: capture a list of all of the clients exhibiting this behavior, and then eventually start to home in on individual clients and do things like packet captures.

 

The thing is, we really had no idea this was happening until we tried to implement a new version of code, in which Cisco introduced a new feature (enabled by default, of course, and without really making that apparent) that actually blacklists clients that exhibit behavior like this. The trouble is, because it wasn't obvious that this was the case, we thought the upgrade had another bug, and it took TAC a long time to figure out why these clients were blacklisted. So theoretically the controllers are already able to see this behavior. Why can't they initially alert on it, report on it, and then give us the option to turn the blacklisting on? The sad part is, since everyone thought it was a bug, we had to back out the upgrade. But again, if the controllers are able to "classify" the bad behavior, then there should be a way to see the collective group of offenders ahead of time. One would think.

 

That leads to the next portion of this: we want to be able to build the list first, then work with the end user teams to inspect those systems and find out why they are doing this. Some could be Macs, some could be Windows based, and many are likely consumer devices like TVs. We'll have to deal with them all differently, but we at least need to know who they are before we just boot them off the network.

 

And to Cisco: turning on a feature like this by default is kind of a rookie move. It's great to add new capabilities, but you should at least ship them in a non-invasive mode to start, with the admin able to go in and enable the blacklisting when they are comfortable.

Arshad Safrulla
VIP Alumni

What is the impact of this ARP flood in your environment? Is it causing slowness in wireless? If yes, did you consider enabling ARP proxy on the WLC?

Also keep in mind default behavior of all 9800 WLC's is to convert broadcast to unicast when it comes to ARP messages. 

What default behaviour is blocking excessive ARP broadcasts? Is it client exclusion? Are you seeing clients blacklisted due to IP reuse or IP theft?

We technically haven't seen any direct issues with ARP flooding, we now know it's there and we realize behind the scenes it is likely causing issues with CPU spikes, wireless client reliability, etc.

 

Apparently, there is a "new feature" in the version of code we were upgrading to that specifically looks for wireless clients that are essentially committing bad ARP behavior, and by default the WLC blacklists the client. This feature didn't exist in 17.3.3, which is what we are upgrading from. As far as IP reuse/theft goes, we're not seeing this from these clients. They just appear to be behaving poorly as it relates to ARP.

 

Again, really, we understand how to troubleshoot the individual clients once we find out who they are, but so far we've found no way of creating the list of naughty clients. We're really not looking to upgrade to 17.3.5a+ again without having fixed this. There are some new commands to change when/how the rate limiting is done for ARP flooding, but that's not the point; we'd like to find the clients and fix them ahead of time. Is there truly no detailed logging we can do on the WLCs/APs to find this behavior? Again, it appears that several Cisco devices have ARP traffic commands.

 

hope that makes sense.

Rich R
VIP

1. So you encountered this problem with 17.3.5a? (unusual but not unheard of for Cisco to introduce a new feature in a long lived maintenance release.  More likely that it was implemented as a fix for another problem.)

So ... golden rule: always read the release notes carefully before testing and then deploying a new release.  In this case it seems Cisco did provide warning:

https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/17-3/release-notes/rn-17-3-9800.html#Cisco_Concept.dita_0caa7f59-6283-4cb9-90c6-5530f8350e00

Behavior Change

  • From Cisco IOS XE Amsterdam 17.3.5a onwards, rate limiting is performed for ARP packets for each client to prevent a denial-of-service attack. If a client sends an ARP storm, then the client is excluded. To configure rate limiting, use the ip arp-limit rate command at the policy profile level.

I also highlighted the change in a previous post: https://community.cisco.com/t5/wireless/wlan-clients-can-t-communicate-since-update-from-17-3-3-to-17-3/td-p/4590656

The command is documented at https://www.cisco.com/c/en/us/td/docs/wireless/controller/9800/17-5/cmd-ref/b_wl_17_5_cr/configuration-commands-g-to-z.html#wp3016606396 but I don't see any show commands for it specifically.

 

Obviously when it activates you can use "show wireless exclusionlist" but not sure there's any way to see who would trigger it without doing that. I still say a pcap filtered on ARP requests would be a pretty effective method of identifying the culprits.
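To guess ahead of time which clients might trip the rate limit, you could look at each client's peak ARP rate rather than its total count. Here is a rough Python sketch, assuming you have already pulled (timestamp, source MAC) pairs for ARP requests out of a capture by whatever means. The function names, the one-second window, and the threshold are all hypothetical; how the WLC actually measures the rate isn't described in the release note, so treat this as an approximation only.

```python
from collections import defaultdict, deque


def peak_arp_rate(events, window=1.0):
    """Return {mac: max ARP requests seen in any `window`-second span}.

    `events` is an iterable of (timestamp_seconds, mac) pairs,
    sorted by timestamp.
    """
    recent = defaultdict(deque)   # per-MAC timestamps still inside the window
    peak = defaultdict(int)
    for ts, mac in events:
        q = recent[mac]
        q.append(ts)
        # Drop timestamps that have aged out of the sliding window
        while ts - q[0] >= window:
            q.popleft()
        peak[mac] = max(peak[mac], len(q))
    return dict(peak)


def likely_offenders(events, threshold, window=1.0):
    """List MACs whose peak rate exceeds `threshold` (a value you pick)."""
    rates = peak_arp_rate(events, window)
    return sorted(mac for mac, rate in rates.items() if rate > threshold)
```

Pick the threshold to match whatever you plan to configure with ip arp-limit rate, and the output is a candidate list to hand to the desktop teams before the exclusion ever kicks in.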
