Catalyst 4506-E SUP 8E - ARP issues in any VLAN

ROBERT THOMPSON · ‎04-02-2014

Hi All,

We have an issue which is related to bug CSCuj73571.

IP traffic in all VLANs works fine. Once ARP traffic is present, the switch will eventually stop processing it. We also noted 100% utilisation on CPU core 0 during this issue. The client is using SUP8Es in 4506-Es running cat4500es8-universal.SPA.03.03.00.XO.151-1.XO.bin.

How long it takes to fail depends on how much ARP traffic is on your VLAN(s), so your timing will differ from ours. It could fail in seconds or, in our client's case, after 45 minutes, which we could set a clock by.

This is exactly what we see (as per bug CSCuj73571):

#show platform cpu packet driver

Forerunner Packet Engine 0.28 (0)

Receive Queues: received packets summary

Qu Capac Guara CurPo Unpro Accum Kept BperP Packets

2 2512 112 2303 0 3 2511 64 339959 <--- Kept stays at 2511, Packets does not increment

8 1008 512 67 0 3 3 64 67

9 2512 304 96 0 0 0 433 96

Receive Queues: dropped packets summary

Qu Total Packets Drop No Cell Drop Overrun Drop Underrun

2 339959 100390067 0 0 <--- Drop No Cell increments
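For anyone who wants to watch for this condition from collected CLI output, here is a minimal Python sketch. It assumes you have captured the text of show platform cpu packet driver twice, a little while apart, and that the received-packets table has the nine columns shown above. The function names and the "Kept within one of Capac" threshold are my own assumptions taken from the figures in this post, not anything Cisco documents.

```python
COLUMNS = ["Qu", "Capac", "Guara", "CurPo", "Unpro", "Accum", "Kept", "BperP", "Packets"]


def parse_receive_queues(output):
    """Return {queue_number: {column: value}} for the received-packets table."""
    queues = {}
    for line in output.splitlines():
        # Drop any trailing "<---" annotation like the ones in the post above.
        fields = line.split("<---")[0].split()
        # Rows of the received-packets table are exactly nine integers; the
        # dropped-packets table has fewer columns, so its rows are skipped.
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            row = dict(zip(COLUMNS, map(int, fields)))
            queues[row["Qu"]] = row
    return queues


def stuck_queues(before, after):
    """List queues whose Kept count is pinned near capacity while Packets is flat.

    Assumption: "pinned" means Kept >= Capac - 1, matching 2511 of 2512 above.
    """
    old, new = parse_receive_queues(before), parse_receive_queues(after)
    stuck = []
    for qu, row in new.items():
        prev = old.get(qu)
        if prev is None:
            continue
        full = row["Kept"] >= row["Capac"] - 1
        frozen = row["Packets"] == prev["Packets"]
        if full and frozen:
            stuck.append(qu)
    return sorted(stuck)
```

Calling stuck_queues(first_capture, second_capture) on two captures like the one above would report queue 2. A healthy switch should return an empty list, since Packets keeps incrementing and Kept moves up and down.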

This issue is 100% reproducible with a traffic generator, in any VLAN, generating random ARP traffic at any speed or flow rate. The workaround listed in the bug does not work for us, because we could reproduce the issue in any VLAN, not just VLAN 1.

We found a fix by entering these two commands from conf t:

ip arp inspection vlan 1-4094

no ip arp inspection vlan 1-4094

(Note: we applied it to all VLANs because we wanted to test what we had in the lab against all of the client's VLANs.)

This then solves the issue and brings the CPU load on core 0 back to normal. The kept buffers also operate correctly again, incrementing and decrementing as designed. However, after a reboot or power loss you will be back to the issue. Using Event Manager to write an applet to run this at startup is not a good workaround either.

The issue only concerns ARP traffic on your network. You can rate limit all you want, or remove rate limiting on ARP packet inspection, but unless you toggle inspection on and then off, it will continue to fail. We also noted while debugging this in the lab that if you toggle ARP inspection back on after disabling it, you will eventually have a network failure on ARP packets, i.e. ARP will not work on your network.

There is currently no fix for this on SUP8Es; the fix only appears to be in 3.6.E, which is not acceptable. Our client wants to return the SUP8Es and put in SUP7s, which have a fix. This is also not good, so we are looking to Cisco to solve this issue ASAP.

Anyone from Cisco is welcome to contact me on this subject.

Robert Thompson, CCIE #10302

Richard Primm · ‎04-02-2014

Hello Robert,

Please open a TAC case on this if possible. Once it is open, feel free to message me the SR number and I can assist.

-Luke (lprimm)

ROBERT THOMPSON · ‎04-03-2014

Hi Luke

SR 629803585

Best regards

Rob

Richard Primm · ‎04-03-2014

Thanks, got it.

ROBERT THOMPSON · ‎04-03-2014

Hi Luke

If you need to contact me, please message me and I will send you my email and contact details here in the UK.

Best regards

Rob

ROBERT THOMPSON · ‎04-06-2014

Hi Luke,

What are your thoughts on this?

Best regards,

Rob

javen.wang · ‎05-22-2014

Hey Rob,

We recently hit the same issue and solved it with the following EEM script:

event manager applet toggleIpv6Snooping authorization bypass

event syslog occurs 1 pattern "Terminal state reached"

action 100 cli command "enable"

action 200 cli command "configure terminal"

action 300 cli command "vlan configuration 1"

action 400 cli command "ipv6 snooping"

action 500 cli command "no ipv6 snooping"

action 600 cli command "end"

action 700 syslog msg "******** TAC_EEM Complete: Vlan1 Workaround Applied ********"

Javen

ROBERT THOMPSON · ‎05-27-2014

Hi Javen,

Although this is a good workaround, you don't really want this in a production network, which is why we pushed Cisco for an answer.

The problem is to do with how Cisco monitors your ARP traffic, DHCP traffic and a bunch of other data. It's an undocumented feature, but the fix is to turn off the macros that are turned on by default. The feature was introduced in IOS XE 3.3.0 (15.1), which is the version we were running.

no macro auto monitor

That command sorts the problem out: packets are no longer directed to the CPU queue for processing (inspection), so the queues no longer fill up.

Hope people find that information useful. The original bug will be updated today with reasons and the workaround.

Best Regards,

Rob

Robert Thompson, CCIE #10302

Chris Knipe · ‎10-14-2014

Definitely test this on a single site before deploying to your enterprise. We are running into these same ARP issues on our SUP8Es, which are installed in our 4500 chassis.

After entering no macro auto monitor I noticed lots of network links going down. As you can imagine, it was quite a fire drill to bring things back as quickly as possible.

One thing I noticed is that any trunk connected by a single link seemed to have gone down after this command.

However, our trunks that were connected via a Port Channel stayed up and online.

These devices were still seen as CDP neighbors of the 4500, with IP information available via cdp neighbor detail, though they were not reachable until I entered macro auto monitor back into the switch and then cleared the ARP cache.

mamaral · ‎11-18-2015

Hi Robert,

I have a client who has had some problems with his 4500X running an IOS that has this same bug, but in his case he did not have any device with IPv6 configured. Are you using IPv6?

Thanks,

Miguel

ROBERT THOMPSON · ‎11-18-2015

Hi,

No, we were not running IPv6.

Can you check that all your traffic or at least traffic with issues is not in VLAN 1? If it is, move the traffic to a new VLAN.

Best Regards

Rob