10-12-2017 12:18 PM - edited 03-10-2019 01:13 PM
We're having some kind of problem with ARP on Nexus 3548. Here is the topology:
The Checkpoint is connected via LACP to the two Nexus switches. When the Checkpoint FW sends an ARP request for the MAC address of Nexus 1 (VLAN interface 99), the following happens:
1. The ARP request is sent out from the Checkpoint to Nexus 2, and always on that interface. I'm guessing the load-balancing hash on the FW picks that interface.
2. Nexus 2 receives the ARP req on Po3, tagged with VLAN 99. It forwards it via Po1 to Nexus 1 (according to debug).
3. Nexus 1 gets the ARP req on Po1, and answers it out on Po3, directly to the FW.
4. FW accepts the reply and the ARP resolution is complete.
As long as the above happens, all works fine. BUT occasionally, without any obvious explanation and with completely inconsistent timing, this happens:
1. The ARP request is sent out from the Checkpoint to Nexus 2, as usual.
2. Nexus 2 receives the ARP req on Po3, tagged with VLAN 99. But it does not forward it! (according to debug ip arp packet)
3. Nexus 1 never gets the ARP req (according to debug), so it never responds. The FW does not get the MAC address for Nexus 1 and all traffic to the switch itself does not work.
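The comparison between the two debug captures in the failing case can be sketched in Python. This is only a rough illustration, not something that was run on the switches: it assumes the ARP request timestamps have already been pulled out of the debug output on each switch by hand, that both switches have synchronized clocks, and that the function name `unforwarded_requests` and the `tolerance` parameter are made up for this example.

```python
from bisect import bisect_left


def unforwarded_requests(seen_on_nexus2, seen_on_nexus1, tolerance=0.5):
    """Return the request timestamps seen on Nexus 2 that have no
    matching timestamp on Nexus 1 within +/- tolerance seconds.

    Both arguments are lists of timestamps in seconds (hypothetical
    input, extracted manually from the debug output of each switch).
    """
    peer = sorted(seen_on_nexus1)
    missing = []
    for t in seen_on_nexus2:
        # First timestamp on Nexus 1 that is not earlier than t - tolerance.
        i = bisect_left(peer, t - tolerance)
        # No such timestamp, or it falls after t + tolerance: not forwarded.
        if i == len(peer) or peer[i] > t + tolerance:
            missing.append(t)
    return missing
```

Any timestamp returned corresponds to a request that reached Nexus 2 but never showed up on Nexus 1, which is the failing sequence described above.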
Here's what I have checked:
1. There is no obvious timing in the failure. It can work fine for hours, it can be down for hours, it can be down a few seconds, it can be faulty 10 times within an hour...
2. This only happens to switch 1, never to switch 2, but that could be explained by ARPs always going to switch 2 first.
3. Traffic THROUGH the switch seems unaffected.
4. Switch 2 can always reach switch 1 and resolve its ARP. So can other units on the same VLAN. The difference is that the FW is connected through vPC.
5. vPC consistency parameters are all OK, and there is no change when the error occurs.
6. NOTHING is logged when the error occurs. Logging is local, level is warning.
7. No spanning tree issues as far as I can see, and no changes when the error occurs
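To put numbers on point 1 (the irregular timing), the failure windows could be derived from periodic probe results. The following is a minimal Python sketch under assumptions: the samples are hypothetical `(timestamp, resolved)` pairs collected by some external probe (for example, a host on VLAN 99 checking every few seconds whether it can resolve Nexus 1), and the function name `outage_windows` is invented for this example.

```python
def outage_windows(samples):
    """Collapse probe results into a list of (start, end) outage windows.

    samples: iterable of (timestamp_seconds, resolved) pairs, sorted by
    time, where resolved is True when ARP resolution worked.
    A window starts at the first failing sample and ends at the first
    successful sample after it. If the data ends during an outage, the
    window ends at the last sample seen.
    """
    windows = []
    start = None
    last = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                      # outage begins
        elif ok and start is not None:
            windows.append((start, ts))     # outage ends
            start = None
        last = ts
    if start is not None:
        windows.append((start, last))       # still down at end of data
    return windows


if __name__ == "__main__":
    probe = [(0, True), (10, False), (20, False), (30, True), (40, False)]
    for begin, end in outage_windows(probe):
        print(f"down from t={begin}s to t={end}s ({end - begin}s)")
```

A list of windows like this makes it easier to line the outages up against other events (CPU history, configuration changes) than a general impression of "sometimes hours, sometimes seconds".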
Software is 6.0(2)A8
I can't find a bug report that matches my situation.
Any ideas?
03-02-2018 03:01 AM
We've had similar issues with our Nexus 3548 (6.0(2)A8). There's a known Nexus bug which keeps the CPU usage running sky high.
Try *show processes cpu history* and you'll be able to observe the device's CPU usage. Due to the software bug, my Nexus is regularly above 70% on 'CPU% per hour (last 72 hours)'. The only way to solve this issue was to reboot the Nexus (well, in my case).
03-02-2018 08:03 AM
It was a bug. TAC confirmed it. The bug is in 6.0(2)A8(3), and was resolved when we upgraded to 6.0(2)A8(4a).
Cisco have updated a bugID on this (earlier I think it said it only affected HSRP):
https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvc55268/?reffering_site=dumpcr