Nexus 7k 5.x: Network slowdown since install a couple of weeks ago

alphaomegait
Level 4

Since the install we have been having slowdowns in the network.  We moved the SVIs back to our 6500s and had some improvement.  After looking at the traffic for a couple of weeks, we see that we have packets showing up out of sequence.

My first thought was EIGRP's default load-balancing behavior, as if we had per-packet load balancing enabled, but it turns out that is not on.
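For anyone checking the same theory, a minimal sketch of how to rule it out on the N7Ks (the prefix below is hypothetical; substitute one of your own ECMP routes):

N7K# show ip load-sharing        --> shows the ECMP hash mode; the default hashes on
                                     source/destination addresses, i.e. per flow, not per packet
N7K# show ip route 10.1.1.0/24   --> hypothetical prefix; multiple next hops here means ECMP is in play

With flow-based hashing, packets of the same flow should stay on one path, so ECMP alone would not reorder them.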

As of today we feel we have bumped up against this bug and will have to upgrade the code:

Unicast Packets not Forwarded via Peer-Link with Peer-Gateway Enabled

Symptom:
A packet received on an external vPC link or L3 port, destined to the peer SVI MAC address, may be dropped on the local Nexus switch. The packet makes it to the local supervisor module as expected but is never forwarded over the peer-link to the peer switch.

Conditions:
Nexus 7000 in vPC with Peer-Gateway configured, running NX-OS 5.1(2). The issue exists with vPC and vPC+.

Workaround(s):
Disable the Peer-Gateway feature under the vPC domain.
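For reference, checking and applying that workaround would look roughly like this (a sketch; the domain id 10 is hypothetical, use your own):

N7K# show running-config vpc         --> confirm peer-gateway is present under the vpc domain
N7K# configure terminal
N7K(config)# vpc domain 10           --> 10 is hypothetical; use your own domain id
N7K(config-vpc-domain)# no peer-gateway

Worth remembering that peer-gateway is usually enabled to accommodate hosts that reply to the ingress router MAC rather than the gateway MAC, so disabling it can affect those hosts.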

Anyone else seen anything like this, or have advice about where to look next?

1 Reply

Zizhen Gao
Cisco Employee

5.1.2 is subject to CSCtl85080, "Netstack(vPC): MBUF failure due to non-standard size unicast ARP requests", and it appears that you have most of the conditions met to trigger this bug as well (a quick way to confirm the first three on the box is sketched after this list):

o vPC domain running 5.1.2
o peer-gateway enabled
o 3rd device connected via a one-leg vPC to SW01
o Unicast ARP requests (re-ARP) are targeting SW02. Since it is a one-leg vPC, the packets have to hit SW01, get punted to Netstack, and are finally tunneled to SW02.
o If the ARP packet size is not standard (not 64 bytes), SW01 will experience MBUF failures and drop the ARP replies, causing SW02 to drop all traffic targeting the related IP addresses.
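A quick way to confirm the first three conditions (a sketch; SW01 is the hostname from the list above):

N7K-SW01# show vpc                    --> the status block shows whether peer-gateway is enabled
N7K-SW01# show port-channel summary   --> a vPC port-channel with only one member port up is the "1 leg" case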

The bug is fixed in 5.1(3), and the workaround is to disable the peer-gateway feature, if possible. You can verify whether you are hitting this bug by issuing "show ip arp internal buffers":

ARP Packet MBUF status:

       Packet mbuf statistics:

mbufs obtained from page pool         30   --> should be ~20 normally
m_bytes                             8144   --> should be ~4000 normally

thanks

zz
