
Nexus 7018 Sup1 + M2 24 Port 10G + Fabric 2 Cards - ARP Issue

mlouis
Level 1

 

I have been working for a couple of weeks on an issue with a pair of Nexus 7018s that have intermittent reachability problems to a 5K/2K/UCS pod downstream of a couple of M2 series cards. This is the setup:

 

2 x Nexus 7018

Dual Sup1

NXOS 6.2(2)

1 x M1-32 Port non-XL (serves peer link between 7k) per chassis

1 x M2-24 Port Card per chassis

The Nexus 5Ks are single-homed: 5K1 has a single port-channel to N7K1 and 5K2 has a single port-channel to N7K2 (no vPC between the 5Ks and the 7Ks)

 

So here is the issue. The M2 cards on both chassis face the UCS/5K/2K pod and services. VLAN A is configured on both Nexus 7Ks and runs HSRP between the 7K pair.

In working with TAC we found that ARP requests leave VLAN A and go down into the pod environment, and the servers answer with a unicast ARP reply whose DMAC is the primary VLAN A HSRP peer on N7K1. However, the supervisor never sees the ARP reply, and the line card floods it to all active ports in the VLAN, including the peer link. The packets egress the M1 card on N7K1 and ingress on the M1 card on N7K2, where they are sent to the active supervisor. That supervisor processes the packet, the ARP tables on both switches are updated, and pings work.
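For anyone who wants to confirm the same thing from a packet capture, here is a minimal Python sketch that decodes an Ethernet + ARP frame and reports whether it is a request or a reply and whether the DMAC is unicast. It assumes an untagged Ethernet II frame carrying IPv4 ARP; the function name `parse_arp_frame` and every MAC and IP address in the example are hypothetical placeholders, not values from our environment.

```python
import struct

ARP_ETHERTYPE = 0x0806
ARP_OPERATIONS = {1: "request", 2: "reply"}


def cisco_mac(raw: bytes) -> str:
    """Format 6 raw bytes in Cisco dotted notation, e.g. 0000.0000.0000."""
    digits = raw.hex()
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def dotted_ip(raw: bytes) -> str:
    """Format 4 raw bytes as a dotted-quad IPv4 address."""
    return ".".join(str(octet) for octet in raw)


def parse_arp_frame(frame: bytes) -> dict:
    """Decode an untagged Ethernet II frame that carries IPv4 ARP."""
    if len(frame) < 42:
        raise ValueError("frame too short for Ethernet + ARP")
    dmac, smac, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != ARP_ETHERTYPE:
        raise ValueError("not an ARP frame (802.1Q tags are not handled)")
    # ARP body: htype, ptype, hlen, plen, oper, SHA, SPA, THA, TPA (28 bytes)
    _, _, _, _, oper, sha, spa, tha, tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42]
    )
    return {
        "dmac": cisco_mac(dmac),
        "smac": cisco_mac(smac),
        # The low bit of the first DMAC octet is set for broadcast/multicast.
        "unicast": (dmac[0] & 0x01) == 0,
        "operation": ARP_OPERATIONS.get(oper, f"unknown({oper})"),
        "sender_mac": cisco_mac(sha),
        "sender_ip": dotted_ip(spa),
        "target_mac": cisco_mac(tha),
        "target_ip": dotted_ip(tpa),
    }


# Hypothetical unicast ARP reply from a server (10.0.10.50) to a gateway
# (10.0.10.1). All addresses are made up for illustration.
EXAMPLE_FRAME = bytes.fromhex(
    "0026980000aa"  # DMAC: gateway MAC (placeholder)
    "0025b5000001"  # SMAC: server MAC (placeholder)
    "0806"          # EtherType: ARP
    "0001" "0800" "06" "04"
    "0002"          # operation 2 = reply
    "0025b5000001" "0a000a32"  # sender MAC / sender IP
    "0026980000aa" "0a000a01"  # target MAC / target IP
)

if __name__ == "__main__":
    for field, value in parse_arp_frame(EXAMPLE_FRAME).items():
        print(f"{field}: {value}")
```

A frame that shows `operation: reply` with `unicast: True` on the pod side, but a DMAC of 0000.0000.0000 on the peer link, would match the behavior described above.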

 

If you clear ARP on the active N7K1 (STP is blocking the N7K2-to-pod path, BTW, because of the single non-vPC port-channels on the M2 cards), it continues to rely on the flooding mechanism via the M1 card peer link to N7K2 for ARP resolution. It's resulting in lots of intermittent connectivity, ARP timeouts, etc.

 

We have moved the connections from the M2 cards to the M1 cards and the issue does NOT follow. If the ARP response ingresses on an M1 port it gets punted to the supervisor; if it ingresses on the M2 card it never makes it. The hardware on the line card says it is sending the packet to the supervisor, but we never see a response. We just see the L3 engine rewriting the DMAC to all zeros (0000.0000.0000, unknown) and flooding it out the peer link to the other M1 card, looking for a response.

 

Has anyone ever seen anything like this? We even tried a different set of M2 cards and the problem followed to the new cards. We also flipped the active/standby supervisors (Sup1), but there was no change in behavior.

1 Reply

Peter Koltl
Level 7

I would rate this question at 5 stars. (-: