
Nexus 9500 Cores and Nexus 9300 TOR & AGG - Random traffic drop

mohdtaamneh
Level 1

Hello Expert,

I have a new implementation in which I am installing Nexus 9504 switches as the core layer of the data center, Nexus 93128TX switches as TOR for servers, and Nexus 9396PX switches as aggregation for devices other than servers (routers, edge/floor switches, etc.). All of them are running 7.0(3)I1(3). The setup runs in standalone NX-OS mode (not ACI).

The topology is as shown in diagram 1.

  •  There are more devices connected to Agg A and B (more edge/floor switches, a second WAN router).
  •  The leaf switches (TORs and Aggs) are L2 devices with a management IP only.
  •  All connections between the spines and leafs are L2 trunks, as is the vPC link between the two cores.
  •  SVIs are configured on the core switches.
  •  Both cores are running HSRP.
  •  EIGRP runs between the Nexus core switches and the WAN routers. The neighborship is built over an isolated VLAN (VLAN 251), which is not allowed internally.

On random occasions, connectivity between the core switches and other devices in the network (e.g. the Agg switches) goes down. Pinging the core from one of the directly connected Agg switches does NOT work (pinging the directly connected SVIs, both physical and virtual IPs). EIGRP also goes down, and there is no EIGRP adjacency between the core switches and the WAN router. In other words, connectivity goes down both inside the LAN (servers and internal users in different VLANs) and outside to the branches via the WAN router.

This outage lasts for 30-40 minutes; then, all of a sudden, the problem goes away and the setup forwards traffic fine. After a few hours or minutes, the same problem repeats.

To isolate the issue, we tried to simplify the setup by shutting down core B, so that each leaf is connected only to core A. The setup then looks as shown in diagram 2:

  •  HSRP plays no role, since only one core is active.
  •  No vPC.
  •  As mentioned before, there are more devices connected to Agg A and B (more floor switches, a second WAN router).

I have the same problem with this setup as well (diagram 2).

I found the following bug, but it affects the N7K:

CSCus91417: Nexus 7000 fails to transmit RSTP BPDUs consistently

Symptom:
- STP Instability in layer 2 domain with N7K (Rapid PVST).
- This issue is seen in large scale setups with approximately 6000 RSTP sessions running. The issue is independent of the number of VDCs.

Conditions:
Packet drops were seen in the inband soft VOQ Transmit side on the SUP. This issue is seen on N7000 SUP2 supervisors. Packet drops were NOT seen on N7000 SUP2E and N7700 SUP2 Supervisors.

Workaround:
If customer is running RSTP at the scale level they can see this issue. Peer-switch has been shown to be an effective workaround in the field generally. However, if customer has a lot of orphan ports, single connected, peer-switch does not help in this situation. Additionally, as scale increases toward the RSTP limit, peer-switch becomes slightly less effective since we are dropping BPDUs from both sides more often.

Further Problem Description:
Inband packet drops on the transmit side of the Supervisor were seen in large Scale setups. The packet drops were noticed on "show hardware internal sup-fc0 voq-stats".

Could the above bug be causing the outage? Is anyone aware of whether it affects the N9000? Or could there be another reason, such as a misconfiguration?

Any help/feedback is highly appreciated.

Best Regards,

Mohammad Taamneh

 

1 Reply

Rajeshkumar Gatti
Cisco Employee

You are not hitting CSCus91417. This bug is very specific to Nexus 7k.

Your symptoms point to a possible control-plane issue, where the CPU is probably inundated with some traffic that is starving other protocols. Check CoPP (show policy-map interface control-plane) on the 9K and see if you notice anything weird or any drops. This may have to be monitored at the time of the event.
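
Since the counters need to be watched while the event is happening, it can help to save the CoPP output a few times and compare the drop counters offline. Below is a minimal Python sketch of that idea. The sample text is hypothetical and only shaped like typical CoPP counter output; the exact layout on your release may differ, so treat the two regular expressions and the helper names (copp_drops, drop_deltas) as assumptions to adjust.

```python
import re

# Hypothetical sample, only shaped like CoPP counter output. The real
# layout varies by platform and release; adapt the patterns below.
SAMPLE = """\
    class-map copp-system-p-class-critical (match-any)
      module 1 :
        transmitted 123456 packets;
        dropped 0 packets;
    class-map copp-system-p-class-normal (match-any)
      module 1 :
        transmitted 98765 packets;
        dropped 4321 packets;
"""

# Assumed line shapes: "class-map <name> ..." and "dropped <n> packets|bytes"
CLASS_RE = re.compile(r"^\s*class-map\s+(\S+)")
DROP_RE = re.compile(r"^\s*dropped\s+(\d+)\s+(?:packets|bytes)")


def copp_drops(text):
    """Return {class_name: dropped_total} for classes with nonzero drops."""
    drops = {}
    current = None
    for line in text.splitlines():
        m = CLASS_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = DROP_RE.match(line)
        if m and current is not None:
            count = int(m.group(1))
            if count:
                # Sum across modules if a class reports more than one.
                drops[current] = drops.get(current, 0) + count
    return drops


def drop_deltas(before, after):
    """Return classes whose drop counter grew between two snapshots."""
    return {
        name: count - before.get(name, 0)
        for name, count in after.items()
        if count > before.get(name, 0)
    }


if __name__ == "__main__":
    print(copp_drops(SAMPLE))
```

Taking one snapshot while the network is healthy and another during the outage, then passing both results to drop_deltas, shows which CoPP classes started dropping during the event. That narrows down which kind of traffic is hitting the CPU.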

Also see if you can SPAN the CPU of the 9K. This may give clues about CPU-bound traffic.

-Raj
