03-12-2018 11:10 AM - edited 03-08-2019 02:13 PM
Hello,
We're running Cisco Nexus 9000 C9372PX switches, system version 7.0(3)I4(4), in a vPC configuration. Lately we have noticed some data loss/flapping, especially after a port-channel is configured, for example:
interface port-channel1522
description description
switchport mode trunk
switchport trunk native vlan 1380
switchport trunk allowed vlan 1380,1370
no lacp suspend-individual
vpc 1522
After some time, traffic on older port-channels gets interrupted. From the show ip route output I can see that, at the time of the loss, only one Nexus has a direct route towards the affected host.
For example:
19:31:05.155 NET03B# show ip route 10.13.22.105 vrf lan
19:31:05.166 IP Route Table for VRF "lan"
19:31:05.167 '*' denotes best ucast next-hop
19:31:05.167 '**' denotes best mcast next-hop
19:31:05.167 '[x/y]' denotes [preference/metric]
19:31:05.167 '%<string>' in via output denotes VRF <string>
19:31:05.167
19:31:05.167 10.13.22.105/32, ubest/mbest: 1/0, attached
19:31:05.168 *via 10.13.22.105, Vlan1322, [250/0], 00:26:59, am
and the other Nexus shows this:
18:56:14.030 NET03A# sh ip route 10.13.22.105 vrf lan
18:56:14.048 IP Route Table for VRF "lan"
18:56:14.048 '*' denotes best ucast next-hop
18:56:14.048 '**' denotes best mcast next-hop
18:56:14.048 '[x/y]' denotes [preference/metric]
18:56:14.048 '%<string>' in via output denotes VRF <string>
18:56:14.048
18:56:14.048 10.13.22.0/23, ubest/mbest: 1/0, attached
18:56:14.049 *via 10.13.22.2, Vlan1322, [0/0], 00:05:58, direct
The only thing that helps is rebooting one or both boxes. After that, both have the /32 AM route.
Has anyone experienced a similar issue? Is this a bug or something else? Thanks for any input.
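The comparison of the two outputs above can be sketched in a few lines of Python. This is only an illustration, not a Cisco tool: the function names summarize_route and has_host_route are made up for this example, the regular expressions are written against the output format pasted in this thread, and "am" is read as the adjacency manager source shown in that output.

```python
import ipaddress
import re

# Prefix line, e.g. "10.13.22.105/32, ubest/mbest: 1/0, attached"
PREFIX_RE = re.compile(r"(\d+\.\d+\.\d+\.\d+/\d+), ubest/mbest")
# Best next-hop line, e.g. "*via 10.13.22.105, Vlan1322, [250/0], 00:26:59, am"
VIA_RE = re.compile(r"\*via (\S+), (\S+), \[(\d+)/(\d+)\], (\S+), (\S+)")


def summarize_route(output):
    """Return (network, source) parsed from pasted 'show ip route' text.

    Hypothetical helper: returns None when the expected lines are missing.
    """
    prefix = PREFIX_RE.search(output)
    via = VIA_RE.search(output)
    if not prefix or not via:
        return None
    network = ipaddress.ip_network(prefix.group(1), strict=False)
    return network, via.group(6)


def has_host_route(output):
    """True when the matched route is a /32 learned from the adjacency manager."""
    summary = summarize_route(output)
    if summary is None:
        return False
    network, source = summary
    return network.prefixlen == 32 and source == "am"


# Route lines copied from the two outputs in this thread.
NET03B = (
    "10.13.22.105/32, ubest/mbest: 1/0, attached\n"
    "*via 10.13.22.105, Vlan1322, [250/0], 00:26:59, am\n"
)
NET03A = (
    "10.13.22.0/23, ubest/mbest: 1/0, attached\n"
    "*via 10.13.22.2, Vlan1322, [0/0], 00:05:58, direct\n"
)

if __name__ == "__main__":
    for name, text in (("NET03B", NET03B), ("NET03A", NET03A)):
        print(name, "has /32 host route:", has_host_route(text))
```

Running the same check against both peers while the loss is happening makes it easy to see which box is missing the /32 entry.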
03-12-2018 01:40 PM
Since this is a host route, the specific route is installed once we have a complete ARP entry for the host. Were you able to verify whether the second N9K had a complete ARP entry during these issues?
Configuring a port-channel will result in a TCN, which causes the N9K to flush the MAC table. Normally this should recover by itself: if traffic hits the N9K without the specific ARP entry, we should glean punt the traffic and resolve ARP that way.
Can you verify the stability of VLAN 1322 with 'show spanning-tree vlan 1322 detail'?
03-12-2018 02:45 PM
Thank you,
here is what the spanning-tree output shows:
VLAN1322 is executing the rstp compatible Spanning Tree protocol
Bridge Identifier has priority 32768, sysid 1322, address 0062.ec1a.1905
Configured hello time 2, max age 20, forward delay 15
We are the root of the spanning tree
Topology change flag not set, detected flag not set
Number of topology changes 4 last change occurred 4:36:56 ago
from port-channel151
Times: hold 1, topology change 35, notification 2
hello 2, max age 20, forward delay 15
The strange thing is that when creating the port-channel I allow only two VLANs, yet the problem appears on other ones. Does a TCN work per port rather than per VLAN? I'm still trying to understand whether this is related to TCN and STP. And yes, the other N9K has a complete ARP entry.
We're thinking of doing an upgrade, then creating a bunch of port-channels again to see if it helps.
Any other ideas? Thanks.
03-12-2018 02:51 PM
The output points to Po151 as the source of the last TCN, but this may not be accurate if you have reloaded the devices.
Number of topology changes 4 last change occurred 4:36:56 ago from port-channel151
Depending on how you deploy the port-channel, you can create the TCN across all VLANs, e.g. if you configure it as a trunk before pruning it to the specific required VLANs.
We can figure out what may be leading to this issue, but we would need specific logs from a few processes to do so properly. Most likely this would require looking at the devices while they are in the broken state.
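To make the ordering point concrete, here is a rough Python sketch that expands VLAN lists and checks whether VLAN 1322 is carried by a trunk before and after pruning. The helper names are invented for this example, and the unpruned range "1-4094" is an assumption (the exact default allowed range depends on platform and release); the other lists are taken from the configs in this thread.

```python
def expand_vlans(spec):
    """Expand a VLAN list such as '1,1301-1399' into a set of integers."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans


def trunk_carries(vlan, allowed_spec):
    """True when the given VLAN is in the trunk's allowed list."""
    return vlan in expand_vlans(allowed_spec)


# Assumption: an unpruned trunk allows every VLAN; the real default range varies.
ASSUMED_UNPRUNED = "1-4094"
# Allowed lists as posted in this thread.
PO1522_PRUNED = "1380,1370"
PEER_LINK = "1,1301-1399,1801-1899"
AFFECTED_VLAN = 1322

if __name__ == "__main__":
    print("unpruned trunk carries 1322:", trunk_carries(AFFECTED_VLAN, ASSUMED_UNPRUNED))
    print("pruned Po1522 carries 1322:", trunk_carries(AFFECTED_VLAN, PO1522_PRUNED))
    print("peer-link carries 1322:", trunk_carries(AFFECTED_VLAN, PEER_LINK))
```

If the port comes up as a trunk even briefly before the allowed list is applied, VLAN 1322 would be among the VLANs it carries, which is one way a change on a "two VLAN" port-channel could still touch other VLANs.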
03-12-2018 02:59 PM
Thanks.
Po151 is the peer-link:
interface port-channel151
description VPC Peer-Link
switchport mode trunk
switchport trunk allowed vlan 1,1301-1399,1801-1899
spanning-tree port type network
vpc peer-link
and that timer is from when the devices were reloaded.
Which commands would help to troubleshoot further?
03-13-2018 12:51 AM - edited 03-13-2018 01:28 AM
Also, it seems the cause could be this bug:
As I now recall, this behaviour only occurs when configuring a bunch of port-channels at the same time, e.g. 8 port-channels. When I do it manually, there are no disruptions.