Re: Cisco Nexus9000 Adjacency manager bug

from88 · ‎03-12-2018

Hello,

We're running Cisco Nexus9000 C9372PX switches, system version 7.0(3)I4(4), in a vPC configuration. Lately we have noticed some data loss/flapping, especially right after configuring a new port-channel such as this one:

Interface port-channel1522
description description
switchport mode trunk
switchport trunk native vlan 1380
switchport trunk allowed vlan 1380,1370
no lacp suspend-individual
vpc 1522

After some time, traffic on older port-channels gets interrupted. Using the show ip route command, I found that at the time of the loss only one Nexus has a direct /32 route toward the affected host.

For example:

19:31:05.155 NET03B# show ip route 10.13.22.105 vrf lan
19:31:05.166 IP Route Table for VRF "lan"
19:31:05.167 '*' denotes best ucast next-hop
19:31:05.167 '**' denotes best mcast next-hop
19:31:05.167 '[x/y]' denotes [preference/metric]
19:31:05.167 '%<string>' in via output denotes VRF <string>
19:31:05.167 
19:31:05.167 10.13.22.105/32, ubest/mbest: 1/0, attached
19:31:05.168 *via 10.13.22.105, Vlan1322, [250/0], 00:26:59, am

and the other Nexus shows this:

18:56:14.030 NET03A# sh ip route 10.13.22.105  vrf lan
18:56:14.048 IP Route Table for VRF "lan"
18:56:14.048 '*' denotes best ucast next-hop
18:56:14.048 '**' denotes best mcast next-hop
18:56:14.048 '[x/y]' denotes [preference/metric]
18:56:14.048 '%<string>' in via output denotes VRF <string>
18:56:14.048 
18:56:14.048 10.13.22.0/23, ubest/mbest: 1/0, attached
18:56:14.049     *via 10.13.22.2, Vlan1322, [0/0], 00:05:58, direct

The only thing that helps is rebooting one or both boxes. After that, both have the /32 AM route to the host.
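The mismatch between the two peers can also be checked mechanically from captured output. The following is only a minimal sketch in Python, assuming the output is saved as text in the same layout as the captures above (optional timestamp prefix, a prefix line containing `ubest/mbest`, then a `*via` line). The names `parse_route` and `has_am_host_route` are hypothetical helpers, not part of any Cisco tooling.

```python
import re

# Route header, e.g. "10.13.22.105/32, ubest/mbest: 1/0, attached"
PREFIX_RE = re.compile(r"(\d+\.\d+\.\d+\.\d+/\d+), ubest/mbest")
# Best next-hop line, e.g. "*via 10.13.22.105, Vlan1322, [250/0], 00:26:59, am"
VIA_RE = re.compile(
    r"\*via\s+(\S+),\s+(\S+),\s+\[(\d+)/(\d+)\],\s+(\S+),\s+(\S+)"
)


def parse_route(output):
    """Return (prefix, source) for the first route found in the captured
    text, or None if no route entry is present."""
    prefix = None
    for line in output.splitlines():
        header = PREFIX_RE.search(line)
        if header:
            prefix = header.group(1)
            continue
        via = VIA_RE.search(line)
        if via and prefix:
            # Last field is the route source, e.g. "am" or "direct"
            return prefix, via.group(6)
    return None


def has_am_host_route(output):
    """True if the lookup resolved to a /32 route learned from 'am'."""
    parsed = parse_route(output)
    return (
        parsed is not None
        and parsed[0].endswith("/32")
        and parsed[1] == "am"
    )
```

Running `has_am_host_route` on the capture from each peer and comparing the two results flags the broken state described above: one peer returns True, the other falls back to the connected /23 and returns False.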

Has anyone experienced a similar issue? Is this a bug or something else? Thanks for any input.

clsulliv · ‎03-12-2018

Since this is a host route, the specific route is installed once we have a complete ARP entry for the host. Were you able to verify whether the second N9K had a complete ARP entry during these issues?

Configuring a port-channel will result in a TCN which will result in the N9K flushing the MAC table. Normally this should recover itself and if traffic hits the N9K without the specific ARP entry, we should glean punt the traffic and resolve ARP that way.

Can you verify the stability of VLAN 1322 with 'show spanning-tree vlan 1322 detail'?

from88 · ‎03-12-2018

Thank you.

Here is what the spanning-tree output shows:

VLAN1322 is executing the rstp compatible Spanning Tree protocol
Bridge Identifier has priority 32768, sysid 1322, address 0062.ec1a.1905
Configured hello time 2, max age 20, forward delay 15
We are the root of the spanning tree
Topology change flag not set, detected flag not set
Number of topology changes 4 last change occurred 4:36:56 ago
from port-channel151
Times: hold 1, topology change 35, notification 2
hello 2, max age 20, forward delay 15

The strange thing is that when I create the port-channel I only allow two VLANs on it, yet the problem appears on other VLANs. Does a TCN work per port rather than per VLAN? I'm still trying to understand whether this is related to TCN and STP. And yes, the other N9K has a complete ARP entry.

We're thinking of doing an upgrade, then creating a bunch of port-channels again to see whether it helps.

Any other ideas? Thanks.

clsulliv · ‎03-12-2018

The TCN is pointing to Po151 as the cause of this TCN but this may not be accurate if you have reloaded the devices.

Number of topology changes 4 last change occurred 4:36:56 ago from port-channel151
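To tell whether creating a port-channel actually generates a new TCN on a given VLAN, the counter in that line can be captured before and after the change and compared. A minimal sketch in Python, assuming the text comes from the spanning-tree detail output in the format pasted earlier in this thread (where "from port-channel151" may wrap onto the next line). `parse_tcn` is a hypothetical helper name.

```python
import re

# Matches the topology-change summary once wrapped lines are joined
TCN_RE = re.compile(
    r"Number of topology changes (\d+) last change occurred (\S+) ago"
    r"\s+from (\S+)"
)


def parse_tcn(detail_output):
    """Return the topology-change count, age and source port from the
    captured detail output, or None if the line is not present."""
    # Collapse all whitespace so a wrapped "from <port>" joins its line
    text = " ".join(detail_output.split())
    match = TCN_RE.search(text)
    if not match:
        return None
    return {
        "count": int(match.group(1)),
        "age": match.group(2),
        "port": match.group(3),
    }
```

If the count for VLAN 1322 increases between two captures taken around a port-channel change, the change triggered a TCN on that VLAN; if it stays the same, the flush is coming from somewhere else.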

Depending on how you deploy the port-channel, you can create the TCN across all VLANs, e.g. if you configure it as a trunk before pruning it to the specific required VLANs.

We can figure out what may be leading to this issue, but we would need specific logs from a few processes to do so properly. Most likely this would require us to look at the devices while they are in the broken state.

from88 · ‎03-12-2018

Thanks.

Po151 is the peer-link:

interface port-channel151
description VPC Peer-Link
switchport mode trunk
switchport trunk allowed vlan 1,1301-1399,1801-1899
spanning-tree port type network
vpc peer-link

and that timer matches when the devices were reloaded.

Which commands would help to troubleshoot further?

from88 · ‎03-13-2018

Also, it seems the cause could be this bug:

As I now recall, we only see this behaviour when configuring a bunch of port-channels at the same time, e.g. 8 port-channels. When I configure them manually, there are no disruptions.