VPC peer-switch enabled; still getting STP topology change when one of the peers fails

tim-armstrong · ‎09-16-2019

I have two Nexus 5672UP switches set up as VPC peers, both running NX-OS 7.3(5)N1(1). I have enabled the peer-switch command and configured equal spanning-tree priorities on both peers for all VLANs that support the VPC members. I have also confirmed that all downstream VPC member switches do in fact show the VPC system MAC address as the root bridge ID, not the individual physical MAC of either peer switch.
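
A minimal sketch of the configuration described above, applied identically on both peers. The domain ID 10, the VLAN range 10-20, and the priority 4096 are illustrative assumptions, not values taken from this setup.

```
! On BOTH vPC peers (domain ID, VLAN range, and priority are assumptions)
vpc domain 10
  peer-switch
! Same priority on both peers so they present one logical root
spanning-tree vlan 10-20 priority 4096

! Verification on a peer: shows the vPC system MAC
show vpc role
! Verification on a downstream member switch: the Root ID address
! should match the vPC system MAC (VLAN 10 is an assumption)
show spanning-tree vlan 10
```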

So based on the above, it appears that everything is configured and operating correctly, except for one important fact: when I simulate a failure of one of the two VPC peer switches (operational primary or operational secondary), I get a topology change notification (TCN), which causes a two-second disruption to traffic flowing down to the VPC member switches instead of the sub-second disruption I was expecting.

My understanding was that with the peer-switch command enabled in the VPC domain configuration, VPC member switches would only see a single root bridge, and that I would therefore not experience this traffic disruption. I do see a single root bridge: show spanning-tree on a member switch correctly shows the VPC system MAC as the root bridge ID.

Does anyone have any ideas as to why I might still be seeing 2 seconds of traffic disruption based on the above? Anything I can check that might lead to a solution? Why is this still happening? Is there something I am missing?

Thanks,

Tim

Reza Sharifi · ‎09-16-2019

Tim,

Not sure if you can reach sub-second convergence with the peer-switch command. Looking at different documents, they all reference "improve convergence", but no document mentions any sub-second convergence time. Here is one document:

vPC Peer Switch

The vPC peer switch feature addresses performance concerns around STP convergence. This feature allows a pair of Cisco Nexus devices to appear as a single STP root in the Layer 2 topology. This feature eliminates the need to pin the STP root to the vPC primary switch and improves vPC convergence if the vPC primary switch fails.

To avoid loops, the vPC peer link is excluded from the STP computation. In vPC peer switch mode, STP BPDUs are sent from both vPC peer devices to avoid issues related to STP BPDU timeout on the downstream switches, which can cause traffic disruption.

https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus5000/sw/interfaces/521_N11/b_5k_Interfaces_Config_Guide_Release_521N11/b_5k_Interfaces_Config_Guide_Release_521N11_chapter_0101.html

So, 2 seconds may be as good as it gets.

HTH

tim-armstrong · ‎09-19-2019

Thanks for your response, Reza. I have additional data from testing that I wanted to provide to see if it might spark some ideas as to what is going on and why traffic is being disrupted for about 1.4 seconds. As described below, the packet loss is NOT occurring when the TCN occurs, but when the VPC member ports go active on the operational secondary, about 45 seconds after the TCN, which itself occurs right when the VPC peer link recovers (active).

Here are the details of the testing.

I simulate peer link failure by shutting down the peer-link port channel.

Immediately, I get very minor packet loss for 20 ms. This is not an issue for my environment. I do not get a TCN at this point.

I wait a couple of minutes to ensure everything is stable and then re-enable the peer link. About 13 seconds later, the peer link is up and I immediately receive a TCN, but NO PACKET LOSS. So the packet loss is not related to the TCN.

About 45 seconds after the peer link comes up, the VPC member ports on the operational secondary peer activate. It is at this moment that I consistently get 1.4 seconds of traffic disruption. There is NOT an additional TCN at this point.
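
The test sequence above could be reproduced on one peer roughly as follows. Port-channel 1 as the peer link and VLAN 10 are assumptions for illustration; the show commands are only suggestions for observing vPC state and the topology change counters between steps.

```
! Assumption: port-channel 1 is the vPC peer link
configure terminal
interface port-channel 1
  shutdown
! ... wait a couple of minutes, then restore the peer link
  no shutdown
end

! Check peer-link status and vPC member port status
show vpc brief
! Check the topology change count and time of last change
! (VLAN 10 is an assumption)
show spanning-tree vlan 10 detail
```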

Does this make sense? Any ideas on what is occurring that is causing the packet loss when the VPC member ports go active after recovery? And especially any ideas on how to reduce or eliminate this?

govi · ‎09-20-2022

Hi Tim,

I am experiencing the same issue. Did you manage to reduce the packet loss when the vPCs come back up?

Thanks

MHM Cisco World · ‎09-21-2022

Did you solve this issue?