Spanning Tree Loop Troubleshooting and Safeguards

James D Hensley · ‎12-03-2010

Problem Description:

A bridging loop (spanning tree loop) caused a network outage. To break the loop, you pulled one of the redundant links or shut down one of the switches participating in the loop, but now you are unsure how to find the source of the loop and how to prevent it from occurring again.

Action Plan:
Prior to bringing the redundant link/switch back online, implement Layer 2 safeguards designed to protect against STP loops and mitigate the impact if one does occur.

1) Implement Spanning Tree PortFast and BPDUGuard on all edge ports

2) Verify that the proper switch is currently the STP root for all VLANs. Consider enabling root guard on the root/core switch ports that connect to the distribution layer switches to ensure your root bridge does not change unexpectedly (such as when new switches are connected to the network). It can also be enabled at the access layer, rather than on the root bridge(s), if you maintain control of the distribution layer and are not concerned about anyone making changes or adding switches there.

Below is an excellent doc that details root guard. See the section titled "What Is the Difference Between STP BPDU Guard and STP Root Guard?" for clarification on the difference. You do not want root guard on the port-channel between core switches running HSRP. It should be enabled ONLY on the uplinks to other switches (or access ports) that you do NOT want to become spanning tree root.

http://www.cisco.com/en/US/tech/tk389/tk621/technologies_tech_note09186a00800ae96b.shtml

3) Enable loop guard on all distribution/access layer switches*
4) Enable BPDU guard on all distribution/access layer switches*
5) Enable UDLD on all fiber uplinks*
- Unidirectional links can cause spanning tree loops. UDLD prevents this by shutting down a unidirectional link. Note that in some NX-OS vPC environments, UDLD in aggressive mode is NOT recommended. See http://tools.ietf.org/html/rfc5171#section-5.4 for the RFC 5171 description of the difference between "Normal" and "Aggressive" modes.

6) Prune unnecessary VLANs off your trunks
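
As a rough illustration of steps 1 through 6, the IOS configuration sketch below shows how these safeguards are commonly applied. The interface names and VLAN IDs are placeholders and are not taken from any real network. Exact syntax varies by platform and software release (for example, newer releases use "spanning-tree portfast edge"), so check your platform's configuration guide before applying anything. Note that root guard and loop guard are mutually exclusive on the same port, which is why they appear on different switches here.

```
! --- Root/core switch (assumed) ---
! Step 2: root guard on a port facing a distribution switch
interface TenGigabitEthernet1/1
 spanning-tree guard root
!
! --- Distribution/access switch (assumed) ---
! Step 3: loop guard enabled globally
spanning-tree loopguard default
! Step 4: BPDU guard on every PortFast-enabled port
spanning-tree portfast bpduguard default
! Step 5: UDLD on all fiber ports (normal mode);
! "udld aggressive" would enable aggressive mode instead
udld enable
!
! Step 1: PortFast and BPDU guard on an individual edge port
! (Gi1/0/10 is a placeholder for a host-facing port)
interface GigabitEthernet1/0/10
 switchport mode access
 spanning-tree portfast
 spanning-tree bpduguard enable
!
! Step 6: allow only the VLANs that are needed on an uplink trunk
! (Gi1/0/49 and VLANs 10,20,30 are placeholders)
interface GigabitEthernet1/0/49
 switchport trunk allowed vlan 10,20,30
```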

After implementing root guard, loop guard, UDLD aggressive, and BPDU guard, bring the link/switch back up and see if the loop reforms.

* Prior to implementing any of these features, it is recommended that you familiarize yourself with how each feature works.

IF THE LOOP REFORMS:
1) Have a TAC engineer online to troubleshoot

2) Enable mac-address move notification (if applicable - this is disabled by default on the 6500/7600 platform and enabled by default on most others - including the 3750/3560/2960 platforms)

 ITLABSW(config)#mac-address-table notification mac-move

Check the switch log for MAC addresses flapping between interfaces. These are the ports that are participating in the loop. Trace the MAC back to its source. Look for:
- A link flapping on an upstream switch, causing spanning tree TCNs (topology change notifications) and spanning tree reconvergence. Use this in conjunction with step 3 below.
- A unidirectional link on an upstream switch causing the loop.
- A hub or switch connected to a PortFast-enabled access port where this MAC is learned. Shut this port down and see if this breaks the loop.

3) Check for TCNs
While the loop is occurring, if you see excessive TCNs, you need to trace them to their source. To do this, start from the core and run the following command.

 
ITLABSW#show spanning-tree detail | inc ieee|occurr|from|is exec

The output from this command shows the port on which the last TCN was received and the time at which it was received.
Look for the port that received a TCN in the last few seconds.

 ITLABSW#sh spanning-tree detail | i ieee|occur|from|is exec
   VLAN0001 is executing the rstp compatible Spanning Tree protocol
     Number of topology changes 187927 last change occurred 00:01 ago <-time rec'd
         from Port-Channel12 <--interface that received the TCN

Follow this port from switch to switch until the port that receives the TCN is an access port, or until you reach a switch that is generating TCNs but not receiving them. If you find an access port receiving TCNs, shut it down and see if that stabilizes the network.

If you find a switch generating TCNs, you will want to look for two uplink ports or trunks in a spanning tree forwarding state for the same VLAN. If you find two ports in a forwarding state, shut one port down and see if this breaks the loop. Check for a unidirectional link or excessive link flaps.

4) Look for an interface with a very high input rate and low output rate

 ITLABSW#sh int | i is up|rate

When a bridging loop occurs you will usually see multiple interfaces with a high output rate and low input rate and a single interface with a high input rate and low output rate.
- Trace the port with the high input rate down until you come to an access port and shut it down
- If the port with the high input rate leads you into a loop, check the spanning tree and EtherChannel states on each switch until you find a port that is incorrectly in a forwarding state or incorrectly bundled.

5) Look for packets hitting the CPU. Sniff the CPU and see if the packets share a common source (this is only an option on certain platforms; you'll need to contact TAC to assist with setting it up and analyzing the data). Track down the source. If they are STP or CDP packets (or packets destined to the 0100.0CCC.CCCX reserved multicast addresses), trace where the source MAC is learned and see if it leads you in a loop.

If you see two ports in a forwarding state for the same VLAN on the same switch, we need to look for the following:
a) does this switch think he is the root for this VLAN (or vlans)?
b) should he be?
c) is he receiving BPDUs from his neighbor on the ports in a forwarding state? (sniff both forwarding ports to look for BPDUs)
d) look for a unidirectional link on one of the ports in a forwarding state
e) shut one of the ports in a forwarding state and see if the loop stops
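
The checks in a) through e) map roughly onto the IOS commands below. This is only a sketch: VLAN 10 and GigabitEthernet1/0/49 are placeholders, and command availability and output differ between platforms. The BPDU counters are offered as a lighter-weight alternative to sniffing the ports, not a replacement for it.

```
! a), b) Which bridge is root for the VLAN, and is it this switch?
show spanning-tree root
show spanning-tree vlan 10
!
! c) Per-port BPDU counters (look for the "BPDU: sent ..., received ..." lines)
show spanning-tree vlan 10 detail
!
! d) UDLD state on a suspect uplink
show udld GigabitEthernet1/0/49
! Ports currently held in an inconsistent state by root guard or loop guard
show spanning-tree inconsistentports
!
! e) Shut one of the forwarding ports and watch whether the loop stops
configure terminal
 interface GigabitEthernet1/0/49
  shutdown
```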

Good doc for troubleshooting bridging loops.

http://www.cisco.com/en/US/tech/tk389/tk621/technologies_tech_note09186a00800951ac.shtml#brid_loop

acui · ‎01-31-2011

To avoid excessive TCNs, how about recommending "spanning-tree portfast" on access ports that connect to PCs?

sdheer · ‎01-31-2011

Hi,

Portfast is always recommended on access ports connected to end hosts, so your suggestion is absolutely correct.

Regards,

Swati

martlee2 · ‎09-22-2016

Which command can I use to check for this:

"if you see two ports in a forwarding state for the same VLAN on the same switch"

James D Hensley · ‎12-19-2016

"show span vlan <#> detail" will give you all the ports forwarding for the vlan. The key here is in knowing the spanning tree topology well enough to identify a port that should be blocking but is not. There should be one root port and multiple designated ports in a forwarding state.

In a redundant topology there should be one blocking port to prevent a loop. To identify that port you must understand STP path selection. See:

STP: http://www.cisco.com/c/en/us/support/docs/lan-switching/spanning-tree-protocol/10556-16.html

RSTP: http://www.cisco.com/c/en/us/support/docs/lan-switching/spanning-tree-protocol/24062-146.html

I have seen very RARE cases where a blocked port is actually forwarding traffic. Once you identify which port should be blocking and confirm it is blocking via the "show span vlan <#> detail" command, you can check the interface stats with "show interface <#>" and look at the input / output rates. Note that if the port is a trunk and not all VLANs are blocked on it, this command will not be very useful.
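
As a rough illustration of that check (VLAN 10 and GigabitEthernet1/0/49 are placeholders, and output formats vary by platform), the sequence might look like this:

```
! Role and state of every port in the VLAN
! (expect one root port, designated ports forwarding, and the redundant port blocking)
show spanning-tree vlan 10
!
! Summary of blocked ports per VLAN
show spanning-tree blockedports
!
! Traffic rates on the port that should be blocking
show interfaces GigabitEthernet1/0/49 | include rate
```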

David Martinez Garcia · ‎01-14-2019

Hi all.

What happens if the output for:

 ITLABSW#sh spanning-tree detail | i ieee|occur|from|is exec
   VLAN0001 is executing the rstp compatible Spanning Tree protocol
     Number of topology changes 187927 last change occurred 00:01 ago <-time rec'd
         from Port-Channel12 <--interface that received the TCN

shows no interface, like in the following example:

VLAN0003 is executing the ieee compatible Spanning Tree protocol
Bridge Identifier has priority 32768, sysid 3, address 3820.566a.6500
Configured hello time 2, max age 20, forward delay 15
Current root has priority 8192, address 64f6.9d7e.77fd
Root port is 52 (GigabitEthernet0/4), cost of root path is 8
Topology change flag not set, detected flag not set
Number of topology changes 25311 last change occurred 1d11h ago
Times: hold 1, topology change 35, notification 2
hello 2, max age 20, forward delay 15
Timers: hello 0, topology change 0, notification 0, aging 300

Thanks!

Wasim Chandel. · ‎08-04-2020

Generally, when a blocked port transitions into forwarding (for instance, due to a unidirectional link), a TCN will be received from that particular port on the switch instead of from the root port.

You can use the "show spanning-tree vlan <id> detail" command for details.