A bridging loop or spanning tree loop caused a network outage. To break the loop, you pulled one of the redundant links or shut down one of the switches participating in the loop, but now you are unsure how to find the source of the loop and prevent it from occurring again.
Action Plan: Prior to bringing the redundant link/switch back online, implement Layer 2 safeguards designed to protect against STP loops and mitigate the impact if one does occur.
2) Verify that the proper switch is currently the STP root for all VLANs. Consider enabling root guard on the root/core switch ports facing the distribution layer switches to ensure your root bridge does not change unexpectedly (such as when new switches are connected to the network). Root guard can also be enabled at the access layer, rather than on the root bridge(s), if you maintain control of the distribution layer and are not concerned with anyone making changes or adding switches there.
Below is an excellent doc that details root guard. See the section titled "What Is the Difference Between STP BPDU Guard and STP Root Guard?" for clarification on the difference. You do not want root guard on the port-channel between core switches running HSRP. It should be enabled ONLY on the uplinks to other switches (or access ports) that you do NOT want to become spanning tree root.
3) Enable loop guard on all distribution/access layer switches*
4) Enable BPDU guard on all distribution/access layer switches*
5) Enable UDLD on all fiber uplinks* - Unidirectional links can cause spanning tree loops. UDLD prevents this by shutting down a unidirectional link. Note that in some NX-OS vPC environments UDLD in aggressive mode is NOT recommended. See http://tools.ietf.org/html/rfc5171#section-5.4 for the RFC 5171 description of the difference between "Normal" and "Aggressive" modes.
6) Prune unnecessary VLANs off your trunks
After implementing root guard, loop guard, UDLD aggressive, and BPDU guard, bring the link/switch back up and see if the loop reforms.
* Prior to implementing any of these features, it is recommended that you familiarize yourself with how each feature works:
Check the switch log for MAC addresses flapping between interfaces. The interfaces named in those messages are the ports participating in the loop. Trace the MAC back to its source. Look for:
- A link flapping on an upstream switch, causing spanning tree TCNs (topology change notifications) and spanning tree reconvergence. This should be used in conjunction with step 3 below.
- A unidirectional link on an upstream switch causing the loop.
- A hub or switch connected to a portfast-enabled access port where this MAC is learned. Shut this port down and see if this breaks the loop.
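When the log is full of flap messages, it can help to tally which ports show up most often. The sketch below is only an illustration: it assumes the Catalyst IOS style message `%SW_MATM-4-MACFLAP_NOTIF: Host ... in vlan ... is flapping between port ... and port ...` (other platforms word this differently), and the function name, MAC addresses, and ports in the sample are made up.

```python
import re
from collections import Counter

# Assumed message format (Catalyst IOS style). Adjust the pattern for
# platforms that log MAC moves with different wording.
FLAP_RE = re.compile(
    r"Host\s+(?P<mac>[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}\.[0-9a-fA-F]{4})\s+"
    r"in vlan\s+(?P<vlan>\d+)\s+is flapping between port\s+"
    r"(?P<port_a>\S+)\s+and port\s+(?P<port_b>\S+)"
)


def count_flapping_ports(log_text):
    """Return a Counter of how often each port appears in MAC flap messages."""
    ports = Counter()
    for match in FLAP_RE.finditer(log_text):
        ports[match.group("port_a")] += 1
        ports[match.group("port_b")] += 1
    return ports


# Hypothetical log lines, for illustration only.
SAMPLE_LOG = """\
%SW_MATM-4-MACFLAP_NOTIF: Host 0011.2233.4455 in vlan 10 is flapping between port Gi1/0/1 and port Po12
%SW_MATM-4-MACFLAP_NOTIF: Host 0011.2233.6677 in vlan 10 is flapping between port Gi1/0/2 and port Po12
"""

if __name__ == "__main__":
    # Ports listed first appear in the most flap messages.
    for port, hits in count_flapping_ports(SAMPLE_LOG).most_common():
        print(port, hits)
```

A port that appears in most of the messages is a reasonable place to start tracing, but the count alone does not prove that port is the cause.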
3) Check for TCNs. While the loop is occurring, if you see excessive TCNs, you need to trace them to the source. To do this, start from the core and run the following command.
ITLABSW#show spanning-tree detail | inc ieee|occurr|from|is exec
The output from this command shows the port on which the last TCN was received and how long ago it was received.
Look for the port that received a TCN in the last few seconds.
ITLABSW#sh spanning-tree detail | i ieee|occur|from|is exec
VLAN0001 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 187927 last change occurred 00:01 ago <-time rec'd
from Port-Channel12 <--interface that received the TCN
You will want to follow this port until the port that receives the TCN is an access port, or until the switch in question is generating TCNs but not receiving them. If you find an access port receiving TCNs, shut it down and see if that stabilizes the network.
If you find a switch generating TCNs, you will want to look for two uplink ports or trunks in a spanning tree forwarding state for the same VLAN. If you find two ports in a forwarding state, shut one port down and see if this breaks the loop. Check for a unidirectional link or excessive link flaps.
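On a switch with many VLANs, picking out the recent TCNs by eye is tedious. The following is a minimal sketch, assuming the filtered output looks like the sample above (a "VLANxxxx is executing" line, a "last change occurred ... ago" line, then a "from ..." line). The function names, the 10-second threshold, and the VLAN0020 entry in the sample are assumptions added for illustration. It needs Python 3.8 or later.

```python
import re

VLAN_RE = re.compile(r"^\s*(VLAN\d+) is executing")
CHANGE_RE = re.compile(r"last change occurred\s+([\d:]+)\s+ago")
FROM_RE = re.compile(r"^\s*from\s+(\S+)")


def to_seconds(age):
    """Convert a colon-separated age such as 00:01 or 1:12:30 to seconds."""
    seconds = 0
    for field in age.split(":"):
        seconds = seconds * 60 + int(field)
    return seconds


def recent_tcns(output, max_age_seconds=10):
    """Return (vlan, port, age_seconds) for VLANs whose last TCN is recent.

    Ages that are not purely colon-separated digits are skipped, which is
    acceptable here because only very recent changes are of interest.
    """
    results = []
    vlan = age = None
    for line in output.splitlines():
        if m := VLAN_RE.match(line):
            vlan, age = m.group(1), None
        elif m := CHANGE_RE.search(line):
            age = to_seconds(m.group(1))
        elif (m := FROM_RE.match(line)) and vlan is not None and age is not None:
            if age <= max_age_seconds:
                results.append((vlan, m.group(1), age))
            age = None
    return results


# VLAN0001 is taken from the sample output above; VLAN0020 is hypothetical.
SAMPLE_OUTPUT = """\
VLAN0001 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 187927 last change occurred 00:01 ago
from Port-Channel12
VLAN0020 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 14 last change occurred 1:12:30 ago
from GigabitEthernet1/0/5
"""

if __name__ == "__main__":
    for vlan, port, age in recent_tcns(SAMPLE_OUTPUT):
        print(f"{vlan}: TCN {age}s ago from {port}")
```

Run it against the output from each switch as you move hop by hop toward the source. The port it reports is the next one to follow.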
4) Look for an interface with a very high input rate and low output rate.
ITLABSW#sh int | i is up|rate
When a bridging loop occurs, you will usually see multiple interfaces with a high output rate and low input rate, and a single interface with a high input rate and low output rate.
- Trace the port with the high input rate down until you come to an access port, then shut it down.
- If the port with the high input rate leads you into a loop, check spanning tree and EtherChannel states until you find a switch that has a port in an incorrect forwarding state or a port that is incorrectly bundled.
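To spot the input-heavy interface in a long listing, the filtered output can be parsed with a short script. This is a rough sketch, assuming the usual IOS layout of an "X is up" header followed by "input rate N bits/sec" and "output rate N bits/sec" lines. The function names, the 10x ratio, the 1 Mbps floor, and all sample values are assumptions chosen for illustration. Because the filter hides the header line of down interfaces but still prints their rate lines, only the first input/output pair after each header is kept. It needs Python 3.8 or later.

```python
import re

IFACE_RE = re.compile(r"^(\S+) is up")
RATE_RE = re.compile(r"(input|output) rate (\d+) bits/sec")


def parse_rates(output):
    """Return {interface: {"input": bps, "output": bps}} from the filtered output."""
    rates = {}
    current = None
    for line in output.splitlines():
        if m := IFACE_RE.match(line):
            current = m.group(1)
            rates[current] = {}
        elif current and (m := RATE_RE.search(line)):
            # Keep only the first pair after a header; later rate lines may
            # belong to a down interface whose header was filtered out.
            rates[current].setdefault(m.group(1), int(m.group(2)))
    return rates


def input_heavy(rates, ratio=10, min_bps=1_000_000):
    """Return interfaces whose input rate is at least `ratio` times the output rate."""
    return sorted(
        name
        for name, r in rates.items()
        if r.get("input", 0) >= min_bps
        and r.get("input", 0) >= ratio * max(r.get("output", 0), 1)
    )


# Hypothetical output; the last two rate lines stand in for a down interface.
SAMPLE_OUTPUT = """\
GigabitEthernet1/0/1 is up, line protocol is up (connected)
  5 minute input rate 912345000 bits/sec, 150000 packets/sec
  5 minute output rate 120000 bits/sec, 40 packets/sec
GigabitEthernet1/0/2 is up, line protocol is up (connected)
  5 minute input rate 98000 bits/sec, 35 packets/sec
  5 minute output rate 905000000 bits/sec, 149000 packets/sec
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
"""

if __name__ == "__main__":
    print(input_heavy(parse_rates(SAMPLE_OUTPUT)))
```

The thresholds are arbitrary starting points. On a busy network, tune them so that only interfaces far outside normal traffic patterns are flagged.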
5) Look for packets hitting the CPU. Sniff the CPU and see if the packets share a common source. (This is only an option on certain platforms, and you will need to contact TAC to assist with setting it up and analyzing the data.) Track down the source. If they are STP or CDP packets (or packets destined to the 0100.0CCC.CCCX reserved multicast address), trace where the source MAC is learned. See if the source MAC leads you in a loop.
If you see two ports in a forwarding state for the same VLAN on the same switch, look for the following:
a) Does this switch think it is the root for this VLAN (or VLANs)?
b) Should it be?
c) Is it receiving BPDUs from its neighbor on the ports in a forwarding state? (Sniff both forwarding ports to look for BPDUs.)
d) Look for a unidirectional link on one of the ports in a forwarding state.
e) Shut one of the ports in a forwarding state and see if the loop stops.