10-20-2008 08:24 AM - edited 03-06-2019 02:02 AM
Looking for some troubleshooting feedback on this one.
In a single VTP domain environment containing about 120 switches (mostly Cat 3560s and a few D-Link DES-3550s), I've recently started to see a few network-wide connectivity drops. They are very short but totally unacceptable either way. Since everything is configured with default PVST, I wondered whether the short downtime was an STP recalculation followed by the network converging again. Very soon we'll be looking at moving layer 3 out to each network closet, but in the meantime I want to find the culprit with the current setup. Syslogs aren't showing anything concrete and I don't see any spanning-tree inconsistencies from my root bridge. Anyone have a few tricks to track down these issues?
Thanks in advance,
Jim
10-20-2008 08:55 AM
Hi Jim,
On Cisco switches you can use the "debug spanning-tree" and/or "debug spanning-tree events" and similar debug commands to get more information on what is happening exactly.
"debug spanning-tree " has many options. Use the "?" after this command and you will see.
Cheers:
Istvan
10-20-2008 11:19 AM
I've used the "events" and "all" variations of the debug spanning-tree command before, but in a much smaller environment. I've also seen large-scale debugging cripple devices, so I am somewhat reluctant to do so during production hours.
10-20-2008 11:32 AM
Hi Jim,
Your point is right.
In your place I would first try this debugging outside working hours, while logging the debug output to a syslog server so you can examine it later.
If this phenomenon happens only during working hours, then you can do the debugging on a switch that is less important or has fewer users (not on the root switch, for example).
But my supposition is that a debug spanning-tree events command alone should not crash a switch, unless its processor is already overwhelmed with traffic or processes that you don't know about.
Cheers:
Istvan
10-20-2008 11:44 AM
Thanks,
I'll go ahead as planned after hours and post results.
10-21-2008 02:11 AM
Pick a couple of VLANs that are affected and do sh spann vlan
Number of topology changes 152 last change occurred 18:18:00 ago
from GigabitEthernet1/1
If the time is similar to the time since the last incident, spanning tree was affected - spanning tree may or may not be the culprit though! If you are looking at it 5 mins after an incident, and the last topology change was two months ago, then spanning tree is just fine.
This does depend a little on your network being correctly configured - if you have user ports not set as portfast, every time the port comes up all switches between the port and the root will see a topology change.
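If you end up checking this on a lot of switches, the comparison above can be scripted. The following is only a rough sketch in Python, assuming you have already captured the text of "show spanning-tree vlan ... detail" by whatever means you normally use. The function names and the 10-minute tolerance are made up for illustration, and the pattern only handles the hh:mm:ss age format shown in this thread; older changes may be printed in a different format, which this sketch does not handle.

```python
import re
from datetime import timedelta

# Matches the topology change summary, which spans two lines in the output.
TC_PATTERN = re.compile(
    r"Number of topology changes\s+(\d+)\s+last change occurred\s+"
    r"(\d+):(\d{2}):(\d{2})\s+ago\s+from\s+(\S+)"
)


def parse_topology_change(output):
    """Return (count, age, source_port), or None if the summary is absent."""
    m = TC_PATTERN.search(output)
    if m is None:
        return None
    count, hours, minutes, seconds, port = m.groups()
    age = timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds))
    return int(count), age, port


def lines_up_with_outage(age, since_outage, tolerance=timedelta(minutes=10)):
    """True if the last topology change is within `tolerance` of the outage.

    The default tolerance is an arbitrary assumption; adjust to taste.
    """
    return abs(age - since_outage) <= tolerance


sample = """Number of topology changes 152 last change occurred 18:18:00 ago
          from GigabitEthernet1/1"""

count, age, port = parse_topology_change(sample)
# Compare against an outage that (hypothetically) happened 18 hours ago.
print(count, port, lines_up_with_outage(age, timedelta(hours=18)))
```

A match only tells you spanning tree was affected around that time, not that it was the cause. The source port is the useful part: follow it hop by hop towards wherever the change originated.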
10-22-2008 01:54 PM
So another outage...
I started checking stp on individual vlans
I got similar results on numerous switches for a specific vlan:
Example of "show spanning-tree vlan 201 detail" three hours after the outage.
#################################
VLAN0201 is executing the ieee compatible Spanning Tree protocol
Bridge Identifier has priority 32768, sysid 201, address 001f.260c.6480
Configured hello time 2, max age 20, forward delay 15
Current root has priority 32969, address 000a.b89b.3100
Root port is 1 (GigabitEthernet0/1), cost of root path is 8
Topology change flag not set, detected flag not set
Number of topology changes 122 last change occurred 03:04:52 ago
from GigabitEthernet0/2
Times: hold 1, topology change 35, notification 2
hello 2, max age 20, forward delay 15
Timers: hello 0, topology change 0, notification 0, aging 300
#########################################
The topology change lines up very closely with the outage for the majority of our switches. Not all were hit.
We also run ZenOss as our monitoring software, and it reported about 400 instances of messages like these shortly after the outage.
##########################
Host 001a.a0bd.23c6 in vlan 201 is flapping between port Gi0/2 and port Gi0/1
Host 00e0.18ba.bc5d in vlan 201 is flapping between port Gi0/1 and port Fa0/4
##########################
Flaps happened between trunk links multiple times within the same five-minute span, all tied to a single VLAN.
Still at a loss as to what exactly is causing this. I'll be setting up STP event debugging tomorrow.
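With around 400 of those flap messages, tallying them by VLAN and port can show which links keep turning up. A minimal sketch in Python, assuming the messages have been exported one per line in exactly the format quoted above (the function name is my own, and any prefix the monitoring system adds before "Host" is ignored):

```python
import re
from collections import Counter

# Matches the flap message text as quoted in this thread.
FLAP_PATTERN = re.compile(
    r"Host\s+(\S+)\s+in vlan\s+(\d+)\s+is flapping between port\s+(\S+)"
    r"\s+and port\s+(\S+)"
)


def summarize_flaps(lines):
    """Count flap messages per VLAN and per port, and collect the MACs seen."""
    by_vlan = Counter()
    by_port = Counter()
    hosts = set()
    for line in lines:
        m = FLAP_PATTERN.search(line)
        if m is None:
            continue  # not a flap message
        mac, vlan, port_a, port_b = m.groups()
        by_vlan[int(vlan)] += 1
        by_port[port_a] += 1
        by_port[port_b] += 1
        hosts.add(mac)
    return by_vlan, by_port, hosts


sample = [
    "Host 001a.a0bd.23c6 in vlan 201 is flapping between port Gi0/2 and port Gi0/1",
    "Host 00e0.18ba.bc5d in vlan 201 is flapping between port Gi0/1 and port Fa0/4",
]

by_vlan, by_port, hosts = summarize_flaps(sample)
print(by_vlan.most_common())
print(by_port.most_common())
```

Port names are only meaningful per switch, so in practice you would run this per reporting device. A non-trunk port that keeps appearing is a good place to start looking.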
10-22-2008 02:51 PM
This can be caused by someone looping a data cable between two different ports on the same switch, or by someone who has one of those nice hidden home routers under their desk and loops between the ports on that. These are always the hardest to find. Are those ports user ports or uplinks? If they are user ports, I would take a very close look at what's on those ports, not just in the closet but also out on the floor. Track those MAC addresses and see what port they are actually on when the network is quiet.
10-23-2008 12:28 AM
I agree you need to track where the address *should* be.
I would also suggest that, if you don't already use them, you enable BPDU Guard and port security. Port security can be used to effectively restrict a user port to a single MAC address, i.e. if a user puts a hub there to connect a second PC, only one will work. BPDU Guard should be used on all edge ports of your network. It protects your network against someone plugging in a real switch that will send BPDUs. The effect would normally be that if someone plugs a switch in, the port shuts down and the user then has to ask for the port to be re-enabled, giving you the opportunity to educate them about the issues of connecting unauthorised network devices to the network!
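With 120 switches it may be easier to check saved configs for edge ports that are missing these settings than to eyeball each one. A rough Python sketch, under several assumptions: the running-config has been saved as text, edge ports are the ones with an explicit "switchport mode access" line, and portfast and BPDU Guard are set per interface. It ignores the global "spanning-tree portfast default" and "spanning-tree portfast bpduguard default" commands, so treat the result as a list of ports to look at rather than a verdict. The function names are hypothetical.

```python
# Per-interface commands this sketch expects on every access port.
REQUIRED = ("spanning-tree portfast", "spanning-tree bpduguard enable")


def split_interfaces(config_text):
    """Map each interface name to the list of its indented config lines."""
    blocks = {}
    current = None
    for raw in config_text.splitlines():
        if raw.startswith("interface "):
            current = raw.split(None, 1)[1].strip()
            blocks[current] = []
        elif current is not None and raw.startswith(" "):
            blocks[current].append(raw.strip())
        else:
            current = None  # "!" or any unindented line ends the block
    return blocks


def audit_access_ports(config_text):
    """Return {interface: [missing commands]} for explicit access ports."""
    findings = {}
    for name, body in split_interfaces(config_text).items():
        if "switchport mode access" not in body:
            continue  # trunks and unconfigured ports are skipped
        missing = [cmd for cmd in REQUIRED if cmd not in body]
        if missing:
            findings[name] = missing
    return findings


sample = """interface FastEthernet0/1
 switchport mode access
 spanning-tree portfast
 spanning-tree bpduguard enable
!
interface FastEthernet0/2
 switchport mode access
!
interface GigabitEthernet0/1
 switchport mode trunk
!
"""

print(audit_access_ports(sample))
```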
Paul.