11-30-2010 06:17 PM - edited 03-04-2019 10:37 AM
Recently I made a somewhat severe mistake and collapsed our switching core. Now that everything is restored and I have received my 50 lashes, I need to confirm what caused the incident in the first place. Here is what I did:
We have a switching core (4 stacking 3730s)
We have our cabinet 9 switch (2960) that is connected to the core by 2 trunking fibre pairs (STP stops a switching loop)
We have our cabinet 8 switch (2960) that is connected to the core by 1 trunking fibre pair.
I wanted to configure a Layer 2 EtherChannel on the two links going from the core to cabinet 9.
This is what I did instead...
Shortly after I had done this I noticed that I could telnet to the cabinet 9 switch and the odd device attached to cabinet 9 ... but the majority of devices hanging off cabinet 9 were not pingable.
When my troubleshooting failed I removed the ports on cabinet 9 from the channel-group. This is when the core... and all the switches hanging off the core lost connectivity and crashed.
I disconnected the two fibre cables from the core and soon after the network restored itself (with more than a small myriad of server problems that took a few hours to fix). When I had removed all etherchannels and channel-groups I plugged the fibre back in, but the links did not come back up. Upon looking at our core, I found the ports had been disabled (admin down). I enabled them, plugged the cables in and the network returned to normal operation.
My guess is that I caused a spanning-tree loop that resulted in a broadcast storm, but I am not sure how. Can anyone advise how a switching loop could have occurred in this instance, or if this is in fact what happened?
12-04-2010 02:53 PM
Hi Steven,
A really unpleasant sequence of things. I am sorry you needed to go through all that ordeal.
Okay, let's analyze what happened.
First of all, you created the Etherchannel using the on mode. This is the first mistake many administrators make, for an inexplicable reason. The on mode forcibly puts the ports into an Etherchannel bundle without any sanity checks on whether that is appropriate: whether all the ports are connected to the same device, and whether the ports on the opposite device are also willing to go into an Etherchannel. A common problem arises when ports on one side of an Etherchannel are bundled together using the on mode while the opposite switch is not yet configured for Etherchannel. Because STP considers the Etherchannel a single port, it makes all its member ports either Forwarding or Blocking together, and it sends BPDUs for the bundle over only one of the physical links. The opposite switch still runs STP on each link separately, so a port that stops receiving BPDUs can eventually move to Forwarding while the bundled side is forwarding on every member link. That is a perfect opportunity for a Layer2 loop.
Second, you say that when you removed the Etherchannels (by removing the Port-channel interfaces), the individual ports formerly grouped under them were administratively down. That is expected: whenever you remove the Port-channel interface, the switch automatically shuts down all its member ports to prevent just what happened to you.
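For reference, one cautious order for dismantling a bundle by hand is to shut the member ports down before touching the channel configuration. This is only a sketch: the interface names and the channel number 1 are assumptions for illustration, not taken from your configuration.

```
! Hypothetical member ports and channel number
interface range GigabitEthernet0/1 - 2
 shutdown
 no channel-group
!
no interface Port-channel1
!
! Re-enable the ports only after the far end has been cleaned up the same way
interface range GigabitEthernet0/1 - 2
 no shutdown
```

That way the links never carry traffic while one side is bundled and the other is not. Shutting down every uplink of a remote switch cuts your own access to it, so this is best done from the console.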
What I find somewhat strange is that you write that you have created a Layer3 Etherchannel on the core. Was that by intent? The 2960, to my knowledge, does not support Layer3 routed ports, and therefore it cannot be an endpoint of a Layer3 Etherchannel.
To sum up: whenever possible, use LACP (active/passive) or PAgP (desirable/auto) to negotiate the creation of an Etherchannel. This is exactly what these two protocols were designed to do, and you will save yourself a lot of trouble.
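A minimal sketch of the negotiated approach with LACP could look like the following. The interface names and the channel number are assumptions for illustration only, since I do not know which ports carry your cabinet 9 uplinks, and the member ports are assumed to already have matching trunk settings.

```
! On the core stack (port numbers are hypothetical)
interface range GigabitEthernet1/0/1 , GigabitEthernet2/0/1
 shutdown
 channel-group 1 mode active
!
! On the cabinet 9 switch (port numbers are hypothetical)
interface range GigabitEthernet0/1 - 2
 shutdown
 channel-group 1 mode active
!
! After both sides are configured, bring the member ports back up
! on each switch, using that switch's own interface range
interface range GigabitEthernet0/1 - 2
 no shutdown
!
! Verify from exec mode: bundled member ports are flagged (P)
! show etherchannel summary
```

With active on both ends, a port joins the bundle only after LACP has confirmed that the neighbor agrees. If the other side is not configured for the channel, the ports are simply not bundled, and STP keeps treating them as separate links.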
Best regards,
Peter