Etherchannel mismatch = broadcast storm?

steven.crutchley · 11-30-2010

Recently I made a somewhat severe mistake and collapsed our switching core. Now that everything is restored and I have received my 50 lashes, I need to confirm what caused the incident in the first place. Here is what I did:

We have a switching core (a stack of four 3750s).

We have our cabinet 9 switch (2960), which is connected to the core by two trunked fibre pairs (STP blocks one of them to prevent a switching loop).

We have our cabinet 8 switch (2960), which is connected to the core by one trunked fibre pair.

I wanted to configure a Layer 2 EtherChannel on the two links going from the core to cabinet 9.

This is what I did instead...

I created a Layer 3 interface Port-channel 6 without an IP address on BOTH the core and the cabinet 9 switch (I thought I needed to create it before adding ports to the channel, which I now know is not the case for Layer 2).
I added the two fibre ports on the cabinet 9 switch to channel-group 6 mode ON.
I added two of the fibre ports on the core to channel-group 6 mode ON. The problem was that one of those ports went to cabinet 9 and the other went to cabinet 8 (I did not realise this until later).

Shortly after I had done this, I noticed that I could telnet to the cabinet 9 switch and the odd device attached to it, but the majority of devices hanging off cabinet 9 were not pingable.

When my troubleshooting failed, I removed the ports on cabinet 9 from the channel-group. This is when the core, and all the switches hanging off it, lost connectivity and crashed.

I disconnected the two fibre cables from the core and soon after the network restored itself (with a myriad of server problems that took a few hours to fix). After I removed all EtherChannels and channel-groups, I plugged the fibre back in, but the links did not come back up. When I looked at the core, I found that the ports had been disabled (admin down). I enabled them, plugged the cables in, and the network returned to normal operation.

My guess is that I caused a spanning-tree loop that resulted in a broadcast storm, but I am not sure how. Can anyone advise how a switching loop could have occurred in this instance, or whether this is in fact what happened?

Peter Paluch · 12-04-2010

Hi Steven,

A really unpleasant sequence of things. I am sorry you needed to go through all that ordeal.

Okay, let's analyze what happened.

First of all, you created the Etherchannel using the on mode. This is a mistake many administrators make, for some inexplicable reason. The on mode forcibly puts the ports into an Etherchannel bundle without performing any sanity checks on whether that is appropriate: whether all the ports are connected to the same device, and whether the ports on the opposite device are also willing to go into an Etherchannel. A common problem arises when ports on one side of an Etherchannel are bundled together using the on mode while the opposite switch is not yet configured for Etherchannel. Because STP considers the Etherchannel a single port, it makes all its member ports either Forwarding or Blocking together. That is a perfect opportunity for a Layer2 loop:

A frame is sent through a port in the Etherchannel to the opposite device. Because this Etherchannel is forwarding, all its ports are forwarding.
The opposite device does not have the Etherchannel configured yet. If the frame destination is unknown unicast, multicast or broadcast, the frame will be flooded through all other ports in the same VLAN, including the ports that would constitute the Etherchannel if it were configured.
The frames will be received back by our switch and the process may repeat.
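The three steps above can be traced with a small flooding model. This is a minimal sketch under simplifying assumptions: switch names A and B, the port names p1/p2/q1/q2/host and the bundle name Po6 are hypothetical, STP, VLANs and MAC learning are ignored, and a bundle sends each frame over its first listed member in place of the real load-balancing hash. It shows one broadcast leaving A through the bundle and arriving back into the same bundle when B has no matching Etherchannel, and not arriving back when B is bundled too.

```python
from collections import deque

# Hypothetical topology (illustrative names, not from the thread):
# switch A has uplinks p1/p2, switch B has uplinks q1/q2, each has one host port.
PORTS = {"A": ["host", "p1", "p2"], "B": ["host", "q1", "q2"]}
LINKS = {("A", "p1"): ("B", "q1"), ("A", "p2"): ("B", "q2")}
LINKS.update({far: near for near, far in list(LINKS.items())})


def logical_port(bundles, switch, port):
    """Return the bundle a physical port belongs to, or the port itself."""
    for name, members in bundles.get(switch, {}).items():
        if port in members:
            return name
    return port


def flood(bundles, switch, in_port):
    """Physical ports a broadcast leaves through.

    Every logical port except the ingress one gets a copy; a bundle sends
    each frame over a single member link (the first listed member stands
    in for the load-balancing hash).
    """
    ingress = logical_port(bundles, switch, in_port)
    out, used = [], set()
    for port in PORTS[switch]:
        logical = logical_port(bundles, switch, port)
        if logical == ingress or logical in used:
            continue
        used.add(logical)
        out.append(port)
    return out


def trace_broadcast(bundles, max_hops=10):
    """Send one broadcast from A's host port; list A's uplinks it comes back on."""
    returned = []
    queue = deque([("A", "host", 0)])
    while queue:
        switch, in_port, hops = queue.popleft()
        if hops >= max_hops:
            continue  # guard against endless loops in larger topologies
        for out_port in flood(bundles, switch, in_port):
            peer = LINKS.get((switch, out_port))
            if peer is None:
                continue  # host port: the frame is delivered, not forwarded
            if peer[0] == "A":
                returned.append(peer[1])
            queue.append((peer[0], peer[1], hops + 1))
    return returned


only_a = {"A": {"Po6": {"p1", "p2"}}}
both = {"A": {"Po6": {"p1", "p2"}}, "B": {"Po6": {"q1", "q2"}}}
print(trace_broadcast(only_a))  # ['p2']: the frame came back into A's bundle
print(trace_broadcast(both))    # []: B keeps the frame off its other uplink
```

In this two-switch model the returned copy is only flooded out of A's host port, so it stops there; in a real network with further paths between the switches, that returned copy is what can keep circulating.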

Second, you state that when you removed the Etherchannels (by removing the Port-channel interfaces), the individual ports formerly grouped under them were administratively down. That is correct: whenever you remove a Port-channel interface, the switch automatically shuts down all its member ports to prevent exactly the kind of loop that happened to you.

What I find somewhat strange is that you write that you have created a Layer3 Etherchannel on the core. Was that by intent? The 2960, to my knowledge, does not support Layer3 routed ports, and therefore it cannot be an endpoint of a Layer3 Etherchannel.

The summary of all this is: whenever possible, use LACP (active/passive) or PAgP (desirable/auto) to negotiate the creation of an Etherchannel. This is exactly what these two protocols were designed to do, and you will save yourself a lot of trouble.
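The difference between on mode and negotiated modes can be summarised in a short decision sketch. It is a simplified model, not how IOS implements it: the `Member` type, the `partner_id` field (standing in for the LACP system ID/key or PAgP device ID the real protocols compare) and the port labels `to-cab9`/`to-cab8` are hypothetical, and ports excluded here would in practice be left out of the bundle rather than modelled further.

```python
from dataclasses import dataclass
from typing import List, Optional

# Mode pairs that bring a bundle up (order does not matter).
# "on" pairs only with "on"; LACP and PAgP modes never pair with each other.
FORMS = {
    frozenset({"on"}),                  # on + on: forced, nothing negotiated
    frozenset({"active"}),              # LACP active + active
    frozenset({"active", "passive"}),   # LACP active + passive
    frozenset({"desirable"}),           # PAgP desirable + desirable
    frozenset({"desirable", "auto"}),   # PAgP desirable + auto
}


def modes_compatible(local: str, remote: Optional[str]) -> bool:
    """True if the two ends' channel-group modes would form a bundle."""
    return frozenset({local, remote}) in FORMS


@dataclass
class Member:
    port: str                     # local port configured in the channel group
    partner_mode: Optional[str]   # None: far-end port has no channel-group
    partner_id: str               # identity of the device on the far end


def bundled_ports(local_mode: str, members: List[Member]) -> List[str]:
    """Ports this switch would actually put into the bundle (simplified)."""
    if local_mode == "on":
        # No protocol runs, so nothing is checked: every port is bundled,
        # whatever is (or is not) configured at the other end.
        return [m.port for m in members]
    # LACP/PAgP: a port joins only if its partner agrees to bundle ...
    agreed = [m for m in members if modes_compatible(local_mode, m.partner_mode)]
    if not agreed:
        return []
    # ... and only ports leading to the same partner device bundle together;
    # here the first agreeing partner wins and the others are left out.
    partner = agreed[0].partner_id
    return [m.port for m in agreed if m.partner_id == partner]


# The core's mis-cabled bundle from this thread, with hypothetical labels:
miscabled = [Member("to-cab9", "on", "cab9"), Member("to-cab8", None, "cab8")]
print(bundled_ports("on", miscabled))   # ['to-cab9', 'to-cab8']
lacp = [Member("to-cab9", "active", "cab9"), Member("to-cab8", "active", "cab8")]
print(bundled_ports("active", lacp))    # ['to-cab9']
```

With mode on, the bundle spanning two different switches is accepted silently; with LACP, even if cabinet 8 were also configured for it, the port leading to a different partner would not join the cabinet 9 bundle.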

Best regards,

Peter