Access switch frozen and spanning tree loop

gdspa · ‎02-20-2024

Hi all,

I've experienced, both recently and in the past, loop problems in my enterprise network related to an access switch freezing. We still have access switches that are not connected through EtherChannel to the distribution switches (which are stacked), so spanning tree normally blocks the loop on one of the access switches in each closet.

From what I have read and understood, loop guard and UDLD are useful in case of a link issue, but I'm not sure they can prevent a loop if an access switch freezes.

Any suggestions for limiting the impact of a loop in this situation?
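For reference, a minimal sketch of how these two features are typically enabled on Catalyst IOS uplinks. The interface name is hypothetical, and the recovery interval is only an example value, not something taken from this network.

```
! Global configuration: loop guard on all point-to-point STP ports
spanning-tree loopguard default
!
! Global configuration: UDLD in aggressive mode on fiber ports
udld aggressive
!
! Or per uplink (interface name is a placeholder)
interface TenGigabitEthernet1/1/1
 spanning-tree guard loop
 udld port aggressive
!
! Optional: let ports err-disabled by UDLD recover automatically
errdisable recovery cause udld
errdisable recovery interval 300
```

Both features react to BPDUs or UDLD probes that stop arriving on a link that is still up, so whether they help with a frozen switch depends on what the frozen switch keeps sending.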

MHM Cisco World · ‎02-20-2024

Did you configure the port channel with mode ON?

MHM

gdspa · ‎02-20-2024

Sorry, I'm not sure I understand your question.

Some of our closets do not have EtherChannel on the uplinks, because they have 2 switches that are not stacked and not stackable, so spanning tree blocks one uplink. In some other cases we have stacks, but currently one uplink is 10 Gbit and the other is 1 Gbit, so we cannot enable EtherChannel. Of course we are replacing the access switches so that we have stacks and can enable EtherChannel, but in the meantime we have a mixed configuration.

pieterh · ‎02-20-2024

If a switch freezes, I expect it NOT to forward any packets, so it cannot cause a loop?
So there must be some other misconfiguration on your network
(or you and I mean different things when talking about "freezes").

Check which switch is currently the active spanning-tree root and which spanning-tree mode you are using.
- You may need to reconfigure the spanning-tree priority so that your core switch is the root.
You may find that one of the "freezing switches" is currently the root of your spanning tree,
so when a freeze occurs a new root needs to be elected and the spanning-tree timers kick in.

- You may benefit from switching to rapid spanning tree to reduce the effect of a "listening period" when a topology change occurs.
- You may benefit from enabling some error disable ..... functionality on your uplink switch, to disable the interface to a faulty switch when a problem occurs,
and thus keep the error contained to the faulty switch without affecting the rest of your network.
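The root check and the rapid spanning-tree suggestion above could look roughly like this on Catalyst IOS. This is a sketch under assumptions: the VLAN range and the priority value are placeholders, and the priority command belongs on the core switch only.

```
! Exec mode, on any switch: see which bridge is root for each VLAN
show spanning-tree root
show spanning-tree summary
!
! Global configuration, on the core: make it the preferred root
! (VLAN range and priority value are example values)
spanning-tree vlan 1-4094 priority 4096
!
! Global configuration, on every switch: move to Rapid PVST+
spanning-tree mode rapid-pvst
```

Rapid PVST+ interoperates with switches still running PVST+, but ports facing a legacy neighbor fall back to the slower timers, so the benefit is only complete once all switches are converted.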

MHM Cisco World · ‎02-20-2024

Can you check these notes
and reply to them?
MHM

Dustin Anderson · ‎02-20-2024

We used to be set up similarly to this and had the same issues. One thing Cisco suggested to us was to add storm control to the links, but I can't guarantee whether this will help. I know you mention the switch freezing, but more than likely something causes the blocked port to open, and the freeze is the switch getting overwhelmed by all the packets.

It took this exact scenario multiple times before we could get management to approve running single-mode fiber from each core to each switch, so we could get off spanning tree and move to port channels.

One trigger we had was VTP causing the blocked port to open, so we have disabled it and manually trunk the needed VLANs.

storm-control multicast level 5
storm-control broadcast level 5
storm-control action shutdown
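For the VTP change mentioned above, a minimal sketch of what "disabled it and manually trunk the needed VLANs" might look like. The exact commands used are not stated in this post, so the VTP mode, interface name, and VLAN IDs below are assumptions.

```
! Global configuration: stop the switch from acting on VTP updates
! (newer releases may also accept "vtp mode off")
vtp mode transparent
!
! Uplink trunk carrying only the VLANs the closet needs
! (interface name and VLAN IDs are placeholders)
interface GigabitEthernet1/0/49
 switchport mode trunk
 switchport trunk allowed vlan 10,20,30
```

In transparent mode the VLANs have to exist locally on each switch, so they must be created by hand before they are added to the allowed list.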

gdspa · ‎02-26-2024

Thanks all for the suggestions.

By "frozen switch" I mean that the fiber uplinks are connected but the switch doesn't work correctly and a loop starts. Of course the sequence could be different, as @Dustin Anderson wrote: the blocked port is unblocked, the loop starts, and the switch freezes.

I've checked, and the root is our core for all VLANs.

In the past we already configured

storm-control broadcast level 5

on the uplinks on the distribution switches. Maybe we could add storm-control multicast level 5 and storm-control action shutdown to try to isolate the closet where the problem arises.
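Put together, the uplink configuration being considered might look like the sketch below. The interface name is hypothetical, and the errdisable recovery lines are an optional addition that was not discussed in the thread.

```
! Distribution stack, uplink toward one closet (interface name is a placeholder)
interface TenGigabitEthernet1/0/1
 storm-control broadcast level 5
 storm-control multicast level 5
 storm-control action shutdown
!
! Optional: bring the port back automatically after 5 minutes
errdisable recovery cause storm-control
errdisable recovery interval 300
!
! Exec mode: verify thresholds and see which ports were shut down
show storm-control
show interfaces status err-disabled
```

The level is a percentage of the port's bandwidth, so level 5 allows far more traffic on a 10 Gbit uplink than on a 1 Gbit uplink. With action shutdown the whole uplink goes err-disabled when the threshold is crossed, which isolates the closet but also takes that path out of service until it recovers.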

MHM Cisco World · ‎02-26-2024

So the root is the core; that is OK.

The issue can then be troubleshot further as follows:

1- show spanning-tree interface <link toward the core, and the link interconnecting the two access switches>

There are BPDU sent and received counters.

On the link to the core, the received counter must increase while the sent counter stays the same.

On the link between the two access switches, one switch must show its sent counter increasing and the other switch must show its received counter increasing, not both on the same switch.

Run the show command, wait 10 minutes, then run it again.

This tells us whether BPDUs are being sent and received between the switches and that there is no unidirectional issue on the fiber cables (this can also be checked with UDLD).

2- If the step above is correct and the sent and received counters behave as expected, then we need to check whether high CPU occurs when the STP blocked port changes to forwarding. Why? BPDUs are punted to the CPU to be processed. If the CPU is busy, BPDUs are missed or dropped, STP moves the blocked port to forwarding, and this makes the issue worse.

As I suggested before, use EEM to notify you if high CPU utilization happens at the same time as the STP issue and the L2 loop.
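The two steps above can be sketched as follows. Everything specific here is an assumption to verify on your platform: the interface name, the file name, the 80 percent threshold, the polling interval, and in particular the SNMP OID index for 5-second CPU utilization, which differs between devices.

```
! Step 1 (exec mode): run twice, about 10 minutes apart, and compare.
! The BPDU counters appear in the "detail" form of the command.
show spanning-tree interface TenGigabitEthernet1/1/1 detail | include BPDU
show udld TenGigabitEthernet1/1/1
!
! Step 2 (global configuration): EEM applet that logs and captures
! CPU and STP state when 5-second CPU utilization reaches 80 percent
event manager applet HIGH-CPU-STP
 event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.3.1 get-type exact entry-op ge entry-val 80 poll-interval 10
 action 1.0 syslog msg "High CPU detected, capturing CPU and STP state"
 action 2.0 cli command "enable"
 action 3.0 cli command "show processes cpu sorted | append flash:high_cpu.txt"
 action 4.0 cli command "show spanning-tree detail | append flash:high_cpu.txt"
```

Comparing the syslog timestamp from the applet with the time of the STP topology change then shows whether the CPU spike came before or after the blocked port moved to forwarding.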

MHM