Solved: Re: err-disable state SW down interface

athan1234 · ‎08-03-2023

Hello

I am attempting to install two Cisco 9200 Series switches. Port 24 is in trunk mode and connects to another 9200 Series switch (I don't have access to that switch right now; it has several other switches connected on different ports, and those switches are not Cisco).

there is not bpdu filter o

spanning-tree mode rapid-pvst
spanning-tree extend system-id
memory free low-watermark processor 10055

interface GigabitEthernet1/0/24
description conexion SW02
switchport mode trunk

Debug

debug spanning-tree config
Spanning Tree configuration debugging is on

Interface GigabitEthernet1/0/24, changed state to up

*Jul 13 15:52:59.853: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/24, changed state to up

*Jul 13 15:53:01.548: %PLATFORM_PM-3-LOOP_BACK_DETECTED: Loop-back detected on GigabitEthernet1/0/24.

*Jul 13 15:53:01.563: %PM-4-ERR_DISABLE: loopback error detected on Gi1/0/24, putting Gi1/0/24 in err-disable state
*Jul 13 15:53:02.562: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/24, changed state to down

*Jul 13 15:53:03.572: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/24, changed state to down
conf t
Enter configuration commands, one per line. End with CNTL/Z.
#int gig
#int gigabitEthernet 1/0/24
)#shu
)#shutdown
)#no shu
)#no shutdown
)#
*Jul 13 15:54:13.332: %LINK-5-CHANGED: Interface GigabitEthernet1/0/24, changed state to administratively down
)#
*Jul 13 15:54:15.518: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/24, changed state to down
)#
*Jul 13 15:54:18.236: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/24, changed state to up
)#
*Jul 13 15:54:20.250: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/24, changed state to up
)#
*Jul 13 15:54:22.365: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan211, changed state to up

athan1234 · ‎08-09-2023

Yesterday my customer activated the other switch, and its configuration shows the command: no keepalive

I've established the link, and it's working. I'm having a meeting today to figure out what's wrong. Why isn't configuring no keepalive good practice?

Peter Paluch · ‎08-09-2023

Hello Athan,

Catalyst switches send simple frames, called LOOP frames, out of every switched port. These frames are special in that their source and destination MAC addresses are set to the same value - the MAC address of the sending switchport. Essentially, every LOOP frame is a frame "from me to me". However, such frames are never expected to arrive back to their originating switchport. If a Catalyst switch receives a LOOP frame on the switchport that originated it (the source and destination MAC addresses of the frame match the MAC address of the switchport itself), it will err-disable the port with the loopback error cause - just like the one you are experiencing yourself.

In a properly working network, such frames will never come back to their sending port because any other switch that receives a LOOP frame would be forced to send it out the same port it came in on (because the sender MAC = destination MAC). But no switch may send a frame, no matter how it is addressed, back out the very port it was received on - that is a basic operating rule of all switches.

So if a LOOP frame nonetheless comes back to its originating port, it indicates there is a looping condition in the network. It might be a switching loop involving a single switch or multiple switches. It may be a misbehaving switch or a NIC that reflects back frames on the port it receives them. It may be a poorly coded software switch that does not abide by all rules of Ethernet switching. It may be a misconfiguration (such as feeding a SPAN destination port back to the network that is being SPANned). It may be a physical layer problem looping back the frames (miswiring, damaged cable with short circuits, split fiber, ...).

Either way, an arrival of a LOOP frame back to its port of origin, and the err-disabling of the port, is a symptom of a possibly grave misbehavior of the network attached to that port.

Configuring "no keepalive" stops sending out the LOOP frames. That is why it seems to resolve your problem, but in reality, you are just disabling the mechanism that detects the problem but the problem itself remains. It's like sweeping it under the carpet and pretending it is not there - but it is there, and may come back to bite you badly.

This problem definitely requires further troubleshooting downstream of Gi1/0/24 of the switch that is seeing the port being err-disabled.

It is possible that even if you disable the keepalives, your Gi1/0/24 will still be blocked by STP if STP, too, hears back its own BPDUs - in that case, the port will be marked as Broken (BKN) in the "show spanning-tree" output.

But even if it isn't, there is no valid acceptable reason for the LOOP frames from Gi1/0/24 to come back to Gi1/0/24. Deactivating the LOOP frames with "no keepalive" under these circumstances is a ticking bomb waiting to go off.

So you are saying the Gi1/0/24 of the Cat9200 switch you were able to show us is connected to the other Cat9200 switch? How is that switch configured? How is the connection done - is it a direct UTP connection and a cable run with no devices, active or passive, on it, or are there any patch panels, patch cords, cable splices / connectors, ...?

Best regards,
Peter

Leo Laohoo · ‎08-09-2023

@Peter Paluch wrote:
Configuring "no keepalive" stops sending out the LOOP frames. That is why it seems to resolve your problem, but in reality, you are just disabling the mechanism that detects the problem but the problem itself remains. It's like sweeping it under the carpet and pretending it is not there

And because the customer did it, if something should happen, the customer can/will claim "asylum". When troubleshooting occurs, that command will still be there, with no one aware of its repercussions.

athan1234 · ‎08-20-2023

@Peter Paluch So, thank you for your post, which I found really useful.

Marcofbbr · ‎07-17-2025

Awesome explanation. Based on my research, such LOOP frames are very often confused with BPDUs, but clearly they are not BPDUs.
I am facing a similar problem.
One detail is still not clear, though.
You state: "However, such frames are never expected to arrive back to their originating switchport." This makes sense to me, but it only focuses on the "coming back" aspect. What I fail to understand is whether such frames, when received by a switch, are forwarded to other ports. Also, is the source MAC of such a frame even learned on the ingress port? My assumption is that they are supposed to be dropped by the first receiving switch, but I would like confirmation of that if possible. I have a case where a non-Cisco switch is potentially bridging/looping an L2 domain, and that concept would help me understand whether such frames can be forwarded across multiple switches.
If my assumption is correct and such frames are dropped by the first receiving switch, then it makes no sense to talk about STP loops as I see in many posts, because such frames are not supposed to leave the receiving switch, so they cannot be looped.
Thanks

Peter Paluch · ‎07-18-2025

Hello @Marcofbbr ,

Thank you!

What I fail to understand is whether such frames, when received by a switch, are forwarded to other ports.

In theory, they should not because of the following logic:

The source and destination MAC address of LOOP frames is the same.
The receiving switch learns the source MAC address on the incoming interface.
As the destination MAC address points to the incoming interface, the receiving switch has no reason to flood the frame out the other interfaces, but it also cannot forward it through the incoming interface.
As a result, the LOOP frame should be blackholed by the first receiving switch.
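
The four steps above can be sketched as a small Python model. This is only an illustration under a simplifying assumption: MAC learning is treated as instantaneous, which a real switch does not guarantee. The class name, port names, and the MAC address are hypothetical.

```python
class LearningSwitch:
    """Idealized learning bridge; MAC learning is instantaneous (an assumption)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was learned on

    def receive(self, src, dst, in_port):
        """Return the list of ports the frame is sent out of."""
        # Learn the source MAC address on the incoming interface.
        self.mac_table[src] = in_port
        # Look up the destination MAC address.
        out_port = self.mac_table.get(dst)
        if out_port is None:
            # Unknown unicast: flood out every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # The destination lies behind the ingress port: drop the frame.
            return []
        return [out_port]


sw = LearningSwitch(["Gi1/0/1", "Gi1/0/2", "Gi1/0/3"])
loop_mac = "00:11:22:33:44:55"  # hypothetical MAC of the sending switchport
# A LOOP frame carries the same address as both source and destination.
print(sw.receive(src=loop_mac, dst=loop_mac, in_port="Gi1/0/1"))  # -> []
```

Because the source is learned before the lookup, the destination already points at the ingress port and the frame goes nowhere.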

Now, on real switches, the process of learning MAC addresses is independent of - and slower than - the process of making a forwarding decision. Depending on the architecture of the switch and the ASICs, the MAC address learning may be performed completely autonomously by the ASICs (with the operating system only being notified of newly learned MAC addresses), or it may be driven by the operating system, where the ASIC reports to the operating system that a new MAC address has been detected, and the operating system then programs the MAC address across the ASICs in the switch. This process of installing a new MAC address into the MAC address table is much, much slower than the process of searching through a MAC address table - by orders of magnitude.

In other words, it is very possible that there is a limited period of time when the switch starts receiving LOOP frames from a new sender - and with each of them, it immediately makes the forwarding decision and forwards the frame accordingly - but the source MAC address is not yet programmed into the MAC address table because that process takes more time. In that case, the received LOOP frames would be flooded out the remaining ports in the receiving VLAN just like any frames with an unknown unicast MAC address, and this flooding would continue until the source MAC address was programmed into the MAC address table. The programming takes roughly on the order of milliseconds, so until then, any LOOP frame received from the new sender would still be flooded by the switch. After the programming is done, of course, the flooding would stop, and the LOOP frames would start getting dropped because they cannot be forwarded out the interface they arrived on.

So the LOOP frames may be temporarily flooded out other ports when received. It is a valid transient phenomenon. It should not take longer than a few milliseconds, though, to finalize the MAC address programming and stop the flooding condition.
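
A rough way to picture this transient is a toy model in which a newly seen source MAC only becomes usable for lookups after a fixed programming delay. The 3 ms delay, the arrival times, and all names below are assumptions chosen purely for illustration; they do not describe any particular ASIC.

```python
class DelayedLearningSwitch:
    """Toy switch whose MAC table programming lags behind forwarding."""

    def __init__(self, ports, learn_delay_ms):
        self.ports = list(ports)
        self.learn_delay_ms = learn_delay_ms  # assumed programming delay
        self.mac_table = {}  # MAC -> port, usable for forwarding lookups
        self.pending = {}    # MAC -> (port, time the entry becomes usable)

    def receive(self, now_ms, src, dst, in_port):
        """Return the ports a frame arriving at now_ms is sent out of."""
        # Install entries whose programming has completed by now.
        for mac, (port, ready_at) in list(self.pending.items()):
            if now_ms >= ready_at:
                self.mac_table[mac] = port
                del self.pending[mac]
        # Start programming the source MAC if it has not been seen before.
        if src not in self.mac_table and src not in self.pending:
            self.pending[src] = (in_port, now_ms + self.learn_delay_ms)
        # The forwarding decision uses only what is in the table right now.
        out_port = self.mac_table.get(dst)
        if out_port is None:
            return [p for p in self.ports if p != in_port]  # flood
        return [] if out_port == in_port else [out_port]


sw = DelayedLearningSwitch(["p1", "p2", "p3"], learn_delay_ms=3)
mac = "00:11:22:33:44:55"  # hypothetical LOOP frame sender
arrivals_ms = (0, 1, 2, 3, 10)
results = [sw.receive(t, mac, mac, "p1") for t in arrivals_ms]
for t, out in zip(arrivals_ms, results):
    print(t, out)
```

In this model the frames arriving at 0, 1 and 2 ms are flooded to p2 and p3, and everything from 3 ms onward is dropped, which mirrors the "flood briefly, then blackhole" behavior described above.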

Also, is the source MAC of such a frame even learned on the ingress port?

Yes, it is. There is no reason why it should not be. In fact, it is this very learning that stops the flooding of LOOP frames.

If my assumption is correct and such frames are dropped by the first receiving switch, then it makes no sense to talk about STP loops as I see in many posts, because such frames are not supposed to leave the receiving switch, so they cannot be looped.

With correctly working switches, Cisco or non-Cisco, LOOP frames should ultimately get dropped by the first receiving switch because of the same source and destination MAC address, and the fundamental switching rule that a frame is never forwarded back through its ingress port. There may be short (milliseconds, tens of milliseconds) periods of time after the MAC address table is flushed or a new LOOP frame sender appears where these LOOP frames are flooded just like any other frame with an unknown destination.

However, even if the LOOP frames were continuously flooded by a switch (flooding here means exactly "upon receiving a frame to be flooded, replicate it out all remaining ports in the same VLAN and terminate"), this cannot result in continuous forwarding loops, broadcast storms, and/or err-disabled ports. Once again, flooding LOOP frames is equivalent to flooding any BUM (Broadcast, Unknown Unicast, Multicast) traffic. LOOP frames are not special in this respect. If flooding them causes ports to get err-disabled, there is something fundamentally broken with the network itself, and the LOOP frames are not causing it. They're merely detecting it.

Please feel welcome to ask further!

Best regards,
Peter

marx82 · ‎07-18-2025

Thanks again for the clarification. Writing from another account but it is still me.

it is very possible that there is a limited period of time when the switch starts receiving LOOP frames from a new sender - and with each of them, it immediately makes the forwarding decision and forwards the frame accordingly - but the source MAC address is not yet programmed into the MAC address table because that process takes more time.

I really hope what you described above is not the case (unless due to a software defect). I would expect STP to prevent such a "leak" by placing the port in blocking status but I also understand that we are talking about a special frame/protocol and I find the behavior a bit unpredictable just by following logic and common sense. I would also expect such frames to be treated as one-hop/single-segment frames by the devices.

One of the hypotheses concerning my problem is that such LOOP frames are indeed forwarded through the looped L2 infrastructure (looped by design) and reach back the originating switch from another port (causing this 2nd port to be disabled). So what you wrote is somewhat along the same lines, and it is a bit scary to be honest.
In my case the problem is very sporadic, though, and the trigger is still unknown.

Unfortunately, as there is apparently no detailed documentation, I think the "final" answer here is to reproduce the problem... this will be very time consuming. Perhaps opening a TAC case could help as well.

Peter Paluch · ‎07-18-2025

Hello again : )

I really hope what you described above is not the case (unless due to a software defect).

I am sorry to disappoint you but what I described is a normal, logical, necessary part of any managed Ethernet switch operation. This transient flooding is both inevitable and measurable; it is trivial to reproduce on a switch using a custom frame generator and a packet sniffer.

Keep in mind: Upon receiving a frame with a new source MAC, the switch cannot wait until the ASIC or the operating system has finished programming the MAC address table, possibly across many ASICs and multiple linecards. It needs to get rid of the frame as soon as possible. So it does the best it can, given the current contents of the MAC address table, and performs the MAC learning in parallel with frame forwarding. And as I explained already, MAC address learning is orders of magnitude slower than forwarding lookups and frame forwarding, and hence the initial flooding is inevitable by the very principle. This is neither a bug nor an imperfection - it is a rule given by the technical nature of the ASICs, and has always been around.

I would expect STP to prevent such a "leak" by placing the port in blocking status

STP has nothing to do with flooding, nor with preventing it. STP builds a loop-free topology, nothing more. But even a loop-free topology must still perform flooding because that is what the proper delivery of BUM traffic (broadcast, unknown unicast, multicast) depends on.

Flooding is replicating exactly one copy of a received frame out every eligible outgoing port. On a single switch, flooding is not a chaotic, disorganized, endless process - on the contrary, it is very organized and controlled: A single incoming frame will be replicated and sent out exactly once through every proper outgoing port, and then the whole process for the given frame ends, period. As long as the switch does not receive the same frame again (which is where STP comes into play by eliminating loops from the topology where frames could flow around in circles), this flooding is a clearly delineated, delimited and finite process and has nothing to do with the dramatic network meltdown scenarios commonly associated with "flooding" and endless frame looping.
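
To make the "finite process" point concrete, here is a hedged Python sketch that floods a single frame through a graph of switches. The topologies, switch names, and the safety cap `max_transmissions` are all made up for the example; each switch applies only the rule "send one copy out every inter-switch link except the one the frame arrived on".

```python
from collections import Counter, deque


def flood(topology, first_switch, max_transmissions=1000):
    """Flood one frame that enters first_switch from an attached host.

    Returns a Counter mapping (sender, receiver) to the number of copies
    sent over that inter-switch link. The cap only exists so that the
    looped example below terminates.
    """
    sent = Counter()
    queue = deque([(first_switch, None)])  # (switch, neighbor it came from)
    while queue and sum(sent.values()) < max_transmissions:
        switch, came_from = queue.popleft()
        for neighbor in topology[switch]:
            if neighbor != came_from:  # never send back out the ingress link
                sent[(switch, neighbor)] += 1
                queue.append((neighbor, switch))
    return sent


# Loop-free topology (what STP is meant to produce): A-B, A-C, B-D.
tree = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
tree_result = flood(tree, "A")
print(sorted(tree_result.items()))

# Physical triangle with nothing blocking the redundant link: A-B, B-C, C-A.
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
loop_result = flood(triangle, "A")
print(sum(loop_result.values()))
```

In the tree, every link carries the frame exactly once and the process ends by itself. In the triangle, the same per-switch rule never runs out of work and only the artificial cap stops it, which is the situation STP exists to prevent.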

but I also understand that we are talking about a special frame/protocol and I find the behavior a bit unpredictable just by following logic and common sense.

Actually, I have strongly emphasized before, and I am strongly emphasizing again, that the LOOP frames are not special in any way. The only "weird" thing about them is that their source and destination MAC address is the same unicast MAC address. To any switch, Cisco or any other vendor's, they are first and foremost completely routine Ethernet frames in the Ethernet II (DIX) format and there is nothing extraordinary about their framing. Switches that don't support this specific protocol simply ignore the payload identified by the EtherType 0x9000, just like they ignore dozens if not hundreds of other unrecognized protocols encapsulated in Ethernet frames.
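
For illustration only, the framing described above can be reproduced in a few lines of Python. The MAC address is hypothetical, and the payload is just zero padding to the 46-byte Ethernet minimum; the real payload contents are not modeled here.

```python
import struct

ETHERTYPE_LOOP = 0x9000  # EtherType mentioned in the post above


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' notation into 6 raw bytes."""
    return bytes(int(octet, 16) for octet in mac.split(":"))


def build_loop_frame(port_mac, payload=b""):
    """Build an Ethernet II frame (without FCS) whose source equals its destination."""
    mac = mac_to_bytes(port_mac)
    header = mac + mac + struct.pack("!H", ETHERTYPE_LOOP)
    # Placeholder payload, zero-padded to the minimum Ethernet payload size.
    return header + payload.ljust(46, b"\x00")


def parse_header(frame):
    """Return (destination, source, ethertype) from the first 14 bytes."""
    dst, src = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    return dst, src, ethertype


frame = build_loop_frame("00:11:22:33:44:55")  # hypothetical switchport MAC
dst, src, ethertype = parse_header(frame)
print(dst == src, hex(ethertype), len(frame))  # -> True 0x9000 60
```

Nothing in the 14-byte header distinguishes this from any other unicast Ethernet II frame, apart from the two address fields happening to be equal.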

Don't treat these frames specially. There is nothing special about them. Their handling by switches, Cisco and other vendors, can be analyzed and derived by purely applying fundamental Ethernet switching rules. Anything more would be trying to invent fancy rules and exceptions where none exist nor are needed.

One of the hypotheses concerning my problem is that such LOOP frames are indeed forwarded through the looped L2 infrastructure (looped by design) and reach back the originating switch from another port (causing this 2nd port to be disabled).

This cannot be the case. A port on a Cisco Catalyst switch gets err-disabled only if it receives its own LOOP frame. If the port receives a LOOP frame originated by a different port, even of the same switch, nothing happens. This is trivial to demonstrate - simply connect two ports on the same Catalyst switch together and see what happens. STP will block one of the ports, but that does not prevent either of them from continuing to send and receive LOOP frames, because they're L2 control-plane traffic that continues being processed even on ports in the STP Blocking/Discarding state. There will be no err-disabling. Alternatively, if you want to be 100% certain, you can disable STP in the VLAN of the two ports; you still won't end up with either of the ports err-disabled.
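
The detection rule stated above boils down to a single comparison. The sketch below is a simplified restatement of that rule as described in this thread, not Cisco's actual implementation; the function name and the MAC addresses are hypothetical, and the frame is assumed to have already been identified as a LOOP frame.

```python
def should_err_disable(receiving_port_mac, frame_src, frame_dst):
    """True only if the port receives a LOOP frame that it originated itself."""
    return frame_src == frame_dst == receiving_port_mac


port_a = "00:11:22:33:44:01"  # hypothetical MAC of the first port
port_b = "00:11:22:33:44:02"  # hypothetical MAC of a second port, same switch

# Two ports of the same switch cabled together: A hears B's LOOP frame.
print(should_err_disable(port_a, frame_src=port_b, frame_dst=port_b))  # -> False

# A's own LOOP frame is reflected back to A.
print(should_err_disable(port_a, frame_src=port_a, frame_dst=port_a))  # -> True
```

Only the second case matches the condition, which is why a LOOP frame arriving on a different port of the originating switch would not explain an err-disable on that port.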

Hence, if you see a port getting err-disabled because of detecting a looped condition, it must be because the port received its own LOOP frame it sent out at some point in the past. How that could have happened is the million dollar question here, but I can see only two possible explanations: Either you truly have a switching loop in progress, perhaps even a transient one, or the non-Cisco switches in your network violate the fundamental rule of Ethernet switching: Never forward a frame back through the interface it arrived on.

As always, feel welcome to ask / comment further!

Best regards,
Peter