07-18-2007 02:11 AM - edited 03-05-2019 05:21 PM
We have been experiencing some very strange issues on our network.
Switches suddenly started dropping off the management Vlan. On closer inspection of the switch logging buffers, we saw the following message:
%PM-4-ERR_DISABLE: loopback error detected on Gi0/1, putting Gi0/1 in err-disable state
I know this message is generated when the switch sees one of its own keepalive messages, sent using the Ethernet Configuration Testing Protocol (ECTP), come back to it through a loop, but what I do not understand is how the protocol works.
We did eventually find the problem. Spanning tree for Vlan 1 had been disabled on two distribution switches, which must have caused a temporary loop on Vlan 1. We have since added the command errdisable recovery cause loopback to all access switch configs.
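To make "same source and destination MAC" concrete, here is a minimal sketch that builds and parses such a keepalive frame. It assumes the layout of the Ethernet v2 Configuration Testing Protocol (EtherType 0x9000, followed by a little-endian skipCount, a function code where 1 means Reply, and a receipt number); the exact payload Cisco puts after those fields is not documented in this thread, and the MAC address and the helper names build_keepalive/parse_keepalive are made up for illustration.

```python
import struct

ETHERTYPE_LOOPBACK = 0x9000  # Configuration Testing Protocol ("loopback")
FUNCTION_REPLY = 1           # CTP "Reply" function code (assumed layout)


def build_keepalive(port_mac: bytes, receipt_number: int = 0, payload: bytes = b"") -> bytes:
    """Hypothetical helper: build a loopback frame whose source and destination
    MAC are both the sending port's MAC. CTP fields are assumed little-endian."""
    if len(port_mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    # Ethernet header: destination MAC, source MAC (identical), EtherType (big-endian)
    header = port_mac + port_mac + struct.pack("!H", ETHERTYPE_LOOPBACK)
    skip_count = 0  # offset of the function to process; 0 = the first (only) one
    ctp = struct.pack("<HHH", skip_count, FUNCTION_REPLY, receipt_number) + payload
    # Pad to the 60-byte Ethernet minimum (FCS not included)
    return (header + ctp).ljust(60, b"\x00")


def parse_keepalive(frame: bytes):
    """Hypothetical helper: return (dst, src, ethertype) from the Ethernet header."""
    dst, src = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    return dst, src, ethertype


mac = bytes.fromhex("001122334455")  # made-up port MAC
frame = build_keepalive(mac)
dst, src, ethertype = parse_keepalive(frame)
print(dst.hex(":"), src.hex(":"), hex(ethertype), len(frame))
# prints: 00:11:22:33:44:55 00:11:22:33:44:55 0x9000 60
```

Because the destination is a unicast address rather than broadcast, a switch will only flood it while that address is unknown to it, which is exactly the puzzle raised further down.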
To add further to my confusion about how the protocol works, I set up a Fluke analyser on Vlan 1, started a capture, and rebooted a random switch that was trunking Vlan 1. Up until the reload I saw nothing on the Vlan apart from CDP and STP traffic, as expected. However, once the switch had reloaded I saw approximately 50 loopback Ethernet frames from a handful of switches across the campus.
It is as if the reloaded switch, by generating its own keepalive messages, caused other switches to do the same, but in no apparent order and not all switches, just a random selection.
To make matters even stranger, each of these loopback frames had the same source and destination MAC address, so why was I seeing them in a capture running on a totally different switch when the frames were not broadcast?
It appears that ECTP is generating a kind of broadcast on Vlan 1.
Does anyone have any idea how this protocol works? Or any links to documentation, as I cannot find any.
Thanks in advance.
07-18-2007 07:42 AM
The protocol sends a keepalive every 10 seconds on each switched port. If a port receives back the keepalive it has sent, it is shut down with the message you saw.
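The per-port behaviour described above can be sketched as a small state machine. This is an illustration, not Cisco's implementation: the 10-second interval comes from the reply, but the test "EtherType is 0x9000 and the source MAC equals this port's MAC" is our assumption about how a port recognises its own keepalive, and the Port class and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ETHERTYPE_LOOPBACK = 0x9000
KEEPALIVE_INTERVAL_S = 10.0  # interval quoted in the reply


@dataclass
class Port:
    """Hypothetical model of one switched port running loopback keepalives."""
    name: str
    mac: str
    state: str = "up"
    last_sent: Optional[float] = None
    log: List[str] = field(default_factory=list)

    def maybe_send_keepalive(self, now: float) -> Optional[dict]:
        """Return a keepalive frame if the port is up and the interval has elapsed."""
        if self.state != "up":
            return None
        if self.last_sent is not None and now - self.last_sent < KEEPALIVE_INTERVAL_S:
            return None
        self.last_sent = now
        return {"src": self.mac, "dst": self.mac, "ethertype": ETHERTYPE_LOOPBACK}

    def receive(self, frame: dict) -> None:
        """Err-disable the port if it hears its own keepalive come back (assumed check)."""
        if (self.state == "up"
                and frame["ethertype"] == ETHERTYPE_LOOPBACK
                and frame["src"] == self.mac):
            self.state = "err-disabled"
            self.log.append(
                f"%PM-4-ERR_DISABLE: loopback error detected on {self.name}, "
                f"putting {self.name} in err-disable state")


port = Port("Gi0/1", "0011.2233.4455")      # made-up MAC
ka = port.maybe_send_keepalive(now=0.0)      # first keepalive goes out
print(port.maybe_send_keepalive(now=5.0))    # None: interval not yet elapsed
port.receive({"src": "aaaa.bbbb.cccc", "dst": "aaaa.bbbb.cccc",
              "ethertype": ETHERTYPE_LOOPBACK})  # another switch's keepalive
print(port.state)                            # up
port.receive(ka)                             # our own keepalive returns via a loop
print(port.state)                            # err-disabled
```

Recovery (for example via errdisable recovery) is deliberately left out; the point is only that a port shuts itself down when its own frame returns.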
A bridge floods traffic with unknown destination MAC addresses. That's why this frame can be temporarily flooded through the network. As soon as the source MAC address of the keepalive has been learnt by a neighboring switch, forwarding of the keepalive through that switch stops. This is because the keepalive's source and destination MAC addresses are the same: once the address has been learnt on the port on which the frame is received, the destination lookup points back out of that same port, and a switch does not forward a frame back out of the port it arrived on. So in a stable state, the keepalives should be constrained to the Ethernet segment on which they are transmitted and should not be flooded across the whole bridged domain. When STP advertises a topology change in the network, the CAM tables are flushed, which gives the keepalives another opportunity to be flooded until their addresses are learnt again. That's probably what is happening when you reboot the switch.
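The flood, learn, filter and flush cycle described above can be reproduced with a toy transparent bridge. One assumption matters here and is labelled in the code: the forwarding decision uses the CAM table as it stood when the frame arrived, and the source is learnt afterwards (modelling hardware learning that lags forwarding). If a bridge learnt the source before looking up the destination, even the first copy of a keepalive would be filtered. The Bridge class, port names and MAC address are hypothetical.

```python
from typing import Dict, List


class Bridge:
    """Toy transparent bridge. ASSUMPTION: the forwarding decision uses the CAM
    table as it stood when the frame arrived; the source is learnt afterwards."""

    def __init__(self, name: str, ports: List[str]):
        self.name = name
        self.ports = ports
        self.cam: Dict[str, str] = {}  # MAC address -> port it was learnt on

    def receive(self, in_port: str, src: str, dst: str) -> List[str]:
        """Return the ports the frame is sent out of."""
        known_port = self.cam.get(dst)
        if known_port is None:
            out = [p for p in self.ports if p != in_port]  # unknown unicast: flood
        elif known_port == in_port:
            out = []                # destination is behind the ingress port: filter
        else:
            out = [known_port]      # known destination elsewhere: forward
        self.cam[src] = in_port     # learn the source address
        return out

    def flush(self) -> None:
        """Topology change: forget every learnt address."""
        self.cam.clear()


b = Bridge("Switch B", ["to_A", "to_C", "to_D"])
mac_a = "0011.2233.4455"  # keepalive from Switch A: src == dst

print(b.receive("to_A", mac_a, mac_a))  # ['to_C', 'to_D']  first copy is flooded
print(b.receive("to_A", mac_a, mac_a))  # []                later copies are filtered
b.flush()                               # e.g. after an STP topology change
print(b.receive("to_A", mac_a, mac_a))  # ['to_C', 'to_D']  flooded again
```

A capture point on any port reached by the flood (to_C or to_D here) would therefore see another switch's keepalives briefly after each topology change, then nothing once the address is relearnt.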
Regards,
Francois
07-19-2007 02:54 AM
Thanks for the prompt response, Francois.
So basically, because Spanning Tree had been disabled for Vlan 1 on two of our distribution switches (each with dual uplinks to the core, trunking Vlan 1), the access switches saw their own keepalive messages and error-disabled their uplinks.
The distribution switches would have been forwarding Vlan 1 out of both uplinks, creating a loop.
Once the switches had learnt the MAC addresses, the flooding stopped. The loop was still there, but no traffic was being flooded across Vlan 1. Hence the fault only recurred when topology changes occurred.
Thanks once again, this has cleared up a lot of questions.
Chris.
02-20-2014 01:16 PM
Guys, what I am not sure I understand here is the following:
Switch A ------TRUNK------ Switch B -----TRUNK----- Switch C
Switch A sends a keepalive to Switch B. The SRC and DST MAC are the same, so why would Switch B ever unicast-flood it to Switch C, creating the potential for a loop, if it knows that MAC is on the interface facing Switch A?
Another question: is the keepalive sent per VLAN?
We had an issue today when a misconfigured port-channel must have created a temporary loop, and many switches received their own loopback frames and were error-disabled.