Ether-Connect - LoopGuard Errors

Olivier Jessel · 02-08-2011

Hi,

I'm facing a random issue on a switch port connected to a 55 Mbps Etherconnect line (layer 2, site to site).

Here is the topology:

SW1 running rapid-pvst                           SW3 running MST (non-Cisco switch)
Port in Access mode ----------L2---------------- | Port1 in Access mode
        |                                        |
SW2 running rapid-pvst                           |
Port in Access mode --------backup L2 link------ | Port2 in Access mode

SW1 is connected to SW2 with a trunk.

Rapid-pvst has blocked (Alt) the trunk link between SW1 and SW2, as the root bridge for this VLAN is SW3.

The problem is that I get LoopGuard errors on the SW1 port connected to the Etherconnect line. It seems to occur only when lots of traffic is sent from SW3 to SW1. It's always very short: the port is blocked and unblocked quickly, several times, and then there are no more errors for 2-3 days.

I was wondering whether it's possible that we are losing BPDUs. In the Cacti graph we have around 15 Mbps of traffic, so it's pretty low, but maybe during a burst we could have a short congestion.

Has anyone already had such an issue, or does anyone know how to fix it? Flow control (receive mode) is activated. Should I configure QoS shaping?

Thanks in advance for any help,

Best Regards,

Olivier

CCIE #44658

Nathan Spitzer · 02-08-2011

To start with, from the loopguard documentation: "An STP loop is created when an STP blocking port in a redundant topology erroneously transitions to the forwarding state. This usually happens because one of the ports of a physically redundant topology (not necessarily the STP blocking port) no longer receives STP BPDUs. In its operation, STP relies on continuous reception or transmission of BPDUs based on the port role. The designated port transmits BPDUs, and the non-designated port receives BPDUs."
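To make the mechanism in that quote a bit more concrete, here is a toy Python model of a loop-guard-protected port. It is only a sketch under assumptions that are not from this thread: the default 2-second hello time, and the RSTP rule that a neighbour's information is aged out after three missed hellos. The function name `loop_guard_events` and the timestamps are made up for illustration; no switch implements it this way.

```python
# Toy model of a loop-guard-protected port. Simplified on purpose; this is
# an illustration, not how a real switch implements spanning tree.
HELLO_TIME = 2.0     # seconds; STP default hello time (assumed)
MISSED_HELLOS = 3    # RSTP ages out a neighbour after 3 missed hellos (assumed)


def loop_guard_events(bpdu_times, end_time):
    """Return (time, state) transitions given BPDU arrival times in seconds.

    Assumes a BPDU was received at t=0.
    """
    timeout = HELLO_TIME * MISSED_HELLOS
    events = []
    last_bpdu = 0.0
    for t in sorted(bpdu_times):
        if t - last_bpdu > timeout:
            # BPDUs stopped arriving: loop guard blocks the port instead of
            # letting it transition to forwarding ...
            events.append((last_bpdu + timeout, "loop-inconsistent"))
            # ... and the port recovers by itself when a BPDU shows up again.
            events.append((t, "recovered"))
        last_bpdu = t
    if end_time - last_bpdu > timeout:
        events.append((last_bpdu + timeout, "loop-inconsistent"))
    return events


# BPDUs every 2 s, but the ones due at t=8, 10 and 12 are lost.
print(loop_guard_events([2, 4, 6, 14, 16], end_time=18))
```

In this model a short block followed by a quick recovery, like the one described in the original post, corresponds to several consecutive BPDUs going missing before the next one gets through.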

What is most likely happening is that BPDUs are getting dropped. Tools like Cacti don't have enough granularity to show the spikes that could be causing them to be dropped. Looking at your diagram, a couple of things come to mind:

1. I PERSONALLY don't ever connect switches together with access ports. Too many things can go wrong. If you truly have a flat network as you describe, then making the connections trunks won't hurt things and prevents mistakes that can cause BIG headaches (see portfast below).
2. If you MUST have them as access ports, MAKE BLOODY SURE PORTFAST ISN'T ENABLED!!!! If you have the "spanning-tree portfast default" command in your global config (normally a good practice), then ensure that portfast is explicitly disabled on all the interfaces in question. If you have portfast on those interfaces you could have all kinds of issues.
3. Ensure QoS is enabled. In this case simply turning it on may be enough, but I would also enable the priority queue if your switches support it.
4. If this was my network I would be running MST on the Ciscos. Even though it is more complex than RPVST+, if you want to have a mixed environment it's the best way to go.
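As a rough illustration of the point about Cacti's granularity, the Python sketch below averages per-second traffic over one polling interval. The 55 Mbps line rate and the roughly 15 Mbps average come from the original post; the 5-minute polling interval and the traffic pattern itself are assumptions made up for the example.

```python
LINE_RATE_MBPS = 55      # line rate from the original post
POLL_INTERVAL_S = 300    # assumed 5-minute polling; not stated in the thread


def summarize(per_second_mbps):
    """Return (average, peak, seconds spent at or above line rate)."""
    avg = sum(per_second_mbps) / len(per_second_mbps)
    saturated = sum(1 for rate in per_second_mbps if rate >= LINE_RATE_MBPS)
    return avg, max(per_second_mbps), saturated


# Hypothetical traffic: a 10-second burst at line rate, then 13.6 Mbps
# for the rest of the polling interval.
samples = [55.0] * 10 + [13.6] * (POLL_INTERVAL_S - 10)
avg, peak, saturated = summarize(samples)
print(f"average {avg:.1f} Mbps, peak {peak:.1f} Mbps, {saturated} s at line rate")
```

The graph would only ever show the average, so a link that looks lightly loaded can still have been saturated for several seconds within the interval.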

My 2 cents.

Nathan Spitzer

Sr. Network Communications Analyst

Lockheed Martin

Olivier Jessel · 02-08-2011

Hi Nathan,

Thanks for your reply. I made this design because SW3 is in fact not part of our company network; it belongs to one of our customers.

His network is directly connected to a hosting VLAN on our LAN. No firewall, no router in between.

I don't want to make this link a trunk, because there is no need to carry more VLANs.

Could you tell me why portfast could lead to errors in such a case? Thanks

Configuring MST on our network... Yeah I can think about it....

Regards,

Olivier

CCIE #44658

Nathan Spitzer · 02-08-2011

First, setting a port as a trunking port is a way to tell IOS that it is connected to another layer-2 device, even if there is only one VLAN. By setting a port as trunking you are enabling spanning-tree to do its job. Portfast, by contrast, explicitly tells spanning-tree that the port is NOT connected to another layer-2 device, so you turn off a bunch of checks. Essentially, spanning-tree doesn't do a loop check on a portfast port before it comes active, leading to the potential for a temporary spanning-tree loop until it gets everything straight. In your environment it could cause instability, particularly if an interface flaps every once in a while. This is from the Cisco documentation; pay special attention to the bold:

Edge Ports

The edge port concept is already well known to Cisco spanning tree users, as it basically corresponds to the PortFast feature. All ports directly connected to end stations cannot create bridging loops in the network. Therefore, the edge port directly transitions to the forwarding state, and skips the listening and learning stages. Neither edge ports or PortFast enabled ports generate topology changes when the link toggles. An edge port that receives a BPDU immediately loses edge port status and becomes a normal spanning tree port. At this point, there is a user-configured value and an operational value for the edge port state. The Cisco implementation maintains that the PortFast keyword be used for edge port configuration. This makes the transition to RSTP simpler.

For more info see: http://www.cisco.com/en/US/partner/tech/tk389/tk621/technologies_white_paper09186a0080094cfa.shtml#topic5
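A toy Python model of the behaviour described in that excerpt may help. It is a sketch only: the `Port` class and its attribute names are invented for illustration and are not taken from IOS or from the standard.

```python
class Port:
    """Toy model of RSTP edge-port status (not a real STP implementation)."""

    def __init__(self, admin_edge):
        self.admin_edge = admin_edge   # user-configured value (PortFast on/off)
        self.oper_edge = False         # operational value
        self.state = "down"

    def link_up(self):
        self.oper_edge = self.admin_edge
        # An edge port goes straight to forwarding; any other port starts out
        # discarding until spanning tree has worked out its role.
        self.state = "forwarding" if self.oper_edge else "discarding"

    def receive_bpdu(self):
        # A BPDU proves another bridge is attached, so the port loses its
        # operational edge status. The configured value is left untouched.
        self.oper_edge = False


port = Port(admin_edge=True)
port.link_up()
print(port.state, port.oper_edge)        # forwarding True
port.receive_bpdu()
print(port.admin_edge, port.oper_edge)   # True False
```

The window between `link_up()` and the first `receive_bpdu()` is where a portfast port facing another switch forwards traffic before spanning tree has had a say.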

Second, be aware that you are connecting a switch you do not control (and I'll bet he's got others behind the one you connect to) into your spanning tree and, even worse, directly into your hosting environment. That subjects you to a greatly elevated risk, magnified by the fact that it is a non-Cisco switch and aggravated further by the fact that you are not even running the same kind of spanning-tree protocol!

The possibilities for glitches, incompatibilities, weirdness, etc. are very real.

In similar situations I require an L3 link (no VLAN) and either static routing or filtered prefixes. I have seen these kinds of things go very, very bad, so I would be very, very afraid.

Nathan Spitzer

Sr. Network Communications Analyst

Lockheed Martin

Nathan Spitzer · 02-08-2011

Just remembered there is a very important little note that details mixing RPVST+ and MST:

As the MST region now replicates the IST BPDUs on every VLAN at the boundary, each PVST+ instance hears a BPDU from the IST root (this implies the root is located inside the MST region). It is recommended that the IST root have a higher priority than any other bridge in the network so that the IST root becomes the root for all of the different PVST+ instances, as shown in this diagram:

In other words, it is critical that your customer's IST root be the root bridge!

Here is the full link: http://www.cisco.com/en/US/partner/tech/tk389/tk621/technologies_white_paper09186a0080094cfc.shtml#region_bound
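For reference, "higher priority" means a numerically lower bridge ID: the bridge with the lowest (priority, MAC address) pair wins the election. The Python sketch below shows just that comparison. The priorities and MAC addresses are hypothetical, and it ignores everything else about the MST/PVST+ boundary.

```python
def elect_root(bridges):
    """bridges maps name -> (priority, mac). The lowest bridge ID wins."""
    return min(bridges, key=lambda name: bridges[name])


# Hypothetical values for illustration only.
bridges = {
    "SW1": (32768 + 32, "00:00:00:00:00:01"),  # default priority + VLAN ID (assumed)
    "SW2": (32768 + 32, "00:00:00:00:00:02"),
    "SW3": (0, "00:00:00:00:00:03"),           # customer's IST root, lowest priority value
}
print(elect_root(bridges))  # SW3
```

If SW3 were left at a default priority, the tie would be broken on MAC address alone and one of the Ciscos could end up as root.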

Nathan

Olivier Jessel · 02-08-2011

I've checked my config: portfast is not enabled on these ports, and it is not activated in the global config either.

Regarding the MST bridge, I had read this document before, and there is no problem, as our customer configured his MST root bridge with priority 0. ^^

Regarding the trunk, I couldn't do it because the SW1 and SW2 ports are in VLAN 32 but the SW3 port is in VLAN 1, and both VLANs have to share the same IP range.

We are progressively moving all servers from the customer's LAN into our Hosting Vlan.

I couldn't use an L3 connection.

The last step I have to look at: installing a Cisco switch on site and deploying QoS!

Thanks a lot for your attention Nathan !

Cheers,

Olivier

CCIE #44658