Basic STP design best practices

JOHN NETTLES · 01-05-2011

I'm currently working in a shop that has a lot of L2. Our current networks have upstream routers and downstream switches deployed in pairs, almost all using "boxes", not "triangles". IOW, each downstream switch is homed to a single upstream router, with a link between the two routers and another between the two switches. I'd like to move to a dual-homed environment where each downstream switch has a link to both upstream routers.

My co-workers are adamantly opposed to this. They say the box design is just fine and that a dual-homed "triangle" environment offers no advantages over it. I personally have historically found that the "box" design is much less stable than the dual-homed design, but I can't find any documentation to back this up or refute it. I like the configurability and deterministic behavior of a dual-homed connection, and I think it provides a more orderly failure response, plus STP "boxes" have bitten me before. Perhaps this is all anecdotal and they're correct, but if so, why does every single Cisco "best practices" document I look at use "triangles", even though they don't explain why?

We're moving to L3 as fast as possible, but L2 will still be around and so we have to deal with this. These switches are on the edge of a data center, with the "switches" in question connecting to a distribution-layer "router". This is a mixed-vendor environment, with F5, Foundry and Cisco gear, all running RSTP.

Can anyone shed any light on this? Is my preference for dual-homed uplinks just prejudice from a bad design in the past, or are there solid technical reasons for avoiding STP squares? Opinions welcome, documentation VERY welcome. Thanks in advance!

dbass · 01-05-2011

In a datacenter I would always do dual uplinks directly to the distro layer from the access switches. Why would you want to burden the link from 1 switch with the traffic from 2? Additionally, in the event one of the links goes down you are affecting 2 access switches instead of one, which means 2x the servers affected by a single link outage...doesn't make sense to me. By doing the box you are also increasing the size of your L2 broadcast domains to switches that don't necessarily need to run those VLANs (I always manually prune VLANs in the DC). The datacenter should be your safest environment possible, and every step (within reason) should be taken to minimize the risk of outages, or minimize the impact of outages.
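The "one link outage hits two access switches" point can be sketched with a toy model. This is only an illustration under stated assumptions: the names R1, R2, S1, S2 and both link lists are hypothetical, R1 is assumed to be both the STP root and the active gateway, all links have equal cost, and a plain breadth-first search stands in for the real root-port election.

```python
from collections import deque

# Hypothetical 4-node topologies: R1/R2 are distribution routers, S1/S2 access switches.
BOX = [("R1", "R2"), ("R1", "S1"), ("R2", "S2"), ("S1", "S2")]
TRIANGLE = [("R1", "R2"), ("R1", "S1"), ("R2", "S1"), ("R1", "S2"), ("R2", "S2")]


def paths_to_root(links, root="R1"):
    """Return {node: [node, ..., root]} found by BFS (equal link costs assumed)."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # Sorted order makes ties deterministic and prefers routers (R*) over switches (S*).
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    paths = {}
    for node in parent:
        path, cur = [], node
        while cur is not None:
            path.append(cur)
            cur = parent[cur]
        paths[node] = path
    return paths


def transit_switches(links, failed=None):
    """Access switches that end up carrying another access switch's traffic to the root."""
    live = [link for link in links if link != failed]
    transit = set()
    for node, path in paths_to_root(live).items():
        if node.startswith("S"):
            # Intermediate hops only: skip the source switch and the root itself.
            transit.update(hop for hop in path[1:-1] if hop.startswith("S"))
    return transit


def worst_case(links):
    """Map each single link failure to the access switches it forces into a transit role."""
    result = {}
    for failed in links:
        forced = transit_switches(links, failed)
        if forced:
            result[failed] = sorted(forced)
    return result


print("box:     ", worst_case(BOX))
print("triangle:", worst_case(TRIANGLE))
```

In this model, three of the four single link failures in the box leave one access switch carrying its neighbor's traffic, while no single link failure in the triangle does. Real behavior depends on port costs, bridge priorities and where the gateway actually lives, so treat the numbers as a way to reason about the design rather than a prediction.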

I think that the "box" design is acceptable in a user access switch situation, where you have limited amounts of fiber or something, but I still generally stick to the "triangle" if at all possible.

Nathan Spitzer · 01-06-2011

This is a mixed-vendor environment, with F5, Foundry and Cisco gear, all running RSTP.

May God have mercy on your soul. You're a better man than me. I tried to layer-2 connect Nortel and Cisco once and brought down the campus (3000 people) for an hour and a half. Never again.

Perhaps this is all anecdotal and they're correct, but if so, why does every single Cisco "best practices" document I look at use "triangles", even though they don't explain why?

My number one design rule is to minimize spanning-tree diameter. Your network stability is inversely and exponentially proportional to your spanning-tree diameter. This has been a hard-won lesson honed over multiple outages caused by spanning-tree. Here are a couple of reasons why you do triangles:

If you want to do VSS (which TOTALLY rocks) you do dual-attached MECs (multi-chassis EtherChannels) from access switches to core routers, so going dual-attached now makes sense. Rumor has it VSS is also coming to the 4500s next year, so keep that in mind.
By doing boxes you increase the chances of a full spanning-tree topology change when a primary link fails. Even with RSTP that's a 10-second outage. With the triangles there is much less chance of a full topology change.
You're adding dependencies between access layer switches and making physical path determination much harder during an outage.
As has already been said, why burden one access switch with traffic from another?
Partly it also violates what I call "network aesthetics". I almost can't explain it; it just LOOKS wrong, and it's not what a "good, right" network looks like.
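The diameter argument can be illustrated with a rough hop-count sketch. Everything here is an assumption made for illustration: the node names and link lists are hypothetical, R1 is taken as the root bridge, all links have equal cost, and "depth from the root" is used as a simple proxy for spanning-tree diameter rather than the formal 802.1D definition.

```python
from collections import deque

# Hypothetical 4-node topologies: R1/R2 are distribution routers, S1/S2 access switches.
BOX = [("R1", "R2"), ("R1", "S1"), ("R2", "S2"), ("S1", "S2")]
TRIANGLE = [("R1", "R2"), ("R1", "S1"), ("R2", "S1"), ("R1", "S2"), ("R2", "S2")]


def depth_from_root(links, root="R1"):
    """Hop count from the root to every reachable bridge (BFS, equal costs assumed)."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth


def worst_depth_after_single_failure(links, root="R1"):
    """Deepest bridge, in hops from the root, across every possible single link failure."""
    worst = 0
    for failed in links:
        live = [link for link in links if link != failed]
        worst = max(worst, max(depth_from_root(live, root).values()))
    return worst


for name, links in (("box", BOX), ("triangle", TRIANGLE)):
    steady = max(depth_from_root(links).values())
    failed = worst_depth_after_single_failure(links)
    print(f"{name}: steady-state depth {steady}, worst depth after one failure {failed}")
```

With these assumptions the box sits at depth 2 in steady state and 3 after a single failure, while the triangle sits at 1 and 2. The gap grows if more access switches are daisy-chained, which is the dependency between access switches described above.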

You don't say how many switches, routers, or anything, but I get the feeling this is a not-inconsequential operation and there are no great physical limitations to doing it. To do it "by the book" takes a few feet of fiber, a couple of SFPs, a few ports, and a little time. I would ask your coworkers: are they SURE they are so much smarter than the Cisco CCIEs who write the Cisco documentation, design guides, and best practices that they are willing to risk network stability to save so little work and resources?

At the end of the day I personally ALWAYS assume I am not smarter than the Cisco designers, and I have found over the years that doing it "by the book" is almost always the right way. When I try to get creative or cut corners, it comes back and bites me.

My (over-priced) 2-cents.

Nathan Spitzer

Sr. Network Communications Analyst

Lockheed Martin