Solved: RSTP Issue resulting in network flooding

matthew.phillips2 · 09-12-2021

Hi All,

I have a question about the best way to set up RSTP on a typical Cisco switch (we typically use an SG350 managed switch). I also have a question about how RSTP should normally work, as we have had some issues with a product crashing the network when trying to use RSTP, and I would like to know whether this is the fault of the product or just something that happens.

Firstly, the issue: we have recently been installing a field controller which uses a daisy-chained Ethernet connection to link multiple devices together in a ring, and which has an option to enable RSTP on each individual controller. If this is set up correctly, RSTP works fine. The problem comes when a controller is replaced or connected with the RSTP setting disabled (the default). Both ring ports on the switch remain active and the traffic essentially floods the entire network (across VLANs, etc.) until the whole network goes offline. My understanding from a brief discussion with an ICN contractor is that these controllers should forward a "BPDU" packet which stops this from happening (the second port sees this packet and automatically disables itself), but currently the BPDU packet is being dropped by our controller when RSTP is disabled. Is this correct? Should my controller always allow these BPDU packets to pass, or is there a setting on the switch that can be used to avoid this?

Secondly, what is the process to set up RSTP on a Cisco switch? From what I can see, RSTP is enabled on the switch by default and works out of the box. Are there any settings that need to be modified to help prevent issues from affecting the rest of the network?

Typically we have these controllers set up on a separate VLAN (either per loop or logically, depending on the job).

Thanks


Sergey Lisitsin · 09-13-2021

@matthew.phillips2,

When you connect multiple switches in a complete ring, frames can circulate around that ring forever, because the layer 2 header has no TTL field. To prevent that, STP (or RSTP in this case) disables one of the links to break the loop logically. In your case, the controller in the ring does not pass BPDU packets, but it does not stop other frames from passing either. So the physical loop is still present, but STP doesn't detect it and can't eliminate it. The possible solutions are as follows:

1. Enable RSTP on the controller

2. Don't enable RSTP on the controller, but stop dropping BPDU frames and pass them through instead

3. Don't connect the controller using 2 interfaces; connect it with a single interface, thus excluding it from the actual loop.
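To make the "no TTL at layer 2" point concrete, here is a toy Python simulation. It is a sketch, not a model of any real switch: the node names (`H`, `SW`, `VAV1` to `VAV3`) and the `simulate_broadcast` helper are hypothetical. It floods one broadcast frame from a host around a ring and counts the copies in flight after each hop, first with every link forwarding and then with one link blocked, as RSTP would do.

```python
from collections import defaultdict


def simulate_broadcast(links, source, hops, blocked=frozenset()):
    """Return how many copies of one broadcast frame are in flight after each hop.

    Ethernet frames carry no TTL, so a copy only disappears when it reaches a
    node with no other port to flood it out of. `hops` exists only so the
    simulation stops; a real switch has no such limit.
    """
    adj = defaultdict(list)
    for a, b in links:
        if frozenset((a, b)) in blocked:
            continue  # link is in the blocking state, as (R)STP would leave it
        adj[a].append(b)
        adj[b].append(a)

    # Each in-flight copy is (node it is arriving at, node it came from).
    in_flight = [(nbr, source) for nbr in adj[source]]
    counts = []
    for _ in range(hops):
        counts.append(len(in_flight))
        # A switch floods a broadcast out of every port except the one it came in on.
        in_flight = [
            (nbr, node)
            for node, came_from in in_flight
            for nbr in adj[node]
            if nbr != came_from
        ]
    return counts


# Hypothetical ring: host H on the switch, three controllers daisy-chained back to it.
RING = [("H", "SW"), ("SW", "VAV1"), ("VAV1", "VAV2"), ("VAV2", "VAV3"), ("VAV3", "SW")]

print(simulate_broadcast(RING, "H", 8))
# [1, 2, 2, 2, 2, 4, 2, 2]  -> never reaches zero; H receives copies again every lap
print(simulate_broadcast(RING, "H", 8, blocked={frozenset(("VAV2", "VAV3"))}))
# [1, 2, 1, 0, 0, 0, 0, 0]  -> with one link blocked, the frame dies out
```

Which link RSTP actually blocks depends on bridge priorities and path costs; blocking the link between `VAV2` and `VAV3` here is an arbitrary choice. In topologies with more than one loop, the copy count grows rather than merely persisting, which is closer to the storm described in the original question.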

matthew.phillips2 · 09-14-2021

Thanks for your response @Sergey Lisitsin.

I'm sorry, some of that goes over my head as I'm not too familiar with networking terminology (I'm still learning), but please see my responses below.

1. Enable RSTP on the controller (This is currently the solution, but it is not uncommon for these controllers to be replaced by service techs over the years. My fear is that when a service tech replaces a controller, they will not think to set it up on the bench first, because these controllers have always been plug and play. I can almost guarantee that this will happen and cause another network storm.)

2. Don't enable RSTP on the controller, but stop dropping BPDU frames and pass them through instead (The problem is that these controllers drop the BPDU packet when RSTP is disabled. That is what causes the issue when the controllers are connected in a ring: they block the BPDU packet and both ports stay open. Is this a common problem in the world of RSTP and just a natural risk that comes with it? I have raised this with product support and they have been less than helpful, saying that "It is not an issue".)

3. Don't connect the controller using 2 interfaces; connect it with a single interface, thus excluding it from the actual loop. (I assume this means simply not using RSTP? This is currently the only option I have, which is a shame, since if a single controller dies or a cable gets cut, we lose everything after it.)

I guess the biggest question I am trying to answer is: is there a problem with the new controllers, or is this just an unavoidable risk? Also, when these network storms occur as a result of a VAV controller not having RSTP enabled, the storm affects the entire site, not just the VLAN. If the issue could be isolated so that it only takes down the ring or the VLAN, we could engineer around it, but currently it is taking out entire ICNs.

Hopefully this makes sense and feel free to ask for any clarification.

Sergey Lisitsin · 09-14-2021

OK, so as I understand it, enabling RSTP or just passing the BPDU frames aren't options. That leaves the last one: connecting the controllers so that they don't participate in a loop. Here is an example. This is how your topology looks now:

As you can see, the controller is a node in the switching loop.

And this is how I propose to connect it:

Now it is not part of a switching loop and doesn't need RSTP.
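As a rough illustration, assuming a switch `SW` and three controllers `VAV1` to `VAV3` (hypothetical names), the check below uses union-find to tell whether a set of cables forms a physical loop. The first list closes the daisy chain back onto the switch; the second stops at the last controller, which is the kind of open chain described here.

```python
def has_physical_loop(links):
    """Return True if the undirected cabling contains at least one cycle.

    Union-find: if a cable joins two nodes that are already connected,
    that cable closes a loop (this also catches two parallel cables).
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True
        parent[root_a] = root_b
    return False


# Hypothetical cabling: ring closed back onto the switch vs. open chain.
CURRENT_TOPOLOGY = [("SW", "VAV1"), ("VAV1", "VAV2"), ("VAV2", "VAV3"), ("VAV3", "SW")]
PROPOSED_TOPOLOGY = [("SW", "VAV1"), ("VAV1", "VAV2"), ("VAV2", "VAV3")]

print(has_physical_loop(CURRENT_TOPOLOGY))   # True  -> needs (R)STP on every node
print(has_physical_loop(PROPOSED_TOPOLOGY))  # False -> no loop, no RSTP required
```

The trade-off mentioned later in the thread shows up directly in the open chain: every controller after a failed one loses its only path to `SW`.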

matthew.phillips2 · 09-14-2021

@Sergey Lisitsin

The only problem with this is that the controllers only have 2 Ethernet ports, so it could possibly be achieved by adding an unmanaged switch or equivalent at the first controller. Ideally, though, I would like to either fix the product (if it has a problem) or set up my Cisco switch so that, in the event of a network storm, the issue can be contained without breaking the entire site.

I have drawn a rough LAN diagram in Paint (sorry for the quality) to clarify my current setup. Do you have any suggestion on whether there is an issue with the controllers, or whether there is any setting I can use to help isolate a network storm to the affected ports / switch?

VAV = controller in question

How would most devices work in this scenario? Would they experience similar problems, or should I be pushing back to product support on the fact that this is a bug? I just need a setup where I have the benefit of RSTP without the risk of crashing the entire site's ICN due to a simple mistake that will eventually be made. Currently the risk simply outweighs the rewards by a large margin.

Thanks for your input Sergey, I appreciate you trying to explain some options!

Sergey Lisitsin · 09-14-2021

@matthew.phillips2,

If you want the benefits of RSTP, then you need to enable RSTP.

The problem in your scenario is that you have a physical loop. So there are only two options: 1 - make RSTP do its job and logically disable some links; 2 - break the physical loop and eliminate the need for RSTP. Unfortunately, there is no third option.

There isn't enough info to advise more precisely. Do the controllers really have to be daisy-chained, or can each one just be connected to the switch with a single interface? The problem is that if they behave like switches (forwarding frames between ports), then they have to comply with the design rules that apply to switches: either run some flavour of STP or not be part of a physical loop.
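That design rule can be written as a small decision function. This is a simplified Python sketch under stated assumptions: `RingDevice`, `ring_status`, and the three `bpdu_handling` values are hypothetical labels for the behaviours discussed in this thread (run RSTP, pass BPDUs through, or drop them), and timing, port roles and bridge priorities are ignored.

```python
from dataclasses import dataclass

VALID_BPDU_HANDLING = {"rstp", "pass", "drop"}


@dataclass
class RingDevice:
    name: str
    forwards_data: bool  # bridges ordinary frames between its two ports
    bpdu_handling: str   # "rstp" (runs RSTP), "pass" (relays BPDUs) or "drop"

    def __post_init__(self):
        if self.bpdu_handling not in VALID_BPDU_HANDLING:
            raise ValueError(f"unknown bpdu_handling: {self.bpdu_handling!r}")


def ring_status(devices):
    """Classify a daisy-chained ring whose two ends plug into the same switch."""
    if not all(d.forwards_data for d in devices):
        return "no loop: the ring is physically broken"
    # A device that forwards data but drops BPDUs hides the loop:
    # ordinary frames still go all the way round, BPDUs never do.
    hiding = [d.name for d in devices if d.bpdu_handling == "drop"]
    if hiding:
        return "storm risk: loop hidden from STP by " + ", ".join(hiding)
    return "safe: STP can see the loop and block a port"


ring = [
    RingDevice("VAV1", True, "rstp"),
    RingDevice("VAV2", True, "drop"),  # hypothetical replacement unit left at default
    RingDevice("VAV3", True, "rstp"),
]
print(ring_status(ring))  # storm risk: loop hidden from STP by VAV2
```

In this model one BPDU-dropping device is enough to hide the loop, even if every other controller runs RSTP, which matches the single misconfigured controller described in the question.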

matthew.phillips2 · 09-14-2021

@Sergey Lisitsin

They can be connected with a single interface, but that would involve a large amount of cabling and would use up most of the ports on the switch (this can be done for critical environments, but most jobs would not warrant it).

The thing that confuses me is that I would expect these controllers to follow the guidelines, as they are designed to be used in an RSTP ring. I just don't understand how a product can state that it can be used in an RSTP ring, but then nothing can be done (from the switch side or the controller side) to stop it from bringing down an entire building's network.

Sergey Lisitsin · 09-14-2021

Well, it can be used in a ring, but then it HAS to have RSTP turned on. Just having a function doesn't guarantee anything; only using that function will give you the result. So, to summarise once again: if there is a physical loop, you need RSTP. If you can't use RSTP, break the loop somehow. Unfortunately, there is no other option.

matthew.phillips2 · 09-14-2021

@Sergey Lisitsin

That's fine. In a scenario where we plan on using a ring network with RSTP, we always intend to enable the RSTP setting, but my concern is the repercussions of making a mistake (either forgetting to set the RSTP setting, or a service tech swapping out a VAV without knowing how RSTP works, and so on).

If somebody does make this mistake, is there any setting on the switch that can prevent the entire network from being affected? Some method by which only the single switch or a single VLAN goes offline, rather than everything? For example, I have had a situation in the past where a single controller out of 500 didn't have RSTP enabled and the resulting network storm took down the entire ICN for the building.

If this is just a risk that needs to be taken when using RSTP, then that's understandable, but if the risk cannot be limited to a single switch, then we will just need to pick and choose where we should and shouldn't use it.

Hopefully this makes sense.

Thanks

Sergey Lisitsin · 09-14-2021

Unfortunately, there is nothing that can be done on the switch side to mitigate that risk. The only thing I can propose is changing the topology so that the controllers are not in a loop. That does mean, though, that if a controller closer to the switch fails, the ones further along the chain will be unavailable until it is replaced.

matthew.phillips2 · 09-14-2021

@Sergey Lisitsin Thanks for the responses, mate. I definitely have a better understanding of it now and can at least warn my colleagues when it comes to installing these devices on a customer site. It's a shame the consequences of making a mistake are so severe, but for now I'll try to make sure everyone is aware.

Thanks for your help Sergey, really appreciate it!

Sergey Lisitsin · 09-14-2021

@matthew.phillips2,

Always glad to help