09-21-2017 01:22 PM - edited 03-08-2019 12:07 PM
Dear Experts,
I'm just starting to work with RSTP on a project whose topology seems to have a few issues.
The topology (attached) can't be modified because most of the links are radio-based (900 MHz), which in fact adds a couple of issues to the network.
As you can see, the topology contains many loops. All switches are Siemens Layer 2 industrial switches, except the three green ones, which are Siemens Layer 3 industrial switches, and the yellow one, which is a Cisco 3850 (WS-C3850-48T-S).
Most of the switches have other terminal equipment connected to them, such as PLCs, UPSs and HMIs, as part of local and remote control stations. The network is used to supervise and control all the remote stations, and the data from all of them converges on the Cisco switch, which connects to the control room (SCADA system).
Because of the loops and the project requirements, RSTP must be used, so the first step is to choose the root switch. Given its capacity, its function and its location in the topology, I chose the 3850 as the root switch. Also, not every remaining switch is configured for RSTP; only the ones with a P-BID are.
The P-BIDs of the RSTP switches were selected according to their distance from the root switch, and here is my first question: is it possible to assign the same P-BID to different switches? Starting from the lowest value, 4096 (or 0), for the root switch, I can only assign 15 different values up to 61440.
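For reference on the priority question: the bridge ID is compared as the priority value first and the MAC address second, so several switches can share a priority and the lowest MAC address wins the tie. The Python sketch below illustrates that comparison under some assumptions: priorities are set in steps of 4096, the lower 12 bits carry the system ID extension (Cisco's Rapid PVST+ puts the VLAN ID there, so priority 4096 in VLAN 1 is advertised as 4097), and the MAC addresses and the priorities given to RAP-01 and RAP-04 are made up. `bridge_id` and `elect_root` are hypothetical helpers, not part of any vendor tool.

```python
from __future__ import annotations

PRIORITY_STEP = 4096
# 0, 4096, ..., 61440: the upper 4 bits of the 16-bit priority field.
VALID_PRIORITIES = list(range(0, 61440 + 1, PRIORITY_STEP))


def bridge_id(priority: int, mac: str, sys_id_ext: int = 0) -> tuple[int, int]:
    """Return a sortable bridge ID: (priority + system ID extension, MAC as int)."""
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"priority must be a multiple of {PRIORITY_STEP} in 0..61440")
    if not 0 <= sys_id_ext < PRIORITY_STEP:
        raise ValueError("system ID extension must fit in 12 bits")
    mac_int = int(mac.replace(":", "").replace("-", ""), 16)
    return (priority + sys_id_ext, mac_int)


def elect_root(bridges: dict[str, tuple[int, int]]) -> str:
    """The bridge with the numerically lowest bridge ID becomes the root."""
    return min(bridges, key=bridges.get)


# Hypothetical values: locally administered MACs, assumed priorities.
bridges = {
    "3850": bridge_id(4096, "02:00:00:00:00:01", sys_id_ext=1),  # Cisco, VLAN 1
    "RAP-01": bridge_id(8192, "02:00:00:00:00:10"),
    "RAP-04": bridge_id(8192, "02:00:00:00:00:05"),  # same priority as RAP-01
}

print(len(VALID_PRIORITIES))                  # 16
print(elect_root(bridges))                    # 3850
print(bridges["RAP-04"] < bridges["RAP-01"])  # True: tie broken by the lower MAC
```

Sharing a priority is therefore allowed; the practical consequence is that the MAC address, not the design, decides which of the tied switches is preferred.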
Assuming the priority assignment is fine, I adjusted the path costs on some switches to distribute most of the traffic across the RAP-01 and RAP-04 switches; with the default values, RAP-04 would carry the traffic of most of the stations. The analysis led me to change the path costs indicated on switches RAP-06, CP-01, RAP-02, RAP-03, P-298 and CD-C1, which results in the blocked ports marked with an "X" once RSTP converges. The diagram shows the resulting topology when all links are stable. I have checked this in the field by watching the port states of the RSTP switches indicated.
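The effect of the path-cost changes can be reasoned about with a simplified model: each non-root switch picks as its root port the neighbour that gives the lowest total cost to the root, ties go to the lower neighbour bridge ID, and every link that is nobody's root-port link ends up with one discarding end (an "X" in the diagram). This Python sketch assumes point-to-point links with the same cost configured at both ends and no parallel links; the topology, the bridge IDs and the costs (802.1t-style values such as 20000 for 1 Gbps and 200000 for 100 Mbps) are hypothetical, not the real network.

```python
import heapq


def spanning_tree(links, bridge_ids, root):
    """Return (root path cost per bridge, root-port neighbour per bridge)."""
    adj = {}
    for a, b, cost in links:
        adj.setdefault(a, []).append((b, cost))
        adj.setdefault(b, []).append((a, cost))

    # Dijkstra from the root gives each bridge its lowest root path cost.
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue
        for nbr, cost in adj.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))

    # Root port: lowest total cost first, then lowest neighbour bridge ID.
    parent = {}
    for node in dist:
        if node == root:
            continue
        candidates = [
            (dist[nbr] + cost, bridge_ids[nbr], nbr)
            for nbr, cost in adj[node]
            if nbr in dist
        ]
        parent[node] = min(candidates)[2]
    return dist, parent


def blocked_links(links, parent):
    """Links not used by any root port keep one end discarding (the "X")."""
    active = {frozenset((n, p)) for n, p in parent.items()}
    return [(a, b) for a, b, _ in links if frozenset((a, b)) not in active]


# Hypothetical topology: names are labels only, costs and IDs are made up.
links = [
    ("3850", "RAP-01", 20000),
    ("3850", "RAP-04", 20000),
    ("RAP-01", "RAP-02", 200000),
    ("RAP-04", "RAP-02", 200000),
    ("RAP-02", "RAP-03", 200000),
    ("RAP-04", "RAP-03", 200000),
]
bridge_ids = {
    "3850": (4097, 0x01), "RAP-01": (8192, 0x10), "RAP-04": (8192, 0x05),
    "RAP-02": (12288, 0x02), "RAP-03": (12288, 0x03),
}

dist, parent = spanning_tree(links, bridge_ids, "3850")
print(parent["RAP-02"], blocked_links(links, parent))
# RAP-04 [('RAP-01', 'RAP-02'), ('RAP-02', 'RAP-03')]

# Raising the cost of the RAP-04 <-> RAP-02 link moves RAP-02 onto RAP-01.
tuned = [(a, b, 250000 if {a, b} == {"RAP-04", "RAP-02"} else c) for a, b, c in links]
dist, parent = spanning_tree(tuned, bridge_ids, "3850")
print(parent["RAP-02"], blocked_links(tuned, parent))
# RAP-01 [('RAP-04', 'RAP-02'), ('RAP-02', 'RAP-03')]
```

In the real protocol the cost that counts is the one configured on the receiving (root) port, so asymmetric settings can give results this model does not capture.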
The network had been working fine in production for some days, and I was using a network monitor to check the availability of the stations, when suddenly several stations stopped responding to ping and I saw an increase in broadcast traffic on the 3850 switch. This has happened a few times; each time I "restarted" the network, most of the time by rebooting the secondary (green) switches.
Even when the network is running smoothly, I see "flapping" on the 3850 between the ports connecting to RAP-01 and RAP-04 most of the time. I also frequently see "topology change" entries when I check the logs of the secondary switches. A couple of times I have also experienced one of the secondary switches hanging.
At this point, based on much of the literature I have reviewed, I think this problem could be caused by lost BPDUs, which would result from two main factors: the diameter of the network and the instability of some of the radio links. Traffic peaks could also be a problem, but I have them under control most of the time.
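For the diameter question, one way to measure it is to take the links that stay forwarding after convergence (the diagram without the "X" ports) and find the longest path between two switches, counted in bridges. The default 802.1D timers are generally described as assuming a maximum diameter of 7 bridges, and in RSTP the Message Age field effectively acts as a hop count that has to stay within Max Age, so the farthest distance from the root is worth checking as well. The Python sketch below computes both; the tree is hypothetical and uses some switch names from the post only as labels.

```python
from collections import deque


def bfs_hops(tree_links, start):
    """Return the hop count from start to every bridge over the active links."""
    adj = {}
    for a, b in tree_links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, []):
            if nbr not in hops:
                hops[nbr] = hops[node] + 1
                queue.append(nbr)
    return hops


def bridge_diameter(tree_links):
    """Longest path in a tree via two BFS passes, counted in bridges."""
    first = bfs_hops(tree_links, tree_links[0][0])
    far_end = max(first, key=first.get)  # one end of a longest path
    second = bfs_hops(tree_links, far_end)
    return max(second.values()) + 1  # edges + 1 = bridges on the path


# Hypothetical forwarding links left after convergence.
active = [
    ("3850", "RAP-01"), ("3850", "RAP-04"), ("RAP-01", "RAP-02"),
    ("RAP-04", "RAP-03"), ("RAP-03", "RAP-06"), ("RAP-06", "CP-01"),
]
print(bridge_diameter(active))                   # 7
print(max(bfs_hops(active, "3850").values()))    # 4 hops from the root to CP-01
```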
From what I have read, it may be possible to change the RSTP timers (which, as I understand it, should be changed only on the root switch) to minimize this effect. So my main question is: is there a practical rule for tuning the timers (should I change all of them?), or should I tune them by trial and error? I would also like to hear about your experiences with this kind of mixed topology. How can I find the diameter of the network? Should I change any other parameters on the remaining switches?
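On the timer question, the three timers cannot be tuned independently: IEEE 802.1D requires 2 × (Forward Delay − 1 s) ≥ Max Age ≥ 2 × (Hello Time + 1 s). The Python sketch below checks a proposed set of values against those two inequalities. The ranges in `RANGES` are the commonly documented configurable ranges and are an assumption here, so they should be confirmed for the Siemens and Cisco firmware in use; `check_timers` is a hypothetical helper.

```python
# Assumed configurable ranges in seconds; verify against each vendor's docs.
RANGES = {"hello_time": (1, 10), "max_age": (6, 40), "forward_delay": (4, 30)}


def check_timers(hello_time, max_age, forward_delay):
    """Return a list of problems; an empty list means the set is consistent."""
    values = {"hello_time": hello_time, "max_age": max_age,
              "forward_delay": forward_delay}
    problems = []
    for name, (low, high) in RANGES.items():
        if not low <= values[name] <= high:
            problems.append(f"{name}={values[name]} outside {low}..{high} s")
    # 802.1D relationship: 2 * (Forward Delay - 1) >= Max Age
    if 2 * (forward_delay - 1) < max_age:
        problems.append("max_age too large: need 2*(forward_delay-1) >= max_age")
    # 802.1D relationship: Max Age >= 2 * (Hello Time + 1)
    if max_age < 2 * (hello_time + 1):
        problems.append("max_age too small: need max_age >= 2*(hello_time+1)")
    return problems


print(check_timers(2, 20, 15))   # []  (default values)
print(check_timers(10, 40, 30))  # []  (maximum values)
print(check_timers(2, 40, 15))   # one problem: Max Age raised without Forward Delay
```

Raising only Max Age while leaving Forward Delay at its default breaks the first inequality, which is why the timers are usually changed together.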
I would like to know whether I have reached the performance limit of this network or whether there is additional tuning I can try.
Thank you in advance for your prompt advice and information.
Best regards.
09-21-2017 02:34 PM - edited 09-21-2017 02:35 PM
The first thing that comes to mind is that Cisco uses "per-VLAN" RSTP, while many other vendors run standard RSTP with a single spanning tree for all VLANs. Unless all the kit uses the same scheme, different kit will calculate different spanning trees (at least if there are multiple VLANs and there are restrictions on where they are allowed to go).
So I would start by finding out which method Siemens uses.
09-21-2017 02:48 PM
Thank you, Philip.
I configured PVRST (Rapid PVST+) on the Cisco 3850 and RSTP on all the Siemens switches. All of them are using VLAN1 by default.
Do you think this could be an issue?
Best regards,
09-21-2017 02:51 PM
09-21-2017 03:01 PM
Do you mean the loops in the attached topology? Yes, everything is in VLAN1.
I have not created any VLANs on the Siemens switches (VLAN1 is the default), and PVRST on the 3850 is configured for VLAN1. There are other VLANs on the 3850, but all of the PVRST configuration is in VLAN1.
Cisco literature indicates that PVRST should be used on the 3850 when RSTP is used on the other switches in the same network.
I can send you screenshots of the configurations if you would like to check them.
Regards,
10-17-2017 12:06 PM
Dear all,
Regarding the case described above, I have changed the RSTP timers to their maximum values, and the network has become more stable: flapping has been reduced, although it is still present, and the crashes have stopped so far.
I'm still monitoring the network with PRTG in order to achieve 99.95% availability for our client, and the system still detects downtime for some equipment, mostly switches but sometimes radios too. The outages last from a few minutes to sometimes one or two hours, and then recover on their own.
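A quick calculation shows how tight the 99.95% target is. This Python sketch (the function name is hypothetical) converts an availability percentage into a downtime budget: for a 30-day month it gives about 21.6 minutes, and for a 365-day year about 262.8 minutes (4.38 hours), so a single one- or two-hour outage on a monitored device would use up far more than a month's budget if it counts against the target.

```python
def downtime_budget_minutes(availability_percent, period_days):
    """Minutes of downtime allowed in the period for the given availability."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (100 - availability_percent) / 100


print(round(downtime_budget_minutes(99.95, 30), 1))   # 21.6
print(round(downtime_budget_minutes(99.95, 365), 1))  # 262.8
```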
I'm pretty sure that the equipment reporting downtime is not really "unavailable", because the equipment that depends on it does not show as down (i.e. devices connected to a "down" switch don't show any downtime; if the switch really went down, all the connected devices would go down too).
Could any of you help me get more accurate monitoring of this equipment? Should I check its configuration or change a particular setting?
Thank you again for your help,
Best regards.