Re: When to use STP or NOT!

rob-it · 11-14-2010

Hello,

I have a large L2 switched network consisting of one C4507-R switch with about 190 ports. There is no redundancy in the network, that is, no other main switches; we have only the one. We have one router connected, leading to a DSL provider's network. The majority of connections are fiber leading to about 80 locations that have one or more edge switches with STP activated. Today we are running the standard PSTP on the 4507 and the CPU is climbing due to all the trunked links and the number of VLANs. STP is the main consumer of CPU, and debug is off. There is talk of moving to MSTP. I think the answer is to move to regular STP and only use the loopguard and portfast configurations. Another idea is to turn it off altogether. If anyone has ideas or comments to share, please feel free. I appreciate your responses.

Robert

Jon Marshall · 11-14-2010

The short answer to your question is to always use STP. It is a failsafe against L2 loops, which you may not intentionally have, but all it takes is for someone to connect a hub incorrectly and your entire network could go down.

When you say PSTP, did you mean RSTP? There will be no gain by moving between STP and RSTP, although you could well see some gain by moving to MSTP and grouping your VLANs.

One other thing to look at is the spread of VLANs. Obviously the 4500 will have all the VLANs, but are you configuring the trunk links to the remote sites to allow only the necessary VLANs across each link? I assume you don't need all VLANs at all sites. By using the "switchport trunk allowed vlan ...." command on each trunk, you can limit the extent of each VLAN, and its BPDUs, to only the 4500 and the specific remote site, which should help with STP processing.
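
For illustration, pruning one trunk might look roughly like the following. The interface name and VLAN IDs are hypothetical, and the port is assumed to be an already working 802.1Q trunk.

```
! Hypothetical uplink to one remote site (interface and VLAN IDs are assumptions)
interface GigabitEthernet3/1
 switchport trunk allowed vlan 10,20,110
!
! To permit one more VLAN later, use "add"; without it the whole list is replaced
interface GigabitEthernet3/1
 switchport trunk allowed vlan add 120
```

"show interfaces trunk" then lists the VLANs allowed and active on each trunk, which is a quick way to confirm the result.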

Jon

Peter Paluch · 11-14-2010

Hi Jon,

Very thoughtful answer, as usual!

One other thing to look at is the spread of VLANs. Obviously the 4500 will have all the VLANs, but are you configuring the trunk links to the remote sites to allow only the necessary VLANs across each link? I assume you don't need all VLANs at all sites. By using the "switchport trunk allowed vlan ...." command on each trunk, you can limit the extent of each VLAN, and its BPDUs, to only the 4500 and the specific remote site, which should help with STP processing.

An interesting idea, surely worth pursuing. My view is that the CPU load incurred by STP comes from three things: the state information maintained for each VLAN and each port active in that VLAN, the need to emit the necessary BPDUs on the appropriate ports every Hello interval, and the processing of all received BPDUs. Pruning unnecessary VLANs from trunks surely helps, but I dare not say to what extent; that obviously depends on the implementation details in IOS. It's worth researching.
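
One way to gauge that per-VLAN, per-port state is to look at how many spanning-tree instances and port instances the switch is actually maintaining. A short sketch using standard IOS show commands follows; no output is reproduced here, since it depends entirely on the switch.

```
! Spanning-tree mode plus total port counts per state across all VLAN instances
show spanning-tree summary totals
!
! The same counters broken down per VLAN
show spanning-tree summary
```

Comparing the "STP Active" total before and after pruning gives a rough idea of how much state was removed.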

Best regards,

Peter

Peter Paluch · 11-14-2010

Hello Robert,

I would personally not recommend turning STP off, even if you do not currently have redundancy in your network. STP not only allows you to add redundancy at some point in the future, but also protects your network against inadvertent or malicious actions by anybody with access to an Ethernet cable leading to your network.

I am not sure what you meant by suggesting running the regular STP. On Cisco Catalyst switches, there is actually no support for running a single 802.1D STP or 802.1w RSTP instance for all VLANs. In other words, both STP and RSTP as implemented on Catalysts run in a per-VLAN fashion and you cannot change that. Until you migrate to MSTP, you have no option but to run per-VLAN STP, which is understandably very expensive on CPU. Of course, you have the option of selectively deactivating STP instances on a per-VLAN basis, but that is something I would strongly oppose.

Your idea of running STP with the portfast and loopguard mechanisms is closely tied to the issues I've just described. You'll have to run PVST or Rapid-PVST, which will already tax your CPU heavily, and turning on loopguard and/or portfast will not lead to any meaningful decrease in CPU load; there is no reason it would. On the contrary, loopguard is likely to increase the CPU load even further because it is additional work to do.
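
For reference, the two features are typically enabled per interface along these lines. The interface names are hypothetical, and this hardens the topology rather than reducing the per-VLAN STP work.

```
! Access port facing an end host (hypothetical interface)
interface GigabitEthernet2/1
 spanning-tree portfast
!
! Switch-to-switch link (hypothetical interface); loop guard only takes
! effect while the port is in a root or alternate role
interface GigabitEthernet3/1
 spanning-tree guard loop
```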

My personal recommendation would also be to go with MSTP. MSTP may appear somewhat obscure, but if carefully planned and deployed, it is a fine protocol that hugely decreases the required system resources. Of course, all your switches need to support MSTP to get the best results.

Best regards,

Peter

glen.grant · 11-14-2010

I would agree with Peter. If you have trunk links to all these switches and you are allowing all VLANs across the trunks, this can increase the CPU load quite a bit. Go through all the trunk links and use the "switchport trunk allowed vlan xx" command to allow only what is needed on each access switch. I think you would see a noticeable drop if you do this. How many VLANs are on the 4500?

rob-it · 11-14-2010

Hey, thanks for the replies!

To clarify some more: my switch is running PVST, I have limited as many trunk links as possible with the "switchport trunk allowed vlan" command, I am using access ports whenever possible, and I have the following number of VLANs:

Number of existing VLANs : 157
Number of existing VTP VLANs : 60
Number of existing extended VLANs : 97

The CPU on the switch is "normally" at 50%. We have a Supervisor IV and will be upgrading to a Supervisor V-10GE in a few days. I also want to add even more VLANs, and this is what is causing concern with the CPU already at a steady 50%.

There have also been discussions about moving some of the connections off to another switch; we have a 3750 with 12 fiber ports that could be used. I do not know how to get a stable network and lower the CPU load, unless a CPU value of 50% is considered normal?

Regards,

Robert

Peter Paluch · 11-15-2010

Hello Robert,

So if I understand you correctly, you have already pruned VLANs off the trunks as much as possible, and your CPU is still running at 50% or more, is that true? Also, how did you determine that it is STP that is causing your high CPU load? Can you post the output of the "show proc cpu sorted 1m" command?

I somewhat doubt that adding additional switches and reducing the connection density on your single 4500 would decrease the CPU load significantly if it is indeed caused by the STP - unless you redesign your VLAN architecture and decrease the number of VLANs on your 4500 Catalyst.

Let me put it very briefly: you have 157 VLANs on your 4500 and you are planning to have even more. It is in my opinion simply ridiculous to maintain a standalone STP instance for each and every VLAN, especially when you do not have any redundancy in your network and thus cannot benefit from any load balancing whatsoever. MSTP would reduce your 157 STP instances to a single one to begin with, and would allow you to grow your VLAN count without putting additional stress on your CPU because of STP. For MSTP, you do not need any changes to the topology, no additional switches, no hardware modifications; just a small maintenance window to activate the MSTP configuration after it has been deployed on all switches and to verify that the network is running okay. If the results are not satisfactory, you can always revert to (R)PVST and look for other solutions.
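
A minimal sketch of what the Cisco side of such a migration might look like, assuming a single region with every VLAN left in the default instance 0 (the IST). The region name and revision number are placeholders, not values from this thread.

```
! Define the region first; name and revision are hypothetical placeholders
spanning-tree mst configuration
 name EXAMPLE-REGION
 revision 1
 exit
!
! Switch from per-VLAN spanning tree to MST (expect a brief reconvergence)
spanning-tree mode mst
```

For switches to join the same region, the name, revision and VLAN-to-instance mapping must be identical on all of them; "show spanning-tree mst configuration" displays what is currently in effect.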

Best regards,

Peter

rob-it · 11-15-2010

Peter,

I do check the CPU load with the same command that you requested. I have also had two external consultants come in to verify that it is STP causing the problem, but I am always open to new suggestions. Here is the output of "show proc cpu sorted 1m":

CPU utilization for five seconds: 41%/1%; one minute: 44%; five minutes: 45%
PID  Runtime(ms)    Invoked  uSecs    5Sec    1Min    5Min  TTY  Process
 82    280775328   94588069   2968  18.34%  18.47%  18.47%    0  Spanning Tree
 40    158619256  310557070    510  12.15%  12.62%  12.56%    0  Cat4k Mgmt HiPri
 41    109205832   49048702   2226   5.63%   7.42%   8.22%    0  Cat4k Mgmt LoPri
142     18686240   25344849    737   0.87%   1.08%   1.14%    0  SNMP ENGINE
140     15090024   70053987    215   0.55%   0.90%   0.93%    0  IP SNMP
 78     12000684   35833198    334   0.79%   0.80%   0.82%    0  IP Input
 34     11943328    1965192   6077   0.55%   0.70%   0.71%    0  IDB Work
 13      5052396    7441366    678   0.31%   0.35%   0.34%    0  ARP Input
  7      3126596     288181  10849   0.00%   0.26%   0.20%    0  Check heaps
 73      1635096    6015511    271   0.07%   0.20%   0.20%    0  CDP Protocol
141      3192868   24503820    130   0.15%   0.20%   0.20%    0  PDU DISPATCHER
 85      1024420    2853830    358   0.07%   0.14%   0.15%    0  CEF: IPv4 proces
127       462936   29081160     15   0.15%   0.10%   0.10%    0  PM Callback
 48       885048     767156   1153   0.07%   0.07%   0.07%    0  Compute load avg
 62     11882736   88382409    134   0.00%   0.05%   0.07%    0  DTP Protocol
 38       736136      47121  15622   0.00%   0.04%   0.00%    0  Per-minute Jobs
 66        88252    1572494     56   0.07%   0.02%   0.00%    0  UDLD
115       138752     223439    620   0.00%   0.00%   0.00%    0  DHCPD Receive
 94          828      38330     21   0.07%   0.00%   0.00%    0  TCP Timer
 81        16904    1641403     10   0.07%   0.00%   0.00%    0  NTP
 27        88708     386831    229   0.07%   0.00%   0.00%    0  HC Counter Timer

Output terminated by me....

So Peter, you are recommending that we go ahead and use MSTP. I am installing a new Supervisor in 3 days and would like to implement MSTP then, during the installation of the new Supervisor. FYI: I have all ProCurve switches at the edges on all sites. I just finished a project to replace over 100 switches with gigabit, and we went with ProCurve 2510G-48s and 2510G-24s and have some 1810G-24s as well. How will the ProCurves play together with the Cisco running MSTP?

Regards,

Robert

Peter Paluch · 11-15-2010

Hi Rob,

Yes, my personal action would be to go with MSTP. After all, MSTP was designed exactly for your situation.

The HP ProCurve 2510-48 supports MSTP and cooperates with Cisco's MSTP nicely. I am myself running a small network where a Cat3560G is running as a distribution layer switch and a bunch of HP ProCurve 2510 and 2620/2650 switches are working as access layer switches, all running MSTP. I haven't had any problems with them so far. I strongly suggest updating the ProCurve switches to the newest firmware before deploying them: some older HP switches actually required that a VLAN exist before it could be mapped to an MSTP instance. Newer firmware does not have this limitation, fortunately. This allows you to pre-provision the mappings before you even create the VLANs. Reconfiguring the MSTP region configuration during network operation can cause connectivity outages, so many experts strongly suggest pre-mapping the VLANs into instances beforehand, and then just adding the VLANs as needed without reconfiguring the MSTP region itself.
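
The pre-mapping idea can be sketched as follows on the Cisco side. The region name, revision and the split of the VLAN range are arbitrary assumptions chosen for illustration; the same name, revision and mapping would have to be configured on the ProCurve switches as well.

```
! Map the whole VLAN range up front (all values here are hypothetical)
spanning-tree mst configuration
 name EXAMPLE-REGION
 revision 1
 instance 1 vlan 1-2000
 instance 2 vlan 2001-4094
 exit
```

With a mapping like this in place, creating a new VLAN later does not alter the region configuration, because the VLAN is already assigned to an instance.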

I don't know about the ProCurve 1810G-24; please check the datasheet to see whether it supports MSTP as well.

Best regards,

Peter