
6509 STP on new uplink causing CPU surge/ traffic forwarding impact

pcweber
Level 1

Sometime in the past 4-5 months we developed a scenario that spikes the CPU enough that the console becomes unresponsive and packet forwarding is impacted for 6-10 seconds. We can recreate it at will. It always happens when the first port in a port channel pair comes up. When I watch the console and spanning tree moves the port from blocking to forwarding, the console goes unresponsive and CPU #317, as reported by Solarwinds, goes to around 45%. Pings to any VLAN L3 IP address go to 500ms or higher, or time out. Solarwinds also starts alerting that hundreds of nodes and users are impacted, I assume because the core is struggling to forward packets. When the event clears, there are no new related entries in the log. If I add the config line "spanning-tree bpdufilter enable" to the 6509-side interface and bring up the new link, there is no impact. I assume there are some STP debugs to try, but I wanted to see if there are any strong recommendations first, as this causes deep impact every time we reproduce it.

5 Replies

We need to see the topology.

I agree with @MHM Cisco World that we need to see the topology. I would also ask what logging level you are sending to the console. I prefer 'logging console critical' to eliminate a lot of messages. Every character sent to the serial console interrupts the CPU, and I have seen this make devices unresponsive when too many messages were being displayed. I would also ask what you see after an event in 'show spanning-tree detail vlan 1' (from memory, so the arguments might not be in the right order). If it is spanning tree reconverging, there isn't much you can do about it other than to limit those sorts of events. Is it possible there is an access port downstream that does not have 'spanning-tree portfast' on it? I ask because portfast does two things. First, it puts the port straight into forwarding, skipping the listening and learning states, which makes DHCP work. That is the most well-known feature. The other is that it does NOT send a TCN (topology change notification) when the port goes up or down. That means spanning tree does not have to reconverge.
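As a rough sketch of the suggestions above in classic IOS syntax: the interface name is hypothetical, and the exact keywords may differ by software release (newer releases use 'spanning-tree portfast edge', for example), so treat this as an illustration and not a tested configuration for this network.

```
! Limit console logging to critical and above (global configuration mode)
configure terminal
 logging console critical
 !
 ! Hypothetical downstream access port; substitute a real edge port
 interface GigabitEthernet1/0/10
  spanning-tree portfast
 end
!
! After reproducing the event, check the topology change counters for VLAN 1
show spanning-tree vlan 1 detail
```

Portfast belongs only on ports facing end hosts, not on the uplinks between the stacks and the core.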

pcweber
Level 1

Sorry about the handwritten drawing. I am away for a funeral and do not have Visio on my laptop. We have two 6509s connected via VSS. We have 3 different campus buildings with 2 switch stacks in each floor closet. The data center has around 45 top-of-rack switches uplinked the same way as the closets, with dual 1 Gb port channel links. The STP issue happens on any port channel pair when both links have gone down. An example: during storms, a closet stack has a UPS, but it can drain and drop power to the switch stacks for that floor. When they come back up, the first 1 Gb port to come up goes through spanning tree blocking, then forwarding. When it switches to forwarding, something starts that spikes the core CPU, and the core CLI, including the serial console, is stuck. It relieves in 6-10 seconds. It does affect packet forwarding network-wide, as it is the core. I can replicate it at will by downing a pair of uplinks from the core side; when I bring up the first port of the port channel, it happens every time STP moves to forwarding. I have not tried any debug yet, as every time I do this it causes impact and alerts. Again, if I add the config line "spanning-tree bpdufilter enable" to the port on the 6509 that I bring up, there is no impact.


@pcweber wrote:

The STP issue happens on any port channel pair when both links have gone down.


I suspect this isn't what you want to hear, but the situation you describe above means there was a topology change event (TCN) so spanning tree goes to flooding as it is supposed to do until it re-converges.
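One way to check whether a topology change lines up with the CPU spike, sketched with standard IOS show commands run right after an event. This assumes these commands and output filters are available on the 6509's software release; nothing here comes from the original poster's output.

```
! Per-VLAN topology change count, time of last change, and the port it came from
show spanning-tree detail | include ieee|occurr|from|is exec
!
! Processes consuming CPU, busiest first, hiding idle entries
show processes cpu sorted | exclude 0.00
!
! CPU utilization history graphs (last 60 seconds, 60 minutes, 72 hours)
show processes cpu history
```

If the "last change occurred" timers reset at the moment the uplink moves to forwarding, and the "from" line points at the port-channel, that would be consistent with the TCN explanation.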

Joseph W. Doherty
Hall of Fame

"It is always when the first port in a port channel pair uplinks."

First of a pair, i.e., the port-channel is not already up on any other link? In other words, logically the (port-channel) link is coming up from being down?