10-09-2022 05:28 AM
Sometime in the past 4-5 months we have developed a scenario that spikes the CPU enough that the console is unresponsive and packet forwarding is impacted for 6-10 seconds. We can recreate it at will. It is always when the first port in a port channel pair uplinks. When I watch the console and spanning tree moves the port from blocking to forwarding, the console goes unresponsive and CPU #317 as per Solarwinds goes to around 45%. Pings to any VLAN L3 IP address go to 500ms or higher, or time out. Solarwinds also starts alerting that hundreds of nodes and users are impacted, I assume because the core is struggling to forward packets. When the event clears there are no new entries in the log file related to it. If I add the config line "spanning-tree bpdufilter enable" to the 6509 side interface and bring up the new link, there is no impact. I assume there are some STP debugs to try, but I wanted to see if there were any strong recommendations first, as this causes deep impact every time we reproduce it.
10-09-2022 05:35 AM
We need to see the topology.
10-09-2022 08:33 AM - edited 10-09-2022 08:33 AM
I agree with @MHM Cisco World that we need to see the topology. I would also ask what debug level you are sending to the console. I prefer 'logging console critical' to eliminate a lot of messages. Every character sent to the serial console interrupts the CPU, and I have seen this make devices unresponsive when too many messages were being displayed. I would also ask what you see after an event in 'show spanning-tree detail vlan 1' (from memory, so the arguments might not be in the right order). If it is spanning tree reconverging, there isn't much you can do about it other than to limit those sorts of events. Is it possible there is an access port downstream that does not have 'spanning-tree portfast' on it? I ask because portfast does two things. First, it puts the port straight into forwarding instead of waiting through listening and learning, which makes DHCP work. That is the most well known feature. The other one is that the port does NOT generate a TCN (topology change notification) when it goes up or down. That means an access port flapping does not trigger a topology change across the spanning tree.
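The checks above might look roughly like this on IOS. This is only a sketch: the interface name is a placeholder, VLAN 1 is just the example VLAN from this post, and keyword order or availability may differ by platform and release.

```
! Limit console logging to critical and more severe messages
configure terminal
 logging console critical
 end

! After an event, look at the topology change counters and where
! the last change came from
show spanning-tree vlan 1 detail

! On a downstream access (edge) port only, never on a switch uplink.
! GigabitEthernet1/0/1 is a placeholder interface name.
configure terminal
 interface GigabitEthernet1/0/1
  spanning-tree portfast
 end
```

In the detail output, the fields of interest are the number of topology changes, the time since the last one, and the port it was received from.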
10-10-2022 06:46 AM - edited 10-10-2022 06:47 AM
Sorry about the handwritten drawing. I am away for a funeral and do not have Visio on my laptop. We have 2 6509s connected via VSS. We have 3 different campus buildings with 2 switch stacks in each floor closet. The data center has around 45 top of rack switches uplinked the same way as the closets, with dual 1 Gb port channel links. The STP issue happens on any port channel pair when both links have gone down. An example of this is during storms: a closet stack has a UPS, but it can drain and drop power to the switch stacks for that floor. When they come back up, the first 1 Gb port to come up goes through spanning tree blocking, then forwarding. When it switches to forwarding, something starts that spikes the core CPU, and the core CLI, including the serial console, is stuck. It clears in 6-10 seconds. It does affect packet forwarding network wide, as it is the core. I can replicate it at will by downing a pair of uplinks from the core side; when I bring up the first port for the port channel, it happens every time STP moves to forwarding. I have not tried any debug yet, as every time I do this it causes impact, alerts, etc. Again, if I add the config line "spanning-tree bpdufilter enable" to the port on the 6509 that I bring up, there is no impact.
10-11-2022 06:29 AM
@pcweber wrote:
The STP issue happens on any port channel pair when both links have gone down.
I suspect this isn't what you want to hear, but the situation you describe above means there was a topology change event (TCN) so spanning tree goes to flooding as it is supposed to do until it re-converges.
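To check whether topology changes actually line up with the CPU spike, something like the following could be run on the core right after an event. This is a rough sketch assuming IOS on the 6509; the include filter is just a common way to trim the detail output, and the exact output text may differ by release.

```
! Per-VLAN topology change count, time of the last change, and source port
show spanning-tree detail | include ieee|occurr|from|is exec

! Which processes were using the CPU, highest first
show processes cpu sorted

! CPU utilization history for the last 60 seconds, 60 minutes, 72 hours
show processes cpu history
```

If the topology change counters jump on many VLANs at the moment the uplink moves to forwarding, that would support the TCN explanation.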
10-10-2022 07:30 AM
"It is always when the first port in a port channel pair uplinks."
First of a pair - i.e. port-channel not up with any other link? I.e. logically the (port-channel) link is coming up from being down?