10-13-2005 05:08 AM - edited 03-03-2019 12:23 AM
I'm seeing a lot of unusual STP topology changes coming from a port which is supposed to be in the blocking state: Cat4006 Gi1/2 (see thumbsketch)
+------------------ PortChannel ------------------+
Cat6513A --- (Gi1/1) Cat4006 (Gi1/2) --- Cat6513S
forw -------- forw ----------- blocking ------ forw
Cat6513A = Root, BridgeID = hex6000
Cat6513S = Standby, BridgeID = hex7000
Cat4006 = Access-Switch, BridgeID = hexC000 (UplinkFast)
I enabled debugging STP on the Cat4006 and saw interface Gi1/2 sporadically changing into listening state and changing back to blocking an instant later:
...
Oct 13 13:52:10: STP: VLAN0998 Gi1/2 -> listening
Oct 13 13:52:10: set portid: VLAN0999 Gi1/2: new port id 8002
Oct 13 13:52:10: STP: VLAN0999 Gi1/2 -> listening
Oct 13 13:52:11: STP: VLAN0310 Gi1/2 -> blocking
...
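To see how often the port flaps, the debug output can be tallied by port and state. The following is only a minimal sketch: the regular expression is an assumption based on the few lines quoted above (other IOS releases may format the debug output differently), and the helper names are hypothetical.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpt above, e.g.
# "Oct 13 13:52:10: STP: VLAN0998 Gi1/2 -> listening"
STATE_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}): STP: "
    r"(?P<vlan>VLAN\d+) (?P<port>\S+) -> (?P<state>\w+)$"
)

def parse_transitions(lines):
    """Return (timestamp, vlan, port, state) for each STP state-change line."""
    found = []
    for line in lines:
        m = STATE_LINE.match(line.strip())
        if m:  # lines such as "set portid: ..." are skipped
            found.append((m["ts"], m["vlan"], m["port"], m["state"]))
    return found

def count_by_port_state(transitions):
    """Count how many times each port entered each state."""
    return Counter((port, state) for _, _, port, state in transitions)

# The four lines quoted in the post above.
SAMPLE = [
    "Oct 13 13:52:10: STP: VLAN0998 Gi1/2 -> listening",
    "Oct 13 13:52:10: set portid: VLAN0999 Gi1/2: new port id 8002",
    "Oct 13 13:52:10: STP: VLAN0999 Gi1/2 -> listening",
    "Oct 13 13:52:11: STP: VLAN0310 Gi1/2 -> blocking",
]

transitions = parse_transitions(SAMPLE)
counts = count_by_port_state(transitions)
for (port, state), n in sorted(counts.items()):
    print(port, state, n)
```

Run over a longer capture, the same count would show whether every move to listening is followed by a return to blocking within a second or so.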
Because of this I assumed that Cat6513S was failing to send BPDUs to Cat4006 and started tracing its (back)uplink port to Cat4006.
And as a matter of fact there are interruptions - exactly when the TCs occur.
Now my question: The time period of not sending BPDUs is short, only 4 - 8 seconds, which means 2 - 4 BPDUs are not received by Cat4006 Gi1/2.
Isn't that much too quick to change into the listening state? I thought it takes 20 seconds (max age) for the port to change from blocking to listening. Is the behaviour of the Cat4006 normal (which would mean the problem is with Cat6513S)?
I tried UDLD and LoopGuard as well, but these features don't work in such short time periods.
Thanks in advance
Rolf Fischer
10-13-2005 05:57 AM
It should take at least 20 seconds to change port state from blocking to listening, another 15 seconds from listening to learning, and finally 15 seconds from learning to forwarding.
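The timer arithmetic can be sketched as follows. This assumes the default 802.1D timers (hello 2 s, max age 20 s, forward delay 15 s); the thread does not state the configured values, and the check is simplified in that it ignores the message age already carried in the last stored BPDU. The function names are hypothetical.

```python
# Assumed 802.1D default timers, in seconds (not confirmed for this network).
HELLO = 2
MAX_AGE = 20
FORWARD_DELAY = 15

def missed_bpdus(gap_seconds, hello=HELLO):
    """Number of hello intervals that fit into a gap without BPDUs."""
    return gap_seconds // hello

def gap_expires_stored_bpdu(gap_seconds, max_age=MAX_AGE):
    """True if the gap alone is long enough to age out the stored BPDU."""
    return gap_seconds >= max_age

def blocking_to_forwarding(max_age=MAX_AGE, forward_delay=FORWARD_DELAY):
    """Max age expiry, then listening and learning for one forward delay each."""
    return max_age + 2 * forward_delay

for gap in (4, 8):
    print(gap, missed_bpdus(gap), gap_expires_stored_bpdu(gap))
print(blocking_to_forwarding())
```

Under these assumptions, a gap of 4 to 8 seconds (2 to 4 missed BPDUs) stays well below max age, so a max age expiry alone would not explain the move to listening.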
I understand from your detailed post that you have LoopGuard and UDLD enabled. Is there by any chance high CPU utilization on the Cat6513S connecting to the Cat4006? Somehow the port fails to receive BPDUs and triggers a TC.
Perhaps, if you're running CatOS on the 6500s, you can use BPDU Skew Detection to see whether your CPU and link utilization are running high.
10-13-2005 10:01 PM
Thanks for your responses.
Here are some additions:
All switches (Catalyst 6513 and Catalyst 4006) are running IOS (currently 12.1), with standard STP 802.1d/PVSTP (no RSTP) configured, checked and re-checked.
I tried UDLD and LoopGuard successively, but they were unable to detect 1-3 missing BPDUs (2-6 seconds).
The crucial question is: why does the Catalyst 4006 change from blocking to listening if only 1-3 BPDUs fail to appear? This seems very unusual. I don't think there's a problem with the 6513, because it has links to about 30 switches but the TCs occur only on this particular one. If it failed to send BPDUs for 10 seconds or so, I'd assume the problem was there, but we're talking about 2-6 seconds (usually only 2 s; 6 s is the longest period so far).
And I currently don't have a loop problem, because the 4006 changes back to blocking immediately after changing to the listening state.
10-13-2005 09:56 AM
These are recommendations to protect the networks against forwarding loops.
SPANTREE BEST PRACTICES
1. Make sure you have the complete topology diagram of the entire network with all the switches that carry the VLANs.
2. Make sure the ROOT for all these VLANs in the network is on the 6500 switch at the core or distribution, whichever is the highest level for your layer 2 network.
3. If you have EtherChannels configured between switches anywhere, make sure the MODE is set to DESIRABLE-DESIRABLE on both sides for PAgP negotiation.
4. Same for trunks: make sure they are set to "desirable" on both sides and that the native VLAN matches.
5. Enable UDLD on all fiber links bi-directionally. This will help detect a uni-directional link that could cause a loop.
6. Enable PORTFAST on all edge ports (workstations, PCs, APs, printers, etc.) to eliminate unwanted TCNs.
7. Enable LoopGuard on all non-designated ports, in particular on the access switches.
This link explains these recommendations in detail:
http://cco/en/US/partner/tech/tk389/tk621/technologies_tech_note09186a0080136673.shtml#secure_loops
10-13-2005 10:00 AM
Troubleshooting:
When you get the error messages please do the following to identify if there is an STP loop:
sho cam aging vlan # - if it is 15, it means there is an STP loop.
show logging buffer 1023 - check for flapping ports.
show top - To see the top port talkers - evaluates port utilization over a 30 sec period of time.
Starting with the root for vlan # and working down to the switches where you saw the log messages, run the following command:
sh spantree stats
What you are looking for is to find the switch or port that initiated the topology change notification.
Repeat these steps on all switches: check for the last topology change initiator, then move to the switch connected to that port, until you get to the bottom.
10-14-2005 02:56 PM
Be aware that the aging time going down to 15 seconds is just an indication of a topology change in the network, not of a bridging loop.
From the symptoms, I have the feeling that the port is suddenly going back to the initial state, reset as if the link was flapping. From the initial listening state, it goes to blocking as soon as it receives a BPDU from the neighbor. I don't know what the cause of this is, but it is not STP related in my opinion.
Regards,
Francois
10-14-2005 11:56 PM
Thank you Francois.
I agree with you, from the symptoms it looks like link flapping. And actually we had some link problems in the past.
But: I traced the forwarding of bpdus on that link at both sides and they are transmitted as well as received.
And always when the TCs / changing from blocking into listening occur, I see some 2 or 3 bpdus missing (also on both sides). Until this happens, the bpdus are received in the normal 2 second interval.
This is somewhat remarkable, isn't it?
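The gaps described above could be located automatically in a trace of BPDU arrival times. This is a rough sketch only: it assumes a 2-second hello interval, the `find_gaps` helper and its tolerance are hypothetical, and the arrival times are made-up illustrative values, not data from the actual trace.

```python
def find_gaps(arrivals, hello=2.0, tolerance=0.5):
    """Return (start, end, missed) for each interval longer than hello + tolerance.

    arrivals: BPDU receive times in seconds, in ascending order.
    missed:   estimated number of BPDUs that failed to arrive in the gap.
    """
    gaps = []
    for prev, cur in zip(arrivals, arrivals[1:]):
        delta = cur - prev
        if delta > hello + tolerance:
            missed = round(delta / hello) - 1
            gaps.append((prev, cur, missed))
    return gaps

# Illustrative values only: regular 2 s hellos with one 6 s gap.
arrivals = [0.0, 2.0, 4.0, 10.0, 12.0]
gaps = find_gaps(arrivals)
print(gaps)
```

Comparing the gap start times with the timestamps of the listening transitions on Gi1/2 would show whether the two always coincide.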