During a maintenance window at a customer site I raised the spanning-tree priorities on some core switches in different networks, in order to make room for the new core switches (N7K) that will replace the old core (Cat6500/Cat4500) in the near future.
Most of their networks consist of 4 core devices (some have only 2); the priorities were 4096 - 8192 - 12288 - 16384.
All edge switches have the default priority (32768). All switches run rapid-pvst mode and are connected p2p. All edge ports have portfast (trunk) enabled to support rapid convergence.
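As a reference, the relevant baseline configuration looked roughly like this (a sketch; the interface name is hypothetical, the commands themselves are standard IOS):

```
spanning-tree mode rapid-pvst
!
interface GigabitEthernet1/0/1
 description edge trunk port
 switchport mode trunk
 ! treat this trunk as an edge port so it skips the sync/handshake on link-up
 spanning-tree portfast trunk
```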
I started by changing their least important bridge (the one with the highest numeric priority) first and worked my way towards the root, changing the root bridge itself as the last step. The final situation was 20480 - 24576 - 28672 - 32768.
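Each change is a single command per bridge and VLAN. A sketch of the sequence (hostnames and the VLAN number are hypothetical here; VLAN 1500 is the one used in the lab further down):

```
! least important core first, old root bridge last
core4(config)# spanning-tree vlan 1500 priority 32768
core3(config)# spanning-tree vlan 1500 priority 28672
core2(config)# spanning-tree vlan 1500 priority 24576
core1(config)# spanning-tree vlan 1500 priority 20480   ! old root, previously 4096
```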
It was during this last step (increasing the priority on the root bridge after all other bridges had been increased) that I noticed 'interruptions' (high CPU, HSRP state changes, MAC flaps)...
Everything recovered within an acceptable timeframe (30 - 40 sec), but I was expecting roughly 1 sec of convergence (and certainly no HSRP failovers, 100% CPU spikes, or MAC flaps).
After the maintenance window I took the time to investigate what might have happened. My suspicion was that increasing the root bridge priority from 4096 to 20480 caused a temporary situation in which the surrounding bridges still held the old BPDU information (priority 4096), effectively claiming they knew of a better root bridge than the one now being advertised.
I created a simple lab setup to verify this behavior:
When I made this final change (increasing the root bridge priority from 4096 to 20480, which still leaves it as the root bridge) I captured the STP debugging on all 3 switches...
The 3 bridges had their clocks synchronized so that the timestamps of the events could be compared.
I will paste the most noticeable output here; the complete output from the 3 bridges is attached.
Dec 7 14:44:39.101: setting bridge id (which=1) prio 21980 prio cfg 20480 sysid 1500 (on) id 55DC.0027.0cbd.2500
Dec 7 14:44:39.118: RSTP(1500): Fa1/0/24 is now root port
Dec 7 14:44:39.126: RSTP(1500): we become the root bridge
Dec 7 14:44:39.134: RSTP(1500): Fa1/0/24 is now root port
Dec 7 14:44:39.940: RSTP(1500): we become the root bridge
Dec 7 14:44:39.108: RSTP(1500): updt roles, received superior bpdu on Gi0/1
Dec 7 14:44:39.108: RSTP(1500): Gi0/2 is now root port
Dec 7 14:44:39.108: RSTP(1500): Gi0/1 blocked by re-root
Dec 7 14:44:39.110: STP: Generating TC trap for port GigabitEthernet0/2
Dec 7 14:44:39.124: STP: Generating TC trap for port GigabitEthernet0/1
Dec 7 14:44:39.131: RSTP(1500): updt roles, received superior bpdu on Gi0/2
Dec 7 14:44:39.134: RSTP(1500): transmitting an agreement on Gi0/2 as a response to a proposal
Dec 7 14:44:39.136: RSTP(1500): Gi0/1 Dispute!
Dec 7 14:44:39.140: RSTP: Gi0/1 dispute resolved
Dec 7 14:44:39.160: RSTP(1500): updt roles, received superior bpdu on Gi0/2
Dec 7 14:44:39.163: RSTP(1500): transmitting an agreement on Gi0/2 as a response to a proposal
Dec 7 14:44:39.946: RSTP(1500): Gi0/1 is now root port
Dec 7 14:44:39.946: STP: Generating TC trap for port GigabitEthernet0/1
My conclusion (please do comment ;-)):
The TOR switch keeps advertising for a while that the root bridge has a priority of 4096. This causes some kind of "fight" (or dispute, as seen in the logs), which slightly delays the convergence of the network. In the logs from core-up-root I also seem to see that there is no root bridge at all for a short while (milliseconds).
If I do the opposite, decreasing the priority from 20480 to 4096, the same behavior is not seen.
That makes sense if my conclusion above is correct: after a decrease, any stale BPDU still circulating (with priority 20480) is never preferred, because the new BPDU advertises a root bridge with priority 4096 and therefore always wins the comparison.
This makes increasing the root bridge priority a more impactful change than decreasing it.
In this lab we are talking about convergence within a second... but this is a non-live network (no active MAC addresses in the VLANs) consisting of only 3 bridges and 2 VLANs.
The production networks I altered have 4 core bridges (either 6500 or 4500) and roughly 30 "TOR" (top-of-rack) devices (3750-X).
The HSRP failovers I saw confirm an interruption of at least 3 seconds (the configured hold time is 3 sec).
The 100% CPU I saw makes me think a loop may have occurred.
The MAC-flap logging (on the top-of-rack switches, like the TOR in the lab) also points to an interruption/loop. The MAC flaps were mainly between the root port and the alternate port...
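For reference, a minimal sketch of the HSRP timers in play (the VLAN, group number and addresses are hypothetical; the hello/hold values match the 3-second hold time mentioned above):

```
interface Vlan100
 ip address 10.1.100.2 255.255.255.0
 standby 1 ip 10.1.100.1
 ! hello 1 s, hold 3 s: the standby takes over after 3 s without hellos
 standby 1 timers 1 3
```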
Attached you can find the full STP debug logging from the 3 devices (ignore the vlan 1600 logging; this is a VLAN I excluded from the STP process on the TOR switch, because the client does have such VLANs that are excluded from STP)...
PS: I also verified the lab with root guard enabled on the uplinks from core to TOR... This resulted in less logging but slower reconvergence (6 seconds, which is 3x the hello time: the stale BPDU information first has to age out).
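For completeness, root guard in this test was applied roughly like this (a sketch; the interface name is hypothetical). It puts the port into a root-inconsistent state when a superior BPDU arrives, which would explain the quieter logs:

```
interface GigabitEthernet1/0/24
 description downlink to TOR
 ! block the port (root-inconsistent) if a superior BPDU is received on it
 spanning-tree guard root
```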