Re: Loop issue with PVST+

da-alb · 06-10-2025

Hi everyone,

I have this network diagram here, and I can't figure out why, when I connect the link between C101 and C109, I get lots of packet loss and issues on VL60.
Every port that connects the switches has every VLAN (1,10,20,30,40,60,70,80,90,100,200,300,500) tagged. I'm using RPVST+ on every Cisco switch, and C101 (the core switch, which is a stack) has a priority of 8192.
The VLANs are: 10,20,30,40,60,70,80,90,100.
The blue switches are Cisco Catalyst 1300 and the green switches are HPE OfficeConnect 1820.

Any idea? What am I missing?

M02@rt37 · 06-10-2025

Hello @da-alb

You are using RPVST+ on Cisco, but what about the HPE switches?

Thanks.

Best regards
.ı|ı.ı|ı. If This Helps, Please Rate .ı|ı.ı|ı.

da-alb · 06-10-2025

Well, the HPE switches are not using PVST+ but STP, and every HPE is connected only to a Cisco switch, going back to the core.

M02@rt37 · 06-10-2025

Ok @da-alb

On switches C101 and C109, please provide the output of this command:

show spanning-tree vlan 60

Best regards

da-alb · 06-10-2025

C109 doesn't speak directly with C101 because the link has been disabled. Do you still want the output? Otherwise, I can bring up the port and post the output. Thanks!


Joseph W. Doherty · 06-10-2025

Just PVST, not rapid-pvst, correct? (I understand later IOS versions now default to rapid.)

"lots of packet loss and issues on VL60": just VLAN 60? What kind of packet loss, e.g. reported by apps, or seen on interfaces? And if seen on interfaces, which ones?

Why all the switch priority settings? I can see C101 as the primary, and C103 or C109 as the single designated secondary, but what do you see as the benefit for all the other explicit priority settings?

Also, no pruning, correct? All VLANs are needed on all switches? Why L2 vs. L3?

(NB: The last couple of questions might not bear directly on your issue, but such L2 topologies are just a tad dated, i.e. usually you're better off using L3, or at least pruning VLANs. So I'm wondering whether such an L2 topology is really necessary.)

da-alb · 06-11-2025

Hi,
I'm using rapid-pvst. By packet loss I mean that if I ping some switches on VL60 or the default gateway, I see packet loss. I set the priority on the other switches just in case, it's useless?
What do you mean by pruning? Removing unnecessary vlans?
We use L2 because we wanted to separate each vlan from each other and use our firewall to restrict traffic.
How can I implement L3 with that? I inherited the ring and the customer has a fiber ring connecting the whole network for redundancy.

Joseph W. Doherty · 06-11-2025

I'm using rapid-pvst.

Good.

By packet loss I mean that if I ping some switches on VL60 or the default gateway, I see packet loss.

Pinging network devices can very much be hit or miss, as Cisco network devices deprioritize, and even ignore, ping requests when busy. Usually, more important is recording ping performance through a switch, although any host can also give poor treatment to pings. (Don't misunderstand: the kinds of pings you were doing aren't useless, it's just that their results are not always truly representative.)

I set the priority on the other switches just in case, it's useless?

Depends on what benefit you sought to accomplish. Which is why I asked that.

What do you mean by pruning? Removing unnecessary vlans?

Removing and blocking them (on trunks) from LAN segments where there's no need. This avoids sending VLAN traffic to LAN segments where there's nothing to receive it.
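As a rough sketch of what pruning works out to, the Python below computes which VLANs a trunk actually needs: the union of the VLANs used by the switches beyond it. The per-switch VLAN placement is entirely hypothetical (not taken from the real network), and VLAN 1 is left out because whether the default/native VLAN can be removed from a trunk depends on the platform.

```python
# Hypothetical: VLANs with access ports on each switch beyond a given trunk.
# These values are made up for illustration only.
VLANS_ON_SWITCH = {
    "C103": {10, 60},
    "C104": {20, 60},
    "C105": {10, 30, 60},
}

# VLANs currently tagged on every trunk (VLAN 1 omitted, platform-specific).
TAGGED = {10, 20, 30, 40, 60, 70, 80, 90, 100, 200, 300, 500}


def needed_on_trunk(downstream_switches):
    """Union of the VLANs used by every switch reachable through the trunk."""
    needed = set()
    for switch in downstream_switches:
        needed |= VLANS_ON_SWITCH.get(switch, set())
    return needed


keep = needed_on_trunk(["C103", "C104", "C105"])
prune = TAGGED - keep
print("allow:", sorted(keep))
print("prune:", sorted(prune))
```

In a ring, the set of switches beyond a given link changes when STP reconverges, so the ring trunks themselves will likely still need every VLAN; this kind of pruning pays off mostly on spur links toward branch switches.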

We use L2 because we wanted to separate each vlan from each other and use our firewall to restrict traffic.

That's unclear. You're using a FW as a L2 bridge?

How can I implement L3 with that?

Generally something like:

host 192.168.1.10/24 <v10> L2 SW <trunk> L2 SW <v10> host 192.168.1.11/24

becomes:

host 192.168.1.10/24 <v10> L3 SW <routed p2p /30> L3 SW <same or different V#> host 192.168.2.11/24

I.e. hosts that were within the same L2 domain, in the same subnet, are now in different L2 domains, in different subnets.
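The addressing difference can be checked with Python's standard `ipaddress` module. This sketch uses the example hosts above; the /30 prefix for the routed point-to-point link (192.168.255.0/30) is a hypothetical value chosen for illustration.

```python
import ipaddress

# Host addresses from the example above.
h1 = ipaddress.ip_interface("192.168.1.10/24")
h2_l2 = ipaddress.ip_interface("192.168.1.11/24")  # L2 design: same subnet as h1
h2_l3 = ipaddress.ip_interface("192.168.2.11/24")  # L3 design: behind the other L3 switch


def same_subnet(a, b):
    """True if both interfaces share one IP network, i.e. no routing is needed."""
    return a.network == b.network


print(same_subnet(h1, h2_l2))  # True: reachable within one L2 domain
print(same_subnet(h1, h2_l3))  # False: traffic has to be routed

# Hypothetical /30 for the routed point-to-point link between the two L3 switches.
p2p = ipaddress.ip_network("192.168.255.0/30")
print([str(a) for a in p2p.hosts()])  # the two usable addresses, one per switch
```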

I inherited the ring and the customer has a fiber ring connecting the whole network for redundancy.

Redundancy - good, but you can use L3 in ring topologies too, and it's usually better; often much better. The only time you should use L2 is when you cannot use L3, which often isn't the case. Decades ago, we had large L2 topologies because we only had routers and hubs, but with the advent of L3 switches, in many cases, you're better off using an L3 topology, and I believe the Catalyst 1300s are L3 switches.

Assuming you're pinging after STP converges after activation of the C101<>C109 link, activation of that link does change the logical topology.

For example, without that link, for C105 and C107 to exchange data, they would do so across the C105<>C107 link. But when the C101<>C109 link is activated, for C105 and C107 to exchange data, they would likely do so like C105<>C104<>C103<>C102<>C101<>C109<>C106<>C107. If they (C105 and C107) were exchanging lots of data, they may very well impact all the transit switches, and those transit switches' traffic would also now impact C105 and C107.

BTW, if the ring of switches were using routing, the C105<>C107 link would likely be used, while still having the redundancy of the ring. (As I've mentioned, L3 is often better.)

Without L3 or pruning, it's possible that, while C105 and C107 want to exchange data, C104 and C105 are also exchanging data and, under some circumstances, send copies of (flood) that data to C107.
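To illustrate that flooding, here is a toy Python model of a transparent (learning) bridge: it learns source MACs per port and floods any frame whose destination it hasn't learned yet (for example, an entry that was never learned or has aged out). The switch model, port names, and MAC addresses are hypothetical, and real switches add aging timers and hardware tables that this sketch leaves out.

```python
class LearningSwitch:
    """Toy transparent bridge: learn source MACs, flood unknown unicast."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src, dst):
        """Return the set of ports the frame is sent out of."""
        self.mac_table[src] = in_port          # learn (or refresh) the source
        out_port = self.mac_table.get(dst)
        if out_port is None:
            return self.ports - {in_port}      # unknown destination: flood
        if out_port == in_port:
            return set()                       # destination behind ingress port: drop
        return {out_port}                      # known destination: forward


# Hypothetical model of C105 with three ports.
c105 = LearningSwitch(["to_C104", "to_C107", "access"])
# C104 sends toward a host on C105 that hasn't been learned yet: copy goes to C107 too.
first = c105.receive("to_C104", src="aa:aa:aa:00:00:01", dst="bb:bb:bb:00:00:02")
# The host replies, so C105 learns where it lives.
reply = c105.receive("access", src="bb:bb:bb:00:00:02", dst="aa:aa:aa:00:00:01")
# The same traffic is now forwarded out the access port only.
again = c105.receive("to_C104", src="aa:aa:aa:00:00:01", dst="bb:bb:bb:00:00:02")
```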

There's insufficient information to say for certain why activation of the C101<>C109 link causes the ping issues you described, and nothing revealed so far screams "it's me!", but large and/or ring L2 topologies are somewhat infamous for having issues. (Larger ring topologies and STP often aren't a good mix either, which is why there are protocols like REP.)

Lastly, FWs can easily become performance bottlenecks.

I suggest:

Confirming whether your ping issue extends beyond just pinging network switch IPs, as such results aren't a reliable indicator.

Pruning VLANs where they aren't needed on trunks.

Considering migration to an L3 topology, which, on L3 switches, can be done concurrently while maintaining L2; you're not required to convert all of the L2, as, again, you can have both on L3 switches.

Note: L3 topology will very likely require host IP addressing changes, which is pretty trivial if using DHCP.

da-alb · 06-12-2025

@Joseph W. Doherty wrote:
I'm using rapid-pvst.
Good.
By packet loss I mean that if I ping some switches on VL60 or the default gateway, I see packet loss.
Pinging network devices can very much be hit or miss, as Cisco network devices deprioritize, and even ignore, ping requests when busy. Usually, more important is recording ping performance through a switch, although any host can also give poor treatment to pings. (Don't misunderstand: the kinds of pings you were doing aren't useless, it's just that their results are not always truly representative.)
I agree about ICMP, but I don't think this is the case, because when I removed the link between C101 and C109, every ICMP that had issues started to work flawlessly.
I set the priority on the other switches just in case, it's useless?
Depends on what benefit you sought to accomplish. Which is why I asked that.
I thought it would be better to have it set than not.
What do you mean by pruning? Removing unnecessary vlans?
Removing and blocking them (on trunks) from LAN segments where there's no need. This avoids sending VLAN traffic to LAN segments where there's nothing to receive it.
I need VLAN 60 on multiple switches because it's the management subnet where we keep the Ubiquiti APs, iDRACs, SAN, ESXi.
We use L2 because we wanted to separate each vlan from each other and use our firewall to restrict traffic.
That's unclear. You're using a FW as a L2 bridge?
No, the firewall is the default gateway on each vlan.
How can I implement L3 with that?
Generally something like:
host 192.168.1.10/24 <v10> L2 SW <trunk> L2 SW <v10> host 192.168.1.11/24
becomes:
host 192.168.1.10/24 <v10> L3 SW <routed p2p /30> L3 SW <same or different V#> host 192.168.2.11/24
I.e. hosts that were within the same L2 domain, in the same subnet, are now in different L2 domains, in different subnets.
That would work if I had a clear separation, I'd say. I need the VLAN 60 subnet on many switches; how would I implement that? Or wouldn't I, at all?
I inherited the ring and the customer has a fiber ring connecting the whole network for redundancy.

Redundancy - good, but you can use L3 in ring topologies too, and it's usually better; often much better. The only time you should use L2 is when you cannot use L3, which often isn't the case. Decades ago, we had large L2 topologies because we only had routers and hubs, but with the advent of L3 switches, in many cases, you're better off using an L3 topology, and I believe the Catalyst 1300s are L3 switches.
I confirm that 1300s can do some L3.
Assuming you're pinging after STP converges after activation of the C101<>C109 link, activation of that link does change the logical topology.
Yes, the issue arises when I enable the C101<>C109 link and I physically move an AP from a switch down the chain to another or when I reboot one on the same switch.
For example, without that link, for C105 and C107 to exchange data, they would do so across the C105<>C107 link. But when the C101<>C109 link is activated, for C105 and C107 to exchange data, they would likely do so like C105<>C104<>C103<>C102<>C101<>C109<>C106<>C107. If they (C105 and C107) were exchanging lots of data, they may very well impact all the transit switches, and those transit switches' traffic would also now impact C105 and C107.
There is no C102.

BTW, if the ring of switches were using routing, the C105<>C107 link would likely be used, while still having the redundancy of the ring. (As I've mentioned, L3 is often better.)
Without L3 or pruning, it's possible that, while C105 and C107 want to exchange data, C104 and C105 are also exchanging data and, under some circumstances, send copies of (flood) that data to C107.
There's insufficient information to say for certain why activation of the C101<>C109 link causes the ping issues you described, and nothing revealed so far screams "it's me!", but large and/or ring L2 topologies are somewhat infamous for having issues. (Larger ring topologies and STP often aren't a good mix either, which is why there are protocols like REP.)
Are 7 switches, counting the root, considered large for rapid-pvst?
Lastly, FWs can easily become performance bottlenecks.
Firewalls are not an issue because they can handle more than 10 Gbit of traffic.
I suggest:
Confirming whether your ping issue extends beyond just pinging network switch IPs, as such results aren't a reliable indicator.
Not only that: DNS queries fail, and there are issues with DHCP too.
Pruning VLANs where they aren't needed on trunks.
I will try.
Considering migration to an L3 topology, which, on L3 switches, can be done concurrently while maintaining L2; you're not required to convert all of the L2, as, again, you can have both on L3 switches.
I will try, not easy though...
Note: L3 topology will very likely require host IP addressing changes, which is pretty trivial if using DHCP.

Thanks, I have answered in bold!

Joseph W. Doherty · 06-12-2025

I agree about ICMP, but I don't think this is the case, because when I removed the link between C101 and C109, every ICMP that had issues started to work flawlessly.

Yup, I only mentioned it as not everyone is aware that device response can be very variable, i.e. poor response isn't always just due to the network.

I thought it would be better to have it set than not.

Well, if you were concerned what switch becomes root after first and second choices, and 3rd, and 4th and . . . ; )

Shouldn't cause any issues, I believe, but when you've lost both your primary and secondary choices for root switch, the problem is often big enough you don't worry about what switch inherits root next. Plus, as you've multiple switches with equal priorities after the first and second, tied priorities will be broken by MAC.
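A minimal Python sketch of that root election: the lowest bridge ID wins, where the bridge ID compares priority first and MAC address second. Only C101's priority (8192) comes from the thread; the other priorities and all MAC addresses are hypothetical. Platforms that add the VLAN number to the priority (the extended system ID) add the same value on every switch, so that does not change the ordering and is left out here.

```python
# Hypothetical bridge priorities and MACs (only C101's 8192 comes from the thread).
BRIDGES = {
    "C101": (8192, "00:aa:00:00:01:01"),
    "C103": (16384, "00:aa:00:00:01:03"),
    "C104": (24576, "00:aa:00:00:01:04"),
    "C109": (24576, "00:aa:00:00:01:09"),
}


def bridge_id(priority, mac):
    """Comparable bridge ID: priority first, then MAC address as the tie-breaker."""
    return (priority, bytes.fromhex(mac.replace(":", "")))


def elect_root(bridges, failed=()):
    """Return the switch with the lowest bridge ID among those still up."""
    alive = {name: bid for name, bid in bridges.items() if name not in failed}
    return min(alive, key=lambda name: bridge_id(*alive[name]))


print(elect_root(BRIDGES))                            # primary
print(elect_root(BRIDGES, failed={"C101"}))           # secondary
print(elect_root(BRIDGES, failed={"C101", "C103"}))   # tie at 24576, lower MAC wins
```

With C101 and C103 both down, C104 and C109 tie on priority, so the lower MAC (C104 here) becomes root.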

I need VLAN 60 on multiple switches because it's the management subnet where we keep the Ubiquiti APs, iDRACs, SAN, ESXi.

Why would management require all devices on same subnet? That's a bit unusual.

In any case, wherever you need the same VLAN, you can extend it across switches, to where it's needed.

Perhaps the issue is that, with L2 devices, you need to place a management IP on some subnet, and you don't want it on any "normal" data subnet. It does simplify security for L2 devices. (On L3 devices, a management IP can be a routed /32, with access managed out of a larger address block.)
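For the L3 case in parentheses, a sketch of the idea with Python's `ipaddress` module (Python 3.7+ for `subnet_of`): each L3 switch gets a /32 management address carved from one larger block, so a single access rule matching the block covers all of them. The 10.255.0.0/24 block, the per-switch addresses, and the 192.168.60.5 address are hypothetical.

```python
import ipaddress

# Hypothetical management block and per-switch /32 management addresses.
MGMT_BLOCK = ipaddress.ip_network("10.255.0.0/24")
SWITCH_MGMT = {
    "C101": ipaddress.ip_network("10.255.0.1/32"),
    "C103": ipaddress.ip_network("10.255.0.3/32"),
    "C109": ipaddress.ip_network("10.255.0.9/32"),
}


def permitted(address):
    """True if a /32 management address falls inside the managed block."""
    return address.subnet_of(MGMT_BLOCK)


all_ok = all(permitted(net) for net in SWITCH_MGMT.values())
# An address on a data subnet would not match the management rule.
outsider = permitted(ipaddress.ip_network("192.168.60.5/32"))
print(all_ok, outsider)
```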

You can, though, push a management subnet, and its L2, everywhere, but that doesn't mean you need to do so for all the other VLANs. Also, you could have multiple management subnets.

What is an issue for you: likely, at least the ring switches will need to have all VLANs on them, for redundancy. Possibly a different story for branches.

There is no C102.

My bad, plus I now see another error, so:

C105<>C104<>C103<>C102<>C101<>C109<>C106<>C107

should have been:

C105<>C104<>C103<>C101<>C109<>C108<>C107
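The effect of the new link on the C105<>C107 path can be reproduced with a short Python sketch. It models only the ring links mentioned in the thread (C101-C103-C104-C105-C107-C108-C109, plus the C101<>C109 link under test), since the full diagram isn't available. It assumes every link has the same STP cost and C101 is root, builds the shortest-path tree each switch would use toward the root, and treats any link outside that tree as blocked. Real STP breaks equal-cost ties by bridge ID and port ID; the neighbor sort order stands in for that, though no ties occur in this ring.

```python
from collections import deque

# Links inferred from the thread; the full diagram is not available.
RING = [("C101", "C103"), ("C103", "C104"), ("C104", "C105"),
        ("C105", "C107"), ("C107", "C108"), ("C108", "C109")]
NEW_LINK = ("C101", "C109")  # the link whose activation triggers the problem


def adjacency(links):
    """Undirected neighbor sets built from (switch, switch) links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def spanning_tree(links, root):
    """Equal-cost shortest-path tree from the root, as (switch, parent) pairs.

    Each switch keeps one link toward the root (its root port). Real STP breaks
    cost ties by bridge ID and port ID; sorted() order is a stand-in for that.
    """
    adj = adjacency(links)
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in sorted(adj.get(node, ())):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return {(sw, up) for sw, up in parent.items() if up is not None}


def blocked_links(links, tree):
    """Links that are up but not in the tree, i.e. blocked by STP."""
    return [(a, b) for a, b in links if (a, b) not in tree and (b, a) not in tree]


def tree_path(tree, src, dst):
    """Switch-by-switch path between src and dst over forwarding links only."""
    adj = adjacency(tree)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nbr in adj.get(node, ()):
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    if dst not in prev:
        return []
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


before = spanning_tree(RING, root="C101")
after = spanning_tree(RING + [NEW_LINK], root="C101")
print("<>".join(tree_path(before, "C105", "C107")))  # C105<>C107
print("<>".join(tree_path(after, "C105", "C107")))   # C105<>C104<>C103<>C101<>C109<>C108<>C107
print(blocked_links(RING + [NEW_LINK], after))        # [('C105', 'C107')]
```

Dropping a link from the list and recomputing is a quick way to sanity-check other link combinations before trying them on the real ring.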

Are 7 switches, counting the root, considered large for rapid-pvst?

I wouldn't expect it to be an issue for rapid.

I'm still focused on a change in topology somehow creating your issue.

BTW, I don't know how much you can experiment, but it might be interesting to see how things perform if, after you add C101<>C109, you then break C101<>C103 or C105<>C107.

Firewalls are not an issue because they can handle more than 10 Gbit of traffic.

Hopefully that's not to be confused with the FW having 10G interfaces.

I will try, not easy though...

No, it's often not.

Basically, for greenfield deployments, often such a L2 topology wouldn't be used. But, for brownfield updates, yea, "if it ain't broke, don't fix it", often goes.

premkumark27 · 06-11-2025

Since the HPE switches are running standard STP rather than a Cisco per-VLAN variant, convert the Cisco switches from RPVST+ to RSTP or MST (Multiple Spanning Tree):

! current mode
spanning-tree mode rapid-pvst

Change to:

spanning-tree mode mst

Then configure MST:

spanning-tree mst configuration
 name CORE
 revision 1
 instance 1 vlan 10,20,30,40,60,70,80,90,100
!
! on C101 only, so that it becomes root for instance 1
spanning-tree mst 1 priority 8192

This makes the Cisco switches run standards-based MST (IEEE 802.1s), which interoperates with the standard STP/RSTP the HPE switches understand.
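One detail worth checking with that configuration: the original post also tags VLANs 1, 200, 300 and 500 on the trunks, but they aren't in the instance 1 list, and in MST any VLAN not mapped to an instance stays in instance 0 (the IST). The `spanning-tree mst 1 priority 8192` line only affects instance 1, so those VLANs would follow instance 0's root election instead. A small Python sketch of that mapping:

```python
# VLANs tagged on the inter-switch trunks (from the original post).
TRUNKED = {1, 10, 20, 30, 40, 60, 70, 80, 90, 100, 200, 300, 500}
# VLANs mapped to instance 1 by "instance 1 vlan ..." in the configuration.
INSTANCE_1 = {10, 20, 30, 40, 60, 70, 80, 90, 100}


def mst_instance(vlan, mapped=INSTANCE_1):
    """MST instance a VLAN belongs to; unmapped VLANs fall back to instance 0 (IST)."""
    return 1 if vlan in mapped else 0


ist_vlans = sorted(v for v in TRUNKED if mst_instance(v) == 0)
print("instance 0 (IST):", ist_vlans)
```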

pieterh · 06-11-2025

Could your BPDUs not be processed in time because your diameter is too large?
There is a max delay for every path between ANY two switches;
look at this path: c117-c109-c108-c107-c105-c104-c103-c101-c112-c113-c125
The original STP describes a maximum of 7 hops (but it is a timing issue, so it may work with faster-processing switches).
Maybe VLAN 60 actually spans more of your switches than the other VLANs?
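Counting the path quoted above is easy to script. This Python sketch counts bridges and hops on that path and compares the hop count against two figures from this thread: the classic 802.1D value of 7 and the rough figure of 20 given elsewhere in the thread for rapid variants. Both limits are rules of thumb, not hard protocol constants.

```python
# The path quoted above, one switch per element.
PATH = "c117-c109-c108-c107-c105-c104-c103-c101-c112-c113-c125".split("-")

CLASSIC_STP_LIMIT = 7   # original 802.1D rule of thumb
RAPID_LIMIT = 20        # rough figure for rapid variants mentioned in this thread


def diameter(path):
    """Return (bridges, hops) for a path given as a list of switch names."""
    bridges = len(path)
    return bridges, bridges - 1


bridges, hops = diameter(PATH)
for name, limit in (("classic STP", CLASSIC_STP_LIMIT), ("rapid", RAPID_LIMIT)):
    status = "over" if hops > limit else "within"
    print(f"{name}: {bridges} bridges / {hops} hops, limit {limit}: {status}")
```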

M02@rt37 · 06-11-2025

@da-alb

To go further with diameter (Thanks @pieterh), see here

https://community.cisco.com/t5/switching/stp-diameter/td-p/3910425

Best regards

da-alb · 06-12-2025

Does this apply to rapid-pvst?

Joseph W. Doherty · 06-11-2025

BTW, as OP has described rapid PVST is being used, the infamous diameter 7 consideration shouldn't apply. That's not to say diameter isn't a possible issue with rapid PVST, it's just not usually an issue as quickly. I recall reading rapid variants usually are trouble free up to a diameter of 20, and might even make it up to 40.

Also BTW, STP is a L2 loop prevention protocol, for accidental or intentional loops. It's not needed, otherwise.

This means, the HPs don't really, really need to be in the same STP domain as the Catalysts (assuming current or similar topology), but it's still an excellent goal.