How can a native VLAN mismatch cause a L2 loop?

a1111 · ‎02-01-2025

Hello,

Can someone please help me understand how a native VLAN mismatch can cause a L2 loop?

I've found some sources online but it's not clear to me after reading the answers:

1) The link that the comment refers to is broken.

"A native VLAN mismatch will cause problems with STP. Specifically, if there is a native VLAN mismatch, the STP state of one end of the link becomes broken while the other end of the link is functioning normally. This will result in an STP loop. Take a look at this Cisco Learning Network discussion for more details."
https://forum.networklessons.com/t/802-1q-native-vlan-on-cisco-ios-switch/1082/55

2) The topology in Google drive isn't available anymore, which might be one reason why I don't understand the explanation:

"This is a Spanning tree issue. The basic issue is that if you have a native VLAN mismatch, the STP-BPDUs (Spanning Tree - uses BPDUs to prevent loops) for VLAN 1 will be sent without a VLAN tag. The STP traffic coming from VLAN 2 will be treated as if it is untagged (without vlan info in the 802.1 portion of the header).

The simplest way to look at this is that if the native VLANs on opposite ends of a trunk do not match, the Spanning Tree Protocol cannot prevent looping, because it treats communication over this trunk as untagged or not associated with a VLAN."

https://community.cisco.com/t5/switching/native-vlan-dismatch-looping-situation/td-p/2917643

Thanks

MHM Cisco World · ‎02-01-2025

https://www.cisco.com/c/en/us/support/docs/lan-switching/spanning-tree-protocol/24063-pvid-inconsistency-24063.html

Check this.

MHM

a1111 · ‎02-01-2025

Hello,

Thank you very much. I've read the article, and while it makes a good point, I don't think it has proven that it's critical for the native VLANs to match. Let me explain; I'd be interested to read your -- and anyone else's -- thoughts on this:

I needed a small refresher on the terms used. Here are my notes for anyone else who also needs one:

Basically, the IEEE did something, and Cisco tried to improve on it.

The IEEE invented STP (802.1D), but it could only have a single STP instance. In response, Cisco invented a flavor which could have multiple STP instances (PVST+).

Then, the IEEE invented a faster version of STP (802.1w), i.e. one with a faster convergence time. In response, Cisco invented a flavor which was also faster than its previous one (RPVST+).

Later, the tables turned, and the IEEE did something that was inspired by Cisco. Namely, the IEEE invented its own flavor of STP that could have multiple STP instances. This is MSTP (802.1s).

Back to my question:

There are two pieces of the puzzle. The first one is this:

"

VLAN 1 STP BPDUs are sent to the IEEE STP MAC address (0180.c200.0000), untagged.

VLAN 1 STP BPDUs are also sent to the PVST+ MAC address, untagged.

Non-VLAN 1 STP BPDUs are sent to the PVST+ MAC address (also called the Shared Spanning Tree Protocol (SSTP) MAC address, 0100.0ccc.cccd), tagged with a corresponding IEEE 802.1Q VLAN tag.

If the Native VLAN on an IEEE 802.1Q trunk is not VLAN 1:

VLAN 1 STP BPDUs are sent to the PVST+ MAC address, tagged with a corresponding IEEE 802.1Q VLAN tag.

VLAN 1 STP BPDUs are also sent to the IEEE STP MAC address on the Native VLAN of the IEEE 802.1Q trunk, untagged.

Non-VLAN 1 STP BPDUs are sent to the PVST+ MAC address, tagged with a corresponding IEEE 802.1Q VLAN tag.

"
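To make the quoted rules easier to compare, here is a minimal Python sketch that lists the BPDUs sent on a trunk for a given native VLAN. It follows the quoted list literally; the function name `bpdus_on_trunk` and the tuple layout are my own illustration, not anything from the Cisco document, and the quote does not spell out every case for the native VLAN's own BPDU, so treat it as a reading aid only.

```python
IEEE_STP_MAC = "0180.c200.0000"   # IEEE STP destination MAC (from the quote)
PVST_MAC = "0100.0ccc.cccd"       # PVST+ / SSTP destination MAC (from the quote)

def bpdus_on_trunk(native_vlan, vlans):
    """Return (stp_vlan, dest_mac, dot1q_tag) tuples; a tag of None means untagged.

    Hypothetical helper that applies the quoted rules as written.
    """
    sent = []
    for vlan in sorted(vlans):
        if vlan == 1:
            # VLAN 1 BPDUs always go to the IEEE MAC, untagged.
            sent.append((1, IEEE_STP_MAC, None))
            # The PVST+ copy is untagged only when VLAN 1 is the native VLAN.
            sent.append((1, PVST_MAC, None if native_vlan == 1 else 1))
        else:
            # Non-VLAN 1 BPDUs go to the PVST+ MAC with their own 802.1Q tag.
            sent.append((vlan, PVST_MAC, vlan))
    return sent

for native in (1, 2):
    print(native, bpdus_on_trunk(native, {1, 2}))
```

The only difference between the two cases in this sketch is whether the PVST+ copy of the VLAN 1 BPDU carries a tag.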

I think the exact same thing can be said the following way:

With the IEEE flavor, everything is untagged (ie the native and non-native VLAN traffic as well). Clearly, this must be so: the IEEE flavor only knows about a single STP instance, while native vs non-native imply multiple instances. But by definition, "one" is not "more than one," so there's no point in talking about native and non-native VLANs with the IEEE flavor.

But with Cisco's flavor, the native is untagged, while the non-native is always tagged -- again, by definition.

Building on this, the diagram in the article gives a reason why the native VLANs must match. And that's the second piece of the puzzle:

Switch A sends a BPDU to Switch B, and Switch B forwards that BPDU to Switch C.

Switch C would flood Switch B's frame out all other ports that belong to the same VLAN as the ingress port (ie the one that connects Switch C to B). As a result, Switch A would also get the frame.

Which means that the BPDU that originated from Switch A would be received by Switch A.

Next, Switch A would not flood that BPDU to Switch B or Switch C. It wouldn't do that to Switch C, since flooding never happens towards the receiving port. And it wouldn't flood it to Switch B, since Switch B belongs to a different VLAN, while flooding happens to ports within the same VLAN.

But still, the problem remains: the BPDU would make a single loop.

But, I don't see why that would cause major issues. Sure, it would waste some bandwidth, but this by itself isn't critical. The traffic would traverse the LAN.

Also, it looks like in a Cisco-only LAN, the native VLAN would not need to match. That's because all Cisco switches default to PVST+, and this issue can only happen if there's at least one switch that can understand only CST (Common Spanning Tree, i.e. a single STP instance in total).

What do you think?

Giuseppe Larosa · ‎02-01-2025

Hello @a1111 ,

a native VLAN mismatch can create serious issues in a classic LAN campus when:

Rapid PVST in use

MAC address reduction is in use

Bridge priority = N*4096 + VLAN ID

Bridge ID = Bridge priority + single MAC address

Possible values for N are 0, 1, and so on up to 15, since the priority field is 2 bytes long (4 bits for N, 12 bits for the VLAN ID). 4096 is the maximum number of VLANs in the 802.1Q standard, with VLAN 0 and VLAN 4095 reserved.

other vlans are reserved in Cisco switches.
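As a small worked example of the bridge priority arithmetic above, here is a Python sketch. The function names are hypothetical; the 4-bit/12-bit split is the usual extended system ID layout of the 2-byte priority field.

```python
def bridge_priority(n, vlan_id):
    """Priority field with MAC address reduction: N*4096 + VLAN ID."""
    if not 0 <= n <= 15:
        raise ValueError("N must fit in 4 bits (0..15)")
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be 1..4094 (0 and 4095 are reserved)")
    return n * 4096 + vlan_id

def split_priority(value):
    """Split a 2-byte priority field back into (N, VLAN ID)."""
    return value >> 12, value & 0x0FFF

print(bridge_priority(8, 100))   # default priority 32768 plus VLAN 100
print(split_priority(32868))
```

With N = 8 (the default 32768) and VLAN 100 the field is 32868, and splitting 32868 gives back N = 8 and VLAN 100.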

if two access-layer ports in different VLANs have spanning-tree portfast enabled without BPDU guard, and some STP security measures are not configured correctly,

the two VLANs become a single broadcast domain

if VLAN x is less than VLAN y, the VLAN x root bridge ALSO becomes the VLAN y root bridge

example 100 < 200

on inter-switch links, PVST and Rapid PVST use a proprietary BPDU format, and the external 802.1Q tag is compared with the internal Bridge ID priority as a consistency check

in our example, on the inter-switch trunk link this consistency check fails, because STP BPDUs with external 802.1Q tag 200 are received with internal Bridge ID = 100.

As a result, the inter-switch port can be moved to an inconsistent state for VLAN 200 and VLAN 100.

This can be temporary or permanent depending on overall configuration.
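The check described above can be sketched in a few lines of Python. This is only a model of the description in this post (outer tag compared with the VLAN ID carried in the low 12 bits of the bridge priority); the function names are hypothetical, and a real switch may carry the originating VLAN in a separate field of the BPDU rather than read it from the priority.

```python
def vlan_in_priority(bridge_priority):
    """VLAN ID embedded in the low 12 bits of the priority (MAC address reduction)."""
    return bridge_priority & 0x0FFF

def tag_matches_bpdu(outer_tag_vlan, bridge_priority):
    """True when the 802.1Q tag agrees with the VLAN ID inside the BPDU."""
    return outer_tag_vlan == vlan_in_priority(bridge_priority)

# The 100 < 200 example: a BPDU built for VLAN 100 arrives tagged as VLAN 200.
merged_bpdu_priority = 32768 + 100
print(tag_matches_bpdu(200, merged_bpdu_priority))  # mismatch
print(tag_matches_bpdu(100, merged_bpdu_priority))  # consistent
```

In this model the first call reports a mismatch, which is the condition under which the port would be moved to an inconsistent state.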

In my real world case dated year 2013 the root cause was

spanning-tree loopguard default at global level configured in the distribution level switches.

The campus was made of more than 100 switches in a 3-level hierarchy; there were 3 distribution layer stacks and 2 core switches.

Modern improvements exist to Rapid PVST like bridge assurance that is enabled by default in Nexus switches.

with bridge assurance, BPDUs are sent on each inter-switch link regardless of port role (designated port, root port, alternate port and so on) every 2 seconds, making it similar to RIP hello packets in the IP world.

With stacks, VSS, now VSLs the span of STP in a campus network has been reduced.

With SD Access, using DNAC (now called Catalyst center) controller and Catalyst 9x00 devices an IP fabric is built and L2 and L3 services are provided over this IP fabric with data planes using VXLAN.

However, SD Access is suitable only for big enterprises and it is a work in progress and it is very difficult to sell and troubleshoot (in Italian market in my personal experience)

Hope to help

Giuseppe

Joseph W. Doherty · ‎02-01-2025

"Can someone please help me understand how a native VLAN mismatch can cause a L2 loop?"

It's not just misconfigured trunk native VLANs that can allow a L2 loop.

Assume we have two hubs or (non VLAN capable) switches side-by-side.

We interconnect port 1 to port 1. All good, right?

Next, for redundancy, we additionally interconnect port 2 to port 2. Not good, right?

To allow the redundancy, STP was created. Now we can have both ports connected but only one will be actively used, right?

Next, let's replace whatever we used above with two VLAN capable switches.

If both switches have all their ports defined to use the same VLAN, no different from above, right?

But, what if both switches place their port 1 into VLAN 1 and their port 2 into VLAN 2?

Do we have any issues? We may.

For the setup just described we don't need spanning tree, correct? (Hmm, or do we? What about redundancy?) But what happens if we use the original STP, which is VLAN blind? It will block one of the two ports, partitioning one of the VLANs!
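A toy Python model of that last point, assuming two parallel links where port 1 carries VLAN 1 and port 2 carries VLAN 2. The "keep the lowest port number forwarding" rule is a deliberate simplification of the STP tie-break, and all names here are hypothetical.

```python
# port number -> VLAN carried on that inter-switch link (access ports)
links = {1: 1, 2: 2}

def forwarding_single_instance(links):
    """VLAN-blind STP: sees two parallel links and leaves only one forwarding."""
    return {min(links)}

def forwarding_per_vlan(links):
    """One STP instance per VLAN: each instance only sees links in its own VLAN."""
    best = {}
    for port, vlan in links.items():
        best[vlan] = min(port, best.get(vlan, port))
    return set(best.values())

def vlans_still_connected(links, forwarding_ports):
    """VLANs that still have a forwarding link between the two switches."""
    return {links[port] for port in forwarding_ports}

print(vlans_still_connected(links, forwarding_single_instance(links)))
print(vlans_still_connected(links, forwarding_per_vlan(links)))
```

With a single instance only VLAN 1 keeps a forwarding link, so VLAN 2 is partitioned; with one instance per VLAN both stay connected.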

To take advantage of using VLANs, Cisco created a proprietary variant of STP, with one STP instance per VLAN: PVST. But for this variant to work correctly, all VLANs must agree between switches, and VLAN-tagged frames between switches ensure that. Non-VLAN-tagged frames between switches do not ensure that. Frames on Cisco access ports and on a trunk's native VLAN are both untagged.

For the dual VLAN capable switches, what happens if we swap the port 1 and 2 connections, on just one switch?

a1111 · ‎02-01-2025

Hi,

Thank you very much for the response.

"But for this variant to work correctly, all VLANs must agree between switches"

But why?

"For the dual VLAN capable switches, what happens if we swap the port 1 and 2 connections, on just one switch?"

So if we swap the VLANs that those ports belong to?

Then the VLANs would not match.

Before the swap:
Switch X's port 1 is in VLAN 1, and port 2 in VLAN 2. Same for Switch Y:
Switch Y's port 1 is in VLAN 1, and port 2 in VLAN 2.

After the swap:
Switch X's ports haven't changed. Port 1 is still in VLAN 1, and port 2 in VLAN 2.
Switch Y's port 1 is in VLAN 2, and port 2 in VLAN 1.

(I've given the non-standard example switch names of X and Y because in another comment, I already refer to A, B, and C. So this is to avoid any confusion.)

However, these are access ports, and not trunked interfaces. So there are no dot1q tags. Would STP cause any issues?

According to this forum entry, the answer is no:
https://community.cisco.com/t5/switching/access-port-vlan-mismatch/td-p/2066417

"STP is not an issue on access ports because the IEEE standard version of BPDUs is used on access ports and the standard version has no embedded info about the vlan for which the STP instance is running."

CDP and DTP could cause problems, though.

Is that what you had in mind?

Joseph W. Doherty · ‎02-01-2025

@a1111 wrote:

"But for this variant to work correctly, all VLANs must agree between switches"

But why?

Because we're trying to keep VLANs separate, i.e. truly different L2 broadcast domains, and PVST is trying to provide STP for each VLAN. If we mix VLAN frames without tags, we can no longer keep the VLANs distinct, i.e. you've merged two VLANs into one, but PVST isn't expecting one VLAN, it's expecting different VLANs.

@a1111 wrote:

"For the dual VLAN capable switches, what happens if we swap the port 1 and 2 connections, on just one switch?"

So if we swap the VLANs that those ports belong to?

Then the VLANs would not match.

Correct. But except for something like CDP, the switch doesn't "know" there's a mismatch.

@a1111 wrote:

Before the swap:
Switch X's port 1 is in VLAN 1, and port 2 in VLAN 2. Same for Switch Y:
Switch Y's port 1 is in VLAN 1, and port 2 in VLAN 2.

After the swap:
Switch X's ports haven't changed. Port 1 is still in VLAN 1, and port 2 in VLAN 2.
Switch Y's port 1 is in VLAN 2, and port 2 in VLAN 1.

No, I meant we just swapped the cable connections, not that we swapped the port configurations, so switch Y port 1 is still in VLAN 1 and its port 2 is still in VLAN 2. I.e., we've merged the two VLANs, and have physically looped what is now a single merged VLAN.

@a1111 wrote:

However, these are access ports, and not trunked interfaces. So there are not dot1q tags. Would STP cause any issues?

Correct, they are access ports, but they, BTW, could have .1q tags, but with a zero VLAN ID, or no .1q tag at all. Ditto for trunk native VLAN tags. I believe that a trunk native VLAN will also accept a .1q tag with the native VLAN ID too, access ports might also, but not sure about those.

If you cross-connect the two VLANs, you've effectively created a merged VLAN, and as in my example above, the merged VLAN has a L2 loop that wasn't present before the two VLANs became (effectively) one VLAN.

Now, is that a problem for STP? Not as long as STP "sees" all the BPDUs for the merged VLANs as it would for a single VLAN and it can logically break the loop.

So back to your original question, "Can someone please help me understand how a native VLAN mismatch can cause a L2 loop?", it allows "unintentional" VLAN merging, same as if using access ports.

Will it cause STP problems? Unlike the reference you provided, possibly not. Although there are many ways it could, for instance, if the unintentional merger creates a too large STP domain.

a1111 · ‎02-05-2025

Thank you again for the thorough response.

I've given some further thought to this question. What do you think about this explanation?:

Let's rephrase it by using switch names according to what VLAN they think is the native VLAN. So SW1 thinks VLAN 1 is native, and SW2 thinks VLAN 2 is native.

SW1 gets a BPDU in an untagged dot1q frame from SW2. Next, SW1 does the STP calculation for VLAN 1. But, SW1 also gets a tagged frame from SW2, with the VLAN tag of VLAN 1.

As a result, SW1 gets two different BPDUs for VLAN 1.

And then the STP calculations go on infinitely.

Joseph W. Doherty · ‎02-05-2025

No, you don't want to think of the native VLAN that way, as the same switch could have many different native VLANs. It's each trunk port specific.

As to STP going on infinitely, I don't think that could happen, but what might happen is an STP instance not getting all the BPDUs it should, or getting more than it should. For example, on a pair of switches, side-by-side, with two links connecting the switches, all in the same VLAN, if one link's ports are running STP, but the other link's ports are not, STP won't see the L2 loop, but you'll have one.

So, using a trunk native VLAN, just as using access ports, does not, by itself, preclude mixing VLANs, and by doing so, eases (not ensures) creating a physical L2 loop. Whether STP "sees" such a loop is a different matter, and it might be more difficult to hide such a loop even from PVST, as its BPDUs themselves don't seem to carry the VLAN ID logically.

a1111 · ‎02-05-2025

Hello again,

Thank you again for the response. I believe I can articulate a scenario where there could be a loop:

SW1 is configured with a native VLAN of VLAN 1. For VLAN 1, SW1 is configured with an STP priority of 4096.

SW2 is configured with a native VLAN of VLAN 2. For VLAN 2, SW2 is configured with an STP priority of 0. For VLAN 1, SW2 is configured with an STP priority of 32768 (default value).

On the trunk link between SW1 and SW2, SW1 keeps receiving two kinds of BPDUs from SW2 relating to VLAN 1:

First kind of BPDU: SW2 sends an untagged BPDU. SW1 believes that this BPDU is in VLAN 1. That's because SW2 didn't tag the BPDU, and the native VLAN on SW1's end of the trunk is VLAN 1.

SW2's STP priority in this BPDU is lower than SW1's. Therefore, SW1 will consider SW2 as the root bridge. As a result, for VLAN 1, SW1 will advertise SW2 as the root, with the priority of 0.

Second kind of BPDU: SW2 sends a BPDU tagged with VLAN 1. SW1 believes that this BPDU is in VLAN 1. That's because SW2 encapsulated the BPDU in a VLAN 1 dot1q tag.

SW2's STP priority in this BPDU is higher than SW1's. Therefore, SW1 will be the root bridge. As a result, for VLAN 1, SW1 will advertise itself (SW1) as the root, with the priority of 4096.

What all of this means, if I'm not mistaken, is that SW1 will flip back and forth between being the root bridge and not being the root bridge for VLAN 1. What this could result in is that the other switches could recalculate their STP topologies for VLAN 1, because they would also flip back and forth between SW1 being the root bridge and not being it.

And this is why I said that this could go on infinitely.

(I mean sure, since the devices are physical, they'd eventually stop working. But this process would continue without stopping unless some unrelated force would intervene. E.g. a bug, or power outage, etc.)

So perhaps even the trunk link between SW1 and SW2 would flap due to the port role changes of SW1, but I'm not sure about that. However, what seems certain is that the other switches may end up recalculating their topologies because SW1 would flip back and forth between being a root and non-root bridge for VLAN 1.

What do you think? Thank you for reading my long comment. I hope to read from you more.

I hope the color coding didn't bother you. For me, it's easier to confuse the switches without it.

Have a nice day.

EDIT: So this wouldn't be a data loop, since due to the STP calculations, user data would not traverse the network. So STP loop is perhaps a better term.

Joseph W. Doherty · ‎02-06-2025

I'll have to think through your example, but off the top of my head, it's unclear the root would flap. Change, sure, but flap?

If you intermix two VLANs, root would be selected based on root selection rules.

I'm thinking this is a bit like the classical STP issue, a new switch joins the L2 domain with an incorrect root priority and becomes the new root. The two issues this caused are the initial STP domain convergence and a L2 topology not as intended. Don't recall continued root flapping.
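As a rough illustration of "selected based on root selection rules", here is a Python sketch that compares the bridge IDs from the SW1/SW2 example. It assumes the extended system ID (priority plus VLAN ID), uses made-up MAC addresses, and ignores any PVST+ consistency checks that might put the port into an inconsistent state first.

```python
def bridge_id(priority, vlan_id, mac):
    """Bridge ID as a comparable tuple: (priority + VLAN ID, MAC). Lower wins."""
    return (priority + vlan_id, mac)

# Hypothetical MAC addresses, for illustration only.
SW1_MAC = "0000.0000.0001"
SW2_MAC = "0000.0000.0002"

sw1_vlan1 = bridge_id(4096, 1, SW1_MAC)    # SW1, VLAN 1, priority 4096
sw2_vlan2 = bridge_id(0, 2, SW2_MAC)       # SW2, VLAN 2, priority 0 (arrives untagged)
sw2_vlan1 = bridge_id(32768, 1, SW2_MAC)   # SW2, VLAN 1, default priority (arrives tagged)

def elect_root(candidates):
    """The bridge with the numerically lowest bridge ID becomes root."""
    return min(candidates)

# SW1's VLAN 1 instance hears both of SW2's BPDUs and compares them with its own ID.
print(elect_root([sw1_vlan1, sw2_vlan2, sw2_vlan1]))
```

Under these assumptions the lowest bridge ID wins every time the comparison is made, so in this simplified model the result is a changed root rather than a root that alternates.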

Joshqun Ismayilov · ‎02-05-2025

@a1111

A native VLAN mismatch can cause a Layer 2 loop because Spanning Tree Protocol (STP) BPDUs (Bridge Protocol Data Units), which control loop prevention, are typically sent on the native VLAN untagged. If two switches have different native VLANs on opposite ends of a trunk, BPDUs from one switch (sent untagged) may be incorrectly associated with a different VLAN on the other switch.
This can result in:

Blocking or forwarding inconsistencies – One switch may think a link is forwarding, while the other may not recognize the BPDU properly, causing a loop.
Duplicate STP instances – If STP fails to detect a loop due to BPDU misinterpretation, redundant links might not be properly blocked.
Traffic leaks between VLANs – Misclassified untagged traffic may circulate improperly, leading to instability.
Ultimately, STP cannot function correctly, and redundant paths that should be blocked remain open, causing a Layer 2 broadcast storm or loop.
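The "incorrectly associated with a different VLAN" part can be shown with a minimal Python sketch of 802.1Q trunk tagging. The function names are hypothetical, and the model ignores allowed-VLAN lists and any checks a switch may apply on received frames.

```python
def egress_tag(vlan, native_vlan):
    """Tag put on a frame leaving a trunk: None (untagged) for the native VLAN."""
    return None if vlan == native_vlan else vlan

def ingress_vlan(tag, native_vlan):
    """VLAN a received frame is placed in: untagged frames land in the native VLAN."""
    return native_vlan if tag is None else tag

def vlan_at_receiver(vlan, sender_native, receiver_native):
    """VLAN a frame ends up in after crossing the trunk."""
    return ingress_vlan(egress_tag(vlan, sender_native), receiver_native)

# Matching native VLANs: a VLAN 1 frame stays in VLAN 1.
print(vlan_at_receiver(1, sender_native=1, receiver_native=1))
# Mismatch (sender native 1, receiver native 2): the same frame lands in VLAN 2.
print(vlan_at_receiver(1, sender_native=1, receiver_native=2))
```

In the mismatched case, frames from the sender's VLAN 1 and the sender's VLAN 2 both end up in the receiver's VLAN 2, which is the merging of broadcast domains discussed earlier in the thread.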

Thanks !