
QinQ: Unexpected STP convergence or MAC learning/forwarding issue using redundant customer links

Dan Weber
Level 1

I am trying to extend multiple switching layers between three buildings as part of a data center build/move. We have private single-mode fiber between all three buildings that will be in place before the move. The distances between the buildings are too long for Cat6 copper.

The switching layers that I need to extend are primarily different layers of the internet edge and DMZ, for both production and test. Each layer is a pair of standalone switches in the DC (building 1) with a 2x1G port-channel between them. In most cases, we do not have any SFP ports and need to rely on TX copper ports to extend the layers to the new buildings.

In the attached diagram, building 1 is the existing building. B1-SW1 and B1-SW2 represent one of the switching layers. I want to extend the layer as if I had directly connected B1-SW1 to B2-SW1 and B3-SW1, and B1-SW2 to B2-SW2 and B3-SW2.

I thought 802.1Q tunneling (QinQ) would be a good fit for this solution, as it would allow me to connect via copper in each building and then use the 10G interfaces on the 3850s to form port-channels between them to carry the building (customer) traffic. Additionally, QinQ would allow me to use the same solution for production and test at the same time, even with VLAN overlap. I planned on logically extending the links between the SW1s using VLAN 3001 (v3001) and the links between the SW2s using VLAN 3002 (v3002).

I configured a 3850 stack for each building and formed the port-channels between them. Next, I increased the MTU on the 3850s to 1516 and configured them to tag native VLANs.
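For reference, the 3850 side of a setup like this would look roughly like the sketch below. It is a minimal illustration, not a copy of the running config: the interface and port-channel numbers are hypothetical, and only the MTU value, the native VLAN tagging, and VLANs 3001/3002 come from the description above.

```
! Sketch only: interface and port-channel numbers are hypothetical
system mtu 1516
vlan dot1q tag native
!
vlan 3001
vlan 3002
!
! Copper port facing a building SW1 (customer side), tunneled in v3001
interface GigabitEthernet1/0/1
 switchport access vlan 3001
 switchport mode dot1q-tunnel
!
! Copper port facing a building SW2 (customer side), tunneled in v3002
interface GigabitEthernet2/0/1
 switchport access vlan 3002
 switchport mode dot1q-tunnel
!
! 10G port-channel toward the 3850 stack in another building
interface Port-channel1
 switchport mode trunk
 switchport trunk allowed vlan 3001,3002
```

With this layout, every customer VLAN on the SW1 links (including VLAN 992) is carried inside v3001, and every customer VLAN on the SW2 links inside v3002.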

I was able to get everything up and working and began pinging from R1 in building 1 to R2 in building 2 using VLAN 992 on the building switches.

The issue is that during failover testing (pulling link1 or link2 in the diagram), the failover to the alternate links via SW2 usually worked fine, but occasionally I would see packet loss for anywhere from 30 to 180+ seconds, even after spanning tree had apparently converged.

I suspect the issue I am having is related to the interaction between the 3850 QSW spanning tree on v3001 and v3002 and the upper layer spanning tree on v992. Or it may be related to how the 3850 stack learns the same MAC in both vlans.

To simplify a bit, I removed building 3 from the network but I still have the same issue with packet loss.

Out of desperation, I disabled MAC learning on v3001 and v3002, and now I see consistent behavior between building 1 and building 2, but I suspect it will cause issues if I bring building 3 back into the v3001/v3002 loop.
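The workaround amounts to something like the following sketch. The exact syntax is an assumption based on the standard Catalyst per-VLAN MAC learning commands and should be checked against this release.

```
! Global configuration: stop the 3850 from learning MACs in the tunnel VLANs
no mac address-table learning vlan 3001
no mac address-table learning vlan 3002
!
! Exec mode: confirm the learning state and check what is still in the table
show mac address-table learning vlan 3001
show mac address-table vlan 3001
```

With learning disabled, the 3850 floods every frame in those VLANs to all ports carrying them, which is harmless with two buildings but sends each frame to every building once a third is attached.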

I'm hoping someone has experienced this problem before while extending L2 traffic between Data Centers using QinQ and redundant links and may have found the solution or a limitation with the design.

The IOS version on the 3850s is Version 03.06.05E - IPBASE.

Looking forward to the discussion.

Dan Weber
