on 03-31-2011 08:57 AM
This document highlights the concept of the MST (Multiple Spanning Tree) access gateway and how it differs from regular MST. It also walks through some design scenarios to show when to choose which implementation.
It is generally well understood that loops in a layer 2 network are disastrous. Spanning Tree Protocol (STP) detects loops and breaks them, preventing broadcast and unknown unicast packets from circulating forever and eventually bringing down the network.
STP operates by selecting a root bridge in the network, which has all its ports in forwarding, while the other switches determine their path cost towards the root switch.
The root switch is elected by priority; when all priorities are equal, the switch with the lowest MAC address becomes the root. While the default settings do prevent a loop, for good network design it is important to set the priorities correctly, so that you define where the root switch (and its potential backup) is located.
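The election rule described above can be sketched in a few lines of Python. This is only an illustration of the comparison (lowest priority first, lowest MAC address as tie-breaker); the switch names, priorities and MAC addresses are made up for the example and do not come from any real device.

```python
from typing import List, NamedTuple, Tuple


class Bridge(NamedTuple):
    name: str
    priority: int
    mac: str  # hypothetical addresses, e.g. "00:00:00:00:00:0a"


def bridge_id(bridge: Bridge) -> Tuple[int, int]:
    """Bridge ID as a comparable (priority, MAC-as-integer) pair."""
    return (bridge.priority, int(bridge.mac.replace(":", ""), 16))


def elect_root(bridges: List[Bridge]) -> Bridge:
    """The bridge with the numerically lowest bridge ID becomes root."""
    return min(bridges, key=bridge_id)


# Assumed values: S1 has the lowest priority, so its MAC does not matter.
switches = [
    Bridge("S1", 4096, "00:00:00:00:00:0a"),
    Bridge("S2", 8192, "00:00:00:00:00:01"),
    Bridge("S3", 32768, "00:00:00:00:00:02"),
    Bridge("S4", 32768, "00:00:00:00:00:03"),
]
print(elect_root(switches).name)  # S1
```

If all four switches were left at the same priority, S2 would win in this example because it has the lowest MAC address, which is exactly why the priorities should be set deliberately.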
Root switch election
The states that a port can be in are determined as follows:
As a summary overview, consider the following topology:
In this example we consider the picture after all BPDUs have been exchanged. Based on priority and MAC address, S1 is elected as root bridge.
S1 will move both its ports from blocking to forwarding.
Switch S2 will move its root-facing port to forwarding mode; this port becomes the RP (root port). The port towards S3 will also be forwarding,
because although S3 has a direct link to the root switch S1, that path cost (100) is higher than going via S2->S1 (38). For this reason
the port on S3 towards S1 will be blocking.
S4's path to the root bridge is either via S2 or via S3. The path via S2 has a higher cost than the path via S3, hence S4->S2 is blocking, the path
via S3 is chosen, and S4's port towards S3 becomes its RP.
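To make the path cost reasoning concrete, here is a small Python sketch that computes the lowest-cost path to the root for each switch. The S1-S3 cost (100) and the S3->S2->S1 total (38) come from the text; the split into 19 + 19 and both S4 link costs are assumptions chosen only to match the description. Real STP also applies tie-breakers (bridge ID, port ID) that this sketch leaves out.

```python
import heapq

# Link costs. Only the 100 and the 38 total are from the text;
# the 19 + 19 split and the two S4 links are assumed for illustration.
LINKS = {
    ("S1", "S2"): 19,
    ("S2", "S3"): 19,
    ("S1", "S3"): 100,
    ("S3", "S4"): 19,
    ("S2", "S4"): 100,
}


def root_paths(root, links):
    """Return {switch: (root path cost, neighbor its root port faces)}."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    best = {root: (0, None)}
    queue = [(0, root)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best[node][0]:
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, link_cost in graph[node]:
            new_cost = cost + link_cost
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, node)
                heapq.heappush(queue, (new_cost, neighbor))
    return best


paths = root_paths("S1", LINKS)
for switch in sorted(paths):
    print(switch, paths[switch])
```

With these assumed costs, S3 reaches the root via S2 at cost 38 and S4 reaches it via S3, so the direct S3-S1 link and the S4-S2 link are the ones left blocking.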
This picture assumes that there are no VLANs, just plain Ethernet; the diagram also shows a single local area network.
What if we want to use multiple VLANs, or interconnect this network via a carrier Ethernet network to a remote LAN?
In the scenario where you have multiple VLANs, regular STP will block the link for all VLANs. While this prevents the loop, it is not very
efficient, as one node/path sits completely in standby mode. It would be nicer to forward a few VLANs over to switch 1 and the others to switch 2.
Effectively that means that Sw1 is root for one VLAN set and Sw2 for another VLAN set.
Regular STP cannot do this. The logical evolution is MSTP (Multiple Spanning Tree Protocol), which is the standardized version, and PVST(+), which is a Cisco proprietary solution.
PVST and MSTP effectively achieve the same thing. One of the key differences is that MSTP sends its BPDUs out untagged on the port, whereas
PVST sends its BPDUs inline with the VLAN, hence VLAN tagged.
The most common way to connect 2 separate l2 segments or networks together is via VPLS.
With VPLS, the provider's edge nodes aggregate the customer's L2 traffic, participate in the L2 spanning tree loop prevention, and use Pseudo Wires over an MPLS core to remote PEs (Provider Edge) to carry the traffic from one segment to the remote site.
How VPLS works, how it interacts with MST or MSTAG, and what the differences are will be discussed below.
MST allows us to run an STP instance per set of VLANs that we configure.
A sample configuration on the ASR9000 PE node looks as follows. Each set of VLANs is defined under an instance.
We can adjust the priority per instance if needed, and multiple VLANs per instance are allowed as well.
Sample MST configuration on the ASR9000
spanning-tree mst MYSTP_DOMAIN
name testme
! The name of the MST region is very important, it must be the same for all switches in this region.
! also the definitions of your MST instances need to be the same on all nodes.
revision 1
instance 0
priority 4096
!
instance 1
vlan-ids 100
priority 4096
!
instance 2
vlan-ids 101
priority 4096
!
interface TenGigE0/3/0/6
! Interfaces that are enabled for MST. Note that these are the main interfaces
interface TenGigE0/3/0/7
!
interface Bundle-Ether100
!
interface GigabitEthernet0/0/0/27
!
!
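The two rules in the configuration comments (region name and instance definitions must match on all switches) can be illustrated with a short Python sketch. The MstRegion type and both helper functions are hypothetical names introduced here, not part of any Cisco tooling; the region name, revision and VLAN numbers are taken from the sample above. VLANs that are not assigned to an instance fall back to instance 0.

```python
from typing import Dict, NamedTuple, Set


class MstRegion(NamedTuple):
    name: str
    revision: int
    instances: Dict[int, Set[int]]  # instance number -> VLAN ids


def instance_for_vlan(region: MstRegion, vlan: int) -> int:
    """Return the MST instance carrying a VLAN (0 when it is unmapped)."""
    if not 1 <= vlan <= 4094:
        raise ValueError(f"invalid VLAN id: {vlan}")
    for instance, vlans in region.instances.items():
        if vlan in vlans:
            return instance
    return 0


def same_region(a: MstRegion, b: MstRegion) -> bool:
    """Two switches share a region only if name, revision and mapping agree."""
    return (a.name, a.revision, a.instances) == (b.name, b.revision, b.instances)


pe1 = MstRegion("testme", 1, {1: {100}, 2: {101}})
pe2 = MstRegion("testme", 1, {1: {100}, 2: {101}})
access = MstRegion("testme", 1, {1: {100, 101}})  # deliberately mismatched

print(instance_for_vlan(pe1, 100))  # 1
print(instance_for_vlan(pe1, 200))  # 0
print(same_region(pe1, pe2))        # True
print(same_region(pe1, access))     # False
```

The mismatched access switch in the example has the right name and revision but a different VLAN mapping, which is enough to place it in a different region.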
A more graphical example of how MST can be used with 9k's is shown here:
In this case we have a clear STP loop between the 2 9K PE devices, which are interconnected via a bundle ethernet.
Associated configuration for this example would be as follows:
spanning-tree mst Example
name testme
revision 1
instance 0
priority 4096
!
instance 1
vlan-ids 100
priority 4096
!
interface Bundle-Ether100
!
interface GigabitEthernet0/0/0/27
!
With this config we enable the 9k PE to send BPDUs out to the other 9k and to the access switch.
Also with this config, one link will be blocked: if we elect either 9k to be the root switch (say the one on the left), the link marked RED will be blocked.
This config, however, doesn't provide any data forwarding configuration. For that we need to establish a separate bridge-domain in which we pull in the right EFPs for forwarding the data traffic:
interface bundle-e100.100 l2transport
encapsulation dot1q 100
rewrite ingress tag pop 1 symmetric
!rewrite is optional depending on whether all EFP's are in the same vlan.
interface gigabitethernet 0/0/0/27.100 l2transport
encapsulation dot1q 100
rewrite ingress tag pop 1 symmetric
!rewrite is optional depending on whether all EFP's are in the same vlan.
l2vpn
bridge group EXAMPLE
bridge-domain FWD_1
interface g0/0/0/27.100
interface bundle-e100.100
You need to repeat this configuration for every vlan you want to forward. There is another article detailing more about vlan rewrites and the EFP concept in case you're interested. See the related documentation section for a reference.
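Since the same stanza has to be repeated per VLAN, a small generator can save some typing. The Python sketch below only renders text using the commands already shown above; the function name efp_config and the FWD_<vlan> bridge-domain naming are assumptions made for the example, and the output should be reviewed before it is applied to a device.

```python
def efp_config(vlans, interfaces, bridge_group="EXAMPLE"):
    """Render EFP and bridge-domain configuration text for each VLAN."""
    lines = []
    for vlan in vlans:
        for ifname in interfaces:
            lines += [
                f"interface {ifname}.{vlan} l2transport",
                f" encapsulation dot1q {vlan}",
                # The rewrite is optional, see the comment in the config above.
                " rewrite ingress tag pop 1 symmetric",
                "!",
            ]
    lines.append("l2vpn")
    lines.append(f" bridge group {bridge_group}")
    for vlan in vlans:
        # Assumed naming scheme: one bridge-domain per VLAN.
        lines.append(f"  bridge-domain FWD_{vlan}")
        lines += [f"   interface {ifname}.{vlan}" for ifname in interfaces]
    return "\n".join(lines)


print(efp_config([100, 101], ["Bundle-Ether100", "GigabitEthernet0/0/0/27"]))
```

The sub-interface number is set equal to the VLAN id here, following the convention used in the example above (.100 for VLAN 100).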
VPLS is the concept of connecting multiple layer 2 domains over an MPLS network for instance. On the ASR9000, a bridge-group is used to pull in the attachment circuits (physical interfaces towards the lan segment) and Pseudo Wires (PW) to the remote PE's.
A sample configuration to achieve VPLS looks as follows. It provides the configuration for the data plane and assumes there are no loops in your L2 topology, either at the customer access site or within your VPLS domain.
To prevent loops in the access network we need to leverage either MSTP, MSTAG or PVSTAG. We'll discuss MSTAG in the next section below.
l2vpn
bridge group VPLS
bridge-domain vpls_1
! the bridge-group vs domain is just a configuration hierarchy, it doesn't serve any special functionality.
interface GigabitEthernet0/0/0/0.100
! Physical interfaces towards a subscriber switch
neighbor 1.1.2.3 pw-id 123
! for H-VPLS we can use PW's also as an attachment circuit
vfi vpls_1_vfi_1
neighbor 5.5.5.5 pw-id 333
! definition of a pseudo wire underneath a Virtual Forwarding Instance
neighbor 6.6.6.6 pw-id 444
!
!
!
!
!
end
The IP addresses provided in the "neighbor" statements are the MPLS router IDs of the remote PEs. The PW-ID is an arbitrary but unique number that identifies the pseudowire (the VC ID); it must match on both ends.
Whether you put the PW's in the VFI or outside the VFI or across VFI's depends on your needs and whether you need SPLIT HORIZON (see below).
The following picture explains what might happen when we don't use any STP in a VPLS scenario.
In this case there is a loop, but the access switch doesn't know about it, because neither the 9k PEs nor the switches
form a closed ring.
Even if we were running MSTP in this scenario, no loop would be detected, since by default BPDUs are not forwarded over the pseudo wires.
A potential solution might be to run an L2 link between the 2 southern PEs, so that there is a loop on the SOUTH segment and one uplink from the access switch to one of the 2 PEs is indeed blocked as per regular (M)STP.
The problem here, however, is that a (broadcast/unknown unicast) packet arriving on the Left South PE's pseudo wire is then sent to the southern access switch AND over the interchassis link (not drawn in this picture) to the SOUTH PE on the right. There will be a loop again.
A proper solution for this model is the use of the MST Access Gateway, which will be highlighted below.
Normally in a bridge domain, broadcast and unknown unicast traffic from Attachment Circuits is replicated to all bridge ports.
Obviously packets are never sent to the AC or PW that the traffic was actually received on.
By default ACs can always forward packets to each other and to (all) Pseudo Wires.
So traffic from the AC "west" will be replicated over all PWs and the South-West AC.
When traffic arrives on a PW, by default packets are never sent out the PW they were received on, nor to other Pseudo Wires in the same
VFI. All PWs in the same VFI share the same split horizon group, and traffic is not replicated within the same split horizon group.
When packets arrive on a Pseudo Wire they are NOT forwarded out of PWs in the same VFI.
You can also move Attachment Circuits into a split horizon group to prevent them from speaking with each other, by means of the "split-horizon group" command underneath the interface, which is configured in the l2vpn bridge-domain.
l2vpn
bridge group TEST
bridge-domain SAMPLE
interface g0/1/0/0
split-horizon group
interface g0/1/0/10
split-horizon group
interface te0/2/0/3
vfi VFI_TEST
neighbor 2.2.2.2 pw-id 100
In this case traffic cannot flow between 0/1/0/0 and 0/1/0/10.
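The replication rule can be summarized as: never send a frame back out its ingress port, and never send it to a port in the same split horizon group. A minimal Python sketch of that rule follows, using the ports from the sample configuration; the Port type, the flood_targets function and the group labels "ac" and "vfi" are names invented for this illustration.

```python
from typing import List, NamedTuple, Optional


class Port(NamedTuple):
    name: str
    shg: Optional[str] = None  # split horizon group, None = not in a group


def flood_targets(ingress: Port, ports: List[Port]) -> List[str]:
    """Ports that receive a broadcast/unknown unicast frame from `ingress`."""
    return [
        p.name
        for p in ports
        if p.name != ingress.name
        and (ingress.shg is None or p.shg != ingress.shg)
    ]


# Ports from the sample above: two ACs in a split horizon group,
# one AC outside any group, and one PW in the VFI.
ports = [
    Port("g0/1/0/0", "ac"),
    Port("g0/1/0/10", "ac"),
    Port("te0/2/0/3"),
    Port("pw 2.2.2.2", "vfi"),
]

for ingress in ports:
    print(ingress.name, "->", flood_targets(ingress, ports))
```

Traffic entering on g0/1/0/0 reaches te0/2/0/3 and the pseudo wire but not g0/1/0/10, while traffic from te0/2/0/3 is still flooded to every other port.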
For illustration purposes, consider this sample VPLS design.
In order for MST to work, we need an inter-PE attachment circuit to exchange the BPDUs between the two PE nodes drawn in blue.
In this example PE_left (PE-1) is considered ROOT.
Imagine a broadcast coming in on PW-1. Because of split horizon, the traffic is not replicated to PW 2, 3, 4 and 5, but it will go down AC-2 and is also sent over AC-1.
When the traffic enters PE_right (PE-2), it will not go down AC-3, which is blocking because PE-1 is root, but it will enter the VFI and get replicated to the PWs in that VFI: PW 3, 4 and 5. This poses a big problem: a replication loop back to PE-3 and also to PE-1.
Omitting PW-5 solves part of the issue, so that traffic is not replicated back to PE-1, but it might slow down convergence in case AC-2 goes offline and PE3/4 have not yet updated their MAC tables. Traffic will still get back to PE-3 and PE-4.
AC-1 is required for BPDUs, and you might consider only creating an EFP for the untagged traffic (BPDUs). But then you might have forwarding issues in case PW 1 and 2 are down and you want to send traffic over AC-1 to PE-right so that it can forward the traffic to PE3 and 4 for us.
Wouldn't it be nice if we could live without AC-1 altogether, still run spanning-tree to the CE, and have optimum convergence?
Yes. Enter MSTAG.
With MSTAG we statically define the BPDUs on the PE nodes, so that the PEs present themselves to the CE as 1 virtual bridge.
One link will be blocked, but there is no need for an inter-chassis link anymore in this case.
What’s the main function of the MST access gateway?
Major advantage: scale and local significance
Light MST implementation: for example, it doesn’t keep an STP state machine and doesn’t need to handle received BPDUs (except TCN)
MST has per-port scope
Other Advantages
Doesn’t require a special inter-PE PW, no single point of failure, no temporary L2 loop.
Much more robust than the “MST over special PW” solution
Standards-based solution, interoperable with third-party vendors, works with any network topology
Self-protecting: even with user misconfiguration, it won’t cause L2 loops
Disadvantages
MST convergence depends on the number of VLANs in the access ring and on the MST implementation of the access switches. In any case, don't expect 50 msec convergence time
With a Cisco 3400 as access switch, baseline convergence shows sub-second for link failure, sub-100 msec for link recovery, and 2-3 seconds for node failure
Note that in this configuration we use the interface with suffix .1 in the MSTAG configuration.
This means we need to define an EFP (Ethernet Flow Point) to capture the BPDU's and TCN packets. In fact, we're not even using the bpdu's received, as we perceive ourselves to be root on the 9k and send these precanned BPDU's out.
We will consume the TCN (topology change notification) and send it into the VPLS network as MAC withdrawal messages.
interface gigabitEthernet 0/0/0/10.1 l2transport
encapsulation untagged
Aside from the MST configuration we still need to configure our bridge domains with the EFP's for the data forwarding and our Pseudo Wires to our remote PE's as described above.
1) MSTP: There is a single protocol instance which can have the
standard 64 MST Instances (MSTIs) within it. These 64 MSTIs create 64
logical spanning tree topologies within one MSTP region/ domain.
2) MSTAG: You can create a separate protocol instance per physical
interface. Each protocol instance can be in a separate MSTP
region by itself, and each one can in turn support 64 MST Instances
(MSTIs) within it.
In general MSTAG is more scalable (multiple regions with 64 MSTIs each)
but can only be used if the ASR9K is in the root (or backup root)
position for every MSTI. MSTP is the normal Cat 6K like version but you
can use all 64 MSTIs without any issues. Both of these can interoperate
with any IEEE standard MSTP implementation so should work with the N7K
VDCs.
For more details on regular MST and some IOS interoperability considerations, check this reference:
ASR9000 MST interop with IOS/7600: VLAN pruning
Learn more about vlan rewrites and the concept of EFP's
ASR9000/XR Flexible VLAN matching, EVC, VLAN-Tag rewriting, IRB/BVI and defining L2 services
Xander Thuijs, CCIE #6775
Sr Tech Lead ASR9000
Alexander, thanks again for your answer! My main question is what happens when the A9K receives the STP TCN: will it send the MWD LDP message over all the pseudowires, or only over the PWs that are in this bridge-group? Thanks for your collaboration!
Pablo
Hi Pablo,
For MSTAG, a TCN in any MSTI will trigger a MAC flush in all BDs.
For MST, a TCN in an MSTI will cause a MAC flush in the associated BDs (ie the BDs containing EFPs that match any of the VLANs assigned to the MSTI).
Also the MWD message is a "wildcard" operation, which means we flush all MACs with that message; we don't do intelligent withdrawal (yet) to only "revoke" the MACs associated with the EFP that received the TCN.
It should not be a worry: this functionality has been around since day 1 for the A9K and has operated to perfection.
regards!
xander
Thanks for your answer! I just wanted to keep this behavior in mind!
Hi Xander,
I read the document you suggested and need to clarify some doubts. Suppose we have 5 PE routers that share the same VPLS service: should we configure the same bridge-domain name on all these routers? What is the exact difference between a bridge group and a bridge domain? I am also confused about the following terms: "ve-id, vpls-id, vsi-id, vpn-id",
because different documents use different terminology. One document specifies that a VC label is used to identify the pseudowire (each PW consisting of a pair of unidirectional LSPs or VC labels) and that a VPLS instance is identified by a service identifier (Svc-id). What is the real difference between these terms? Please explain with a real-world example if you can; it will be helpful for me and for others. Could you also give an example of a VPLS instance and service? This is a humble request.
Manu,
As far as I know, a bridge domain acts like a bridge: whatever data comes in is propagated to all of its physical and virtual interfaces. PEs must implement a bridge for each VPLS instance (identified by vpls-id). If you have 5 PEs with the same vpls-id, they should be configured with the same bridge name. I am not that sure about the other terms.
Hi Alexander,
I have read this document. It is a good one. One thing I need to know: is MSTAG a standard protocol like MST? Can we run MSTAG on multi-vendor devices? Please confirm.
Hi Alexander, great article. I have a question about the scenario in the diagram: in rings 1 and 2 the spanning-tree root does not change when the MPLS backbone has a problem. As a result, L2 traffic in the rings is blocked, because the MSTAG messages continue to be sent, and customer A's traffic does not pass between the points.
How can I fix this?
Thank you very much, your notes are very useful
Hi sergio!! nice to hear, thank you!
So for your situation there are a few options:
1) the automated way, where a core failure sends a message down the MST ring. That is a feature request that we don't have yet.
2) using EEM: track the core interface going down via syslog, and shut the interface or change something in the MST configuration to signal the ring.
3) not worry about it and build a connection between the 2 AG nodes, so there is communication possible between them in case something is received on one node that needs to go to the other.
cheers!
xander
Thanks very much! I think will go for the option number 2
Hi Xander,
Thanks.
Hi xander,
why do you need a special pw in 7600 MSTAG setup to relay the TCN between the PE devices while we don't need this in ASR9K?
Regards
hey there!
Ah I have some Q&A on that for you!
1. Configuring MSTAG between a pair of 9K's, do we need this special bpdu PW? Or is the regular VPLS PW enough?
For MSTAG, you don't need connectivity between the 9K's at all, unless both these conditions hold: a) There is more than 1 access device b) The access devices send traffic to each other (not just to/from the core). If both these conditions hold, then you need to connect the 9Ks together so that they can propagate topology changes to each other, and hence to the access devices. You can do this by placing the untagged EFP where MSTAG is configured in a cross-connect with a PW (or another AC) that is linked to the other 9K. There is nothing special about this PW (or AC).
2. How are the BPDU relayed between 9k's ? LDP maybe? if we don't use this special PW?
Unless the conditions above hold, there is no need to relay BPDUs between the 9Ks when using MSTAG. They only need to be relayed to propagate topology changes.
3. If the regular PW between nPE 9k's is shutdown, why doesn't the access layer reconverge? I see no change, precanned bpdu still get sent.
With MSTAG, the precanned BPDUs will always continue to be sent, all of the time. MSTAG works by sending precanned BPDUs, not by relaying BPDUs from one device to the other like the 7600 solution.
Revisiting an old discussion, with this topology:
So, one ASR9k (XR 5.1.x), with two links to different nodes of the access ring. Assume the access ring is controlling loops through some non-STP fashion, and cannot be configured for STP. What are the options for configurations on the ASR? Is it as simple as split horizon on the two links?
Thanks,
ERM
hey evan,
there is a loop and the 9k participates in it, so you will need some sort of STP-like loop prevention in which the a9k participates. For that the a9k has the following options:
REPAG, MST(AG), PVST(AG), 8032.
If the switches can't do any of them, we have a problem :)
xander
hello Alexander,
we have this current design in our backbone (REP / L2VPN ASR)
see design.jpg
The goal is to merge the two infrastructures in order to exchange traffic for some VLANs, as below:
In this configuration, do we need to implement MST or something else on the ASR side in order to work with the REP configuration?
If a link goes down on REP as in the picture, could we still have traffic for all REP switches, passing through the ASR core?
thanks in advance
Philippe