07-02-2009 09:40 PM - edited 03-01-2019 04:22 PM
Over the last few years, MPLS VPN services have gained popularity as an alternative to legacy TDM networks for network connectivity. One of the most common challenges in MPLS VPN design is the Layer 3 routing interaction between the customer network and the service provider. A common scenario has a primary BGP path over the MPLS VPN network and a redundant path over a non-MPLS VPN network. It shows up in networks that have an eBGP peering session with the MPLS VPN provider, through which routes to remote locations are learned, and that also have a backup IGP path to those same locations. Typically the backup path learns the routes via a dynamic routing protocol such as EIGRP or OSPF. This TAC Tip describes how to configure the routing so that the preferred path is selected both when the primary path fails and when it recovers. With the default configuration, failover to the backup IGP path typically works. The problem comes when the primary path recovers.
When an IGP route (OSPF in this example) is redistributed into BGP, BGP considers it locally generated and assigns it a weight of 32768. By default, all routes received from a BGP peer are assigned a weight of 0. Weight is the first attribute compared in BGP path selection. Therefore, when both paths exist for the same prefix, the locally originated path with the higher weight is chosen as the BGP best path. Let's first walk through an example of how the problem surfaces.
Take this simple network example:
R1 is the end customer (CPE) router that has two parallel paths to reach the remote 192.168.1.0/24 subnet. One path is an OSPF learned path and the other is an eBGP learned route from the MPLS PE router over the MPLS VPN network.
When the MPLS VPN network is up, the eBGP route is selected as the best path because of its lower administrative distance (20 for eBGP versus 110 for OSPF).
R1#show ip bgp 192.168.1.0 255.255.255.0
BGP routing table entry for 192.168.1.0/24, version 1562
Paths: (1 available, best #1, table default)
Flag: 0x820
Not advertised to any peer
65000
172.16.56.6 from 172.16.56.6 (192.168.8.1)
Origin IGP, metric 0, localpref 100, valid, external, best
R1#show ip route 192.168.1.0
Routing entry for 192.168.1.0/24
Known via "bgp 65001", distance 20, metric 0
Tag 65000, type external
Last update from 172.16.56.6 00:08:14 ago
Routing Descriptor Blocks:
* 172.16.56.6, from 172.16.56.6, 00:08:14 ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 65000
The OSPF learned route to 192.168.1.0/24 is there as a candidate path in the OSPF database.
R1#show ip ospf data router 3.3.3.3
OSPF Router with ID (5.5.5.5) (Process ID 1)
Router Link States (Area 0)
LS age: 225
Options: (No TOS-capability, DC)
LS Type: Router Links
Link State ID: 3.3.3.3
Advertising Router: 3.3.3.3
LS Seq Number: 8000138E
Checksum: 0x3AF3
Length: 48
Number of Links: 2
Link connected to: a Transit Network
(Link ID) Designated Router address: 172.16.35.5
(Link Data) Router Interface address: 172.16.35.3
Number of MTID metrics: 0
TOS 0 Metrics: 10
Link connected to: a Stub Network
(Link ID) Network/subnet number: 192.168.1.0
(Link Data) Network Mask: 255.255.255.0
Number of MTID metrics: 0
TOS 0 Metrics: 1
Now assume the link to the MPLS VPN network fails and we lose the eBGP route. Under this condition the OSPF backup route will be installed in the routing table. Here is the routing table debug showing the backup OSPF route going in the routing table.
RT: del 192.168.1.0 via 172.16.56.6, bgp metric [20/0]
RT: delete network route to 192.168.1.0/24
RT: updating ospf 192.168.1.0/24 (0x0) via 172.16.35.3 Et1/0
RT: add 192.168.1.0/24 via 172.16.35.3, ospf metric [110/11]
R1#show ip route 192.168.1.0
Routing entry for 192.168.1.0/24
Known via "ospf 1", distance 110, metric 11, type intra area
Redistributing via bgp 65001
Advertised by bgp 65001 match internal external 1 & 2
Last update from 172.16.35.3 on Ethernet1/0, 00:00:09 ago
Routing Descriptor Blocks:
* 172.16.35.3, from 3.3.3.3, 00:00:09 ago, via Ethernet1/0
Route metric is 11, traffic share count is 1
At this stage the routing has reconverged to the IGP backup path and everything is OK. However, notice that the output above shows the route being redistributed into BGP. This is because the router redistributes OSPF into BGP so that the local OSPF-learned routes can be advertised over the MPLS VPN network.
Here is the entry in the BGP table showing it is now locally sourced with a weight of 32768.
R1#show ip bgp 192.168.1.0 255.255.255.0
BGP routing table entry for 192.168.1.0/24, version 1564
Paths: (1 available, best #1, table default)
Flag: 0x820
Not advertised to any peer
Local
172.16.35.3 from 0.0.0.0 (5.5.5.5)
Origin incomplete, metric 11, localpref 100, weight 32768, valid, sourced, best
Now let's say that the primary link to the MPLS VPN router comes back up and the eBGP session recovers such that we learn the 192.168.1.0/24 network over the eBGP session again.
BGP(0): 172.16.56.6 rcvd UPDATE w/ attr: nexthop 172.16.56.6, origin i, metric 0, path 65000
BGP(0): 172.16.56.6 rcvd 192.168.1.0/24
R1#show ip bgp 192.168.1.0 255.255.255.0
BGP routing table entry for 192.168.1.0/24, version 1564
Paths: (2 available, best #2, table default)
Flag: 0x820
Advertised to update-groups:
1
65000
172.16.56.6 from 172.16.56.6 (192.168.8.1)
Origin IGP, metric 0, localpref 100, valid, external
Local
172.16.35.3 from 0.0.0.0 (5.5.5.5)
Origin incomplete, metric 11, localpref 100, weight 32768, valid, sourced, best
Even though the AD of the eBGP path (20) is lower than that of the OSPF path (110), we do not install the eBGP learned route into the routing table. Since this prefix is in the routing table via OSPF and is being redistributed into BGP, the BGP table has both paths and must use the Best Path Selection Algorithm. Routes redistributed into BGP are considered locally originated and get a default weight of 32768. The BGP learned prefix is assigned a weight of 0 by default. Since weight is the first BGP attribute that we compare on Cisco routers, the route with the higher weight is considered the best. Only the BGP best path is offered to the routing table, so the eBGP path never gets the chance to compete on administrative distance.
R1#show ip route 192.168.1.0
Routing entry for 192.168.1.0/24
Known via "ospf 1", distance 110, metric 11, type intra area
Redistributing via bgp 65001
Advertised by bgp 65001 match internal external 1 & 2
Last update from 172.16.35.3 on Ethernet1/0, 00:03:05 ago
Routing Descriptor Blocks:
* 172.16.35.3, from 3.3.3.3, 00:03:05 ago, via Ethernet1/0
Route metric is 11, traffic share count is 1
Now the problem is that, even though the BGP link is back up and we are learning prefixes, traffic is still routing over the backup path via OSPF. To resolve this, we need to force the eBGP path to be preferred.
One common way to resolve this issue is to set the weight on routes learned from the eBGP peer higher than 32768. When the paths are compared by BGP, the path with the highest weight will be preferred and installed in the routing table.
router bgp 65001
bgp log-neighbor-changes
neighbor 172.16.56.6 remote-as 65000
!
address-family ipv4
no synchronization
redistribute ospf 1 match internal external 1 external 2
neighbor 172.16.56.6 activate
neighbor 172.16.56.6 weight 32769
no auto-summary
exit-address-family
To update the weight on the received update, we must force the peer to send the update again so that we can apply the change inbound.
R1#clear ip bgp 172.16.56.6 soft in
*Feb 10 00:32:01.279: BGP(0): 172.16.56.6 rcvd UPDATE w/ attr: nexthop 172.16.56.6, origin i, metric 0, path 65000
*Feb 10 00:32:01.279: BGP(0): 172.16.56.6 rcvd 192.168.1.0/24
*Feb 10 00:32:01.291: RT: closer admin distance for 192.168.1.0, flushing 1 routes
*Feb 10 00:32:01.291: RT: add 192.168.1.0/24 via 172.16.56.6, bgp metric [20/0]
R1#show ip route 192.168.1.0
Routing entry for 192.168.1.0/24
Known via "bgp 65001", distance 20, metric 0
Tag 65000, type external
Last update from 172.16.56.6 00:01:06 ago
Routing Descriptor Blocks:
* 172.16.56.6, from 172.16.56.6, 00:01:06 ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 65000
R1#show ip bgp 192.168.1.0
BGP routing table entry for 192.168.1.0/24, version 1565
Paths: (1 available, best #1, table default)
Flag: 0x820
Not advertised to any peer
65000
172.16.56.6 from 172.16.56.6 (192.168.8.1)
Origin IGP, metric 0, localpref 100, weight 32769, valid, external, best
To demonstrate how this works, let's assume that the BGP route has been lost and the OSPF route is installed in the routing table.
R1#show ip route 192.168.1.0
Routing entry for 192.168.1.0/24
Known via "ospf 1", distance 110, metric 11, type intra area
Redistributing via bgp 65001
Advertised by bgp 65001 match internal external 1 & 2
Last update from 172.16.35.3 on Ethernet1/0, 00:00:08 ago
Routing Descriptor Blocks:
* 172.16.35.3, from 3.3.3.3, 00:00:08 ago, via Ethernet1/0
Route metric is 11, traffic share count is 1
R1#show ip bgp 192.168.1.0
BGP routing table entry for 192.168.1.0/24, version 1567
Paths: (1 available, best #1, table default)
Flag: 0x820
Not advertised to any peer
Local
172.16.35.3 from 0.0.0.0 (5.5.5.5)
Origin incomplete, metric 11, localpref 100, weight 32768, valid, sourced, best
Once the eBGP peer comes back up, we learn 192.168.1.0/24 again. Now we can see that the eBGP path is immediately installed in the routing table as the best path.
*Feb 10 00:37:33.259: BGP(0): 172.16.56.6 rcvd UPDATE w/ attr: nexthop 172.16.56.6, origin i, metric 0, path 65000
*Feb 10 00:37:33.259: BGP(0): 172.16.56.6 rcvd 192.168.1.0/24
*Feb 10 00:37:33.271: BGP(0): Revise route installing 1 of 1 routes for 192.168.1.0/24 -> 172.16.56.6(global) to main IP table
*Feb 10 00:37:33.271: RT: updating bgp 192.168.1.0/24 (0x0) via 172.16.56.6
*Feb 10 00:37:33.271: RT: closer admin distance for 192.168.1.0, flushing 1 routes
*Feb 10 00:37:33.271: RT: add 192.168.1.0/24 via 172.16.56.6, bgp metric [20/0]
R1#show ip route 192.168.1.0
Routing entry for 192.168.1.0/24
Known via "bgp 65001", distance 20, metric 0
Tag 65000, type external
Last update from 172.16.56.6 00:00:11 ago
Routing Descriptor Blocks:
* 172.16.56.6, from 172.16.56.6, 00:00:11 ago
Route metric is 0, traffic share count is 1
AS Hops 1
Route tag 65000
R1#show ip bgp 192.168.1.0
BGP routing table entry for 192.168.1.0/24, version 1568
Paths: (1 available, best #1, table default)
Flag: 0x820
Not advertised to any peer
65000
172.16.56.6 from 172.16.56.6 (192.168.8.1)
Origin IGP, metric 0, localpref 100, weight 32769, valid, external, best
If you do not want to apply the weight to all updates received from the neighbor, you can use a route-map to change the weight for only certain updates from the peer. Please see the configuration example below.
router bgp 65001
bgp log-neighbor-changes
neighbor 172.16.56.6 remote-as 65000
!
address-family ipv4
no synchronization
redistribute ospf 1 match internal external 1 external 2
neighbor 172.16.56.6 activate
neighbor 172.16.56.6 route-map set_weight in
no auto-summary
exit-address-family
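route-map set_weight permit 10
match ip address 1
set weight 32769
! Permit all other prefixes from the peer unchanged; without this entry
! the implicit deny at the end of the route-map would filter them out.
route-map set_weight permit 20
access-list 1 permit 192.168.1.0 0.0.0.255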
When the update is received, you can check the ACL for matches.
R1#show access-list 1
Standard IP access list 1
10 permit 192.168.1.0, wildcard bits 0.0.0.255 (2 matches)
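A standard ACL matches only on the network address, so access-list 1 also matches any prefix whose network address falls between 192.168.1.0 and 192.168.1.255, regardless of mask length. If an exact match on the /24 is wanted, a prefix list is one option. The following is a minimal sketch and not part of the original tip; the prefix-list name REMOTE_LAN and the second route-map entry are assumptions for illustration.

```
! Hypothetical name REMOTE_LAN; matches exactly 192.168.1.0/24
ip prefix-list REMOTE_LAN seq 5 permit 192.168.1.0/24
!
route-map set_weight permit 10
 match ip address prefix-list REMOTE_LAN
 set weight 32769
! Pass every other prefix from the peer through unchanged
route-map set_weight permit 20
```

As with the ACL version, a `clear ip bgp 172.16.56.6 soft in` is needed before the policy applies to routes that were already received.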
- Brandon Patton, Cisco TAC
- Rodney Dunn, Cisco TAC
In the given example we get the required result by amending the BGP weight, which makes the eBGP route received from PE1 preferred over the locally generated one. However, this workaround will not work in all circumstances because of further dependencies.
For this solution to work, PE1 must first advertise a route back to R1, which means we need to make sure PE1 prefers the primary MP-iBGP route from R2 via PE2 over the eBGP route received from the directly connected R1.
Had both routes from R1 and R2 been the same from the BGP attributes perspective, PE1 would have preferred the eBGP route from R1 as eBGP is preferred over iBGP.
In this alternate scenario, even though the weight is set on R1's neighbour statement towards PE1, R1 would never get the route from PE1, as the PE's local best route would be the one received from R1.
The reason the solution works in the given example is the origin code of the route advertised by R2 is different to the one advertised by R1.
It can be seen in the outputs R2 is advertising 192.168.1.0/24 with an origin code of IGP while R1 was advertising the route with an incomplete origin code -
R1#show ip bgp 192.168.1.0 255.255.255.0
BGP routing table entry for 192.168.1.0/24, version 1564
Paths: (2 available, best #2, table default)
Flag: 0x820
Advertised to update-groups:
1
65000
172.16.56.6 from 172.16.56.6 (192.168.8.1)
Origin IGP, metric 0, localpref 100, valid, external
Local
172.16.35.3 from 0.0.0.0 (5.5.5.5)
Origin incomplete, metric 11, localpref 100, weight 32768, valid, sourced, best
Since origin IGP is preferred over incomplete, PE1's best route is R2's, learned over MP-iBGP from PE2, hence the solution works.
In environments where the R2 origin code is the same as R1, further configuration (such as adding MED to R1 redistributed routes) is needed to make sure the PEs prefer the route coming directly from R2.
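One way to sketch the MED approach mentioned above is to attach a route-map to the redistribution on R1, so that the routes R1 injects carry a higher (less preferred) MED than the ones originated at R2. The route-map name OSPF_TO_BGP and the value 1000 are assumptions chosen for illustration, and this only helps where the PE actually compares MED between the two paths, for example when both are received from the same neighboring AS.

```
router bgp 65001
 address-family ipv4
  ! Tag redistributed OSPF routes with a high MED (hypothetical value)
  redistribute ospf 1 match internal external 1 external 2 route-map OSPF_TO_BGP
 exit-address-family
!
route-map OSPF_TO_BGP permit 10
 set metric 1000
```

The route-map has no match clause, so it applies the MED to every route that the redistribute statement selects.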
Gur Rotkop
CCIE
===Re: BGP Origin code effect on the suggested solution success -- Tpkirby
This is a very common issue and I like your explanation and solution. However I think you need to mention that your sample network has a few caveats:
1. Since you are using the same AS at both sites, the MPLS network must have AS Override turned on, otherwise you would never see the LAN route coming in via eBGP.
2. Based on your BGP config the MPLS cloud would contain two equal paths to the LAN subnets and thus load share under normal circumstances. Some traffic under normal conditions would go through R1 and across the backdoor link.
3. It might be valuable to discuss the scenario where the design is to use the OSPF link as the primary path, yet still maintain the MPLS-BGP path as a secondary choice. I have seen much confusion in that scenario as well, since you need to modify the admin distance of either OSPF or BGP. People are often fooled by the same effect of redistribution into BGP that you detailed in your example. The OSPF route is initially used, because typically it is seen before the BGP advertisements, and the redistribution into BGP keeps the OSPF path as the best path until the OSPF route fails for some reason. The path will then switch to BGP and never go back to the OSPF route because of eBGP's admin distance.
In cases of backdoor links like this I typically assign a different AS number to each physical site, and use MEDs or AS path prepends to eliminate the load sharing and create a primary/secondary path design. This of course does not change the fact that you need to adjust the weights or admin distances. I just like having the extra control of the separate AS numbers, and then I do not have to rely on the AS Override setting in the MPLS cloud.
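For the third case above, where the OSPF backdoor should be primary and the MPLS VPN path secondary, one possible sketch is to raise the eBGP administrative distance above that of OSPF (110) on the CE router. The value 115 is an assumption picked for illustration; the internal and local distances are left at their defaults of 200.

```
router bgp 65001
 address-family ipv4
  ! distance bgp <external> <internal> <local>
  ! eBGP distance 115 (hypothetical) loses to OSPF at 110
  distance bgp 115 200 200
 exit-address-family
```

Routes that are already installed may need to be relearned before the new distance applies, and changing administrative distance has side effects of its own, so this should be tested before use.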
Tim Kirby
This discussion is interesting and useful for one-router MPLS VPN / backup path issues.
It however doesn't mention the very common problem where there are two routers in the R1 position, one for the MPLS VPN and one for the backup link, call them R1A and R1B. In that case, R1A has an external BGP and an internal IGP route to the remote site, so eBGP wins there, as desired. The router R1B however typically has an internal IGP route and an external IGP route from redistributing the remote site route from eBGP into the IGP. The internal route wins, no matter what the metric is.
If the campus is big enough to have L3 switches behind R1A and B, then you have to find a way to get them to prefer A to B for remote destinations. Or get B to route via A to remote destinations.
A lot of sites seem to be doing iBGP or eBGP on the backup link for that reason. Then iBGP between the two campus edge routers. Both have their pros and cons. Among them, the BGP deployment is becoming more like a WAN IGP at that point, which isn't great. Convergence and complexity, also not good.
Altering admin distance isn't a very clean solution, can lead to other problems (routing loops), and doesn't scale well.
Small sites can run their IGP over mGRE on the MPLS side (using the provider BGP just for the CE-CE connectivity), and run the IGP over DMVPN or GRE/IPsec on the backup side (typically it is IPsec backup, lately).
Larger sites can redistribute a static summary into the IGP, and restrict more-specifics from the campus. With the two edge routers R1A and B advertising an external summary, metric comes back into play. In this case, R1A and B need to exchange more-specific prefixes to handle spot MPLS outages properly.
I don't have time to reconstruct the various alternatives here -- it's a good CCNP to CCIE level challenge for the reader -- how many ways can YOU solve this problem!
Does the same issue occur with EIGRP as well?
regards shivlu jain
It would make more sense to me to fix this problem with an all-BGP answer.
First, add an iBGP adjacency between the CE routers over the LAN path. Then use a distance command in each CE router to make eBGP and iBGP the same administrative distance (20).
Once this is done, the CE routers will use only BGP path selection information to select both the primary and secondary preferences for any destinations announced by the IGP and BGP. The IGP is no longer involved. The IGP can remain on the LAN -- there may be destinations that are reachable via the IGP only, and the IGP may be needed to deliver packets between the iBGP neighbor addresses.
With BGP in charge of everything, controlling the primary path preference is easy using a weight command in either CE router (once iBGP and eBGP are set to the same administrative distance value). AS override can remain in the service provider network.
Also, I never recommend redistribution into BGP. Use BGP network commands or redistribute connected.
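A minimal sketch of this all-BGP approach on one CE router might look like the following. It assumes the other CE router is reachable at 172.16.35.3 and sits in the same AS (65001), and it uses 10.1.1.0/24 as a purely hypothetical local LAN prefix; none of these values come from the post above.

```
router bgp 65001
 ! iBGP session to the other CE router over the backdoor path (assumed address)
 neighbor 172.16.35.3 remote-as 65001
 !
 address-family ipv4
  neighbor 172.16.35.3 activate
  ! Originate the local LAN with a network statement instead of redistribution
  network 10.1.1.0 mask 255.255.255.0
  ! distance bgp <external> <internal> <local>: eBGP and iBGP both at 20
  distance bgp 20 20 200
 exit-address-family
```

A per-neighbor weight, as in the original tip, could then be added on whichever session should carry the primary path.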
Should we also cover the scenario where the IGP backdoor is preferred, as in Tim's scenario? Admin distance can fix it, but how does weight contribute to the solution?
"First, add an iBGP adjacency between the CE routers over the LAN path. Then use a distance command in each CE router to make eBGP and iBGP the same administrative distance (20)."
I was under the impression that when a router learns both eBGP and iBGP routes the decision is made in the BGP table first so the AD doesn't come into it.