
BGP route-reflectors and MPLS - suboptimal path.

Hello everybody,

I'm quite lost and need some good advice about my network topology.

Please have a look at the picture in the attachment.

We have 4 routers physically connected in a ring; three of them have an eBGP session with an upstream ISP.

The two RC-RR routers are route reflectors, and all other routers have BGP sessions with them using their loopback IPs as the source.

Because of speed and price, the RC-E001 <--> RC-RR1 link is a backup, and the OSPF and BGP metrics are set accordingly.

Internal routing works as expected. All routers are MPLS "P" routers, but only loopback IPs are label-switched: only traffic to a loopback follows a label path, and other traffic should use the normal routing table.

The problem is the following: traffic to the Internet from RC-E001 follows the path RC-E002 ---> RC-RR2 ---> RC-RR1,

but it should just go to RC-E002 and then directly to the Internet. All external prefixes on RC-E001 have RC-RR1 as the next hop (higher local preference).

Traceroute on RC-E001 shows the following:

RC-E001#traceroute 8.8.8.8
  1 RC-E002 [MPLS: Label 202 Exp 0] 16 msec 20 msec 60 msec
  2 RC-RR2 [MPLS: Label 79 Exp 0] 20 msec 16 msec 20 msec
  3 RC-RR1 [AS UPSTREAM] 20 msec 16 msec 20 msec
  4 UPSTREAM [AS UPSTREAM] 20 msec 16 msec 20 msec
  5 ....

I understand that RC-E001 tries to reach the BGP next hop via the MPLS label path, because all loopbacks use MPLS label switching, but I don't want the traffic to take such a suboptimal path.

What have I configured wrong, and what should I do to force the traffic from RC-E001 to exit directly via RC-E002?

Best regards,

Konstantin

Hi Konstantin,

Yes, you can add a level of hierarchy to route reflectors where one route reflector is the client of another, BUT just because you can bend the rules of iBGP split horizon like this doesn't mean you should, and I don't believe it is required, nor is it good practice.

It doesn't matter that there is no iBGP between E001 and E002; they should both be clients of the same RRs, and the route will propagate via both RRs and back down. Unfortunately, because these RRs also have their own route with better attributes, the route will never get advertised; this is why setting the local preference was mentioned.

I'm sure your issue is with route selection on the route reflectors; ultimately, there are many ways to solve a problem like this.

Matt,

It doesn't matter that there is no iBGP between E001 and E002,.....

I respectfully disagree here. It does matter if you read my scenario, which I described in my first post. If that case ever happens, then you do need iBGP between E001 and E002.

I don't think the problem is only BGP-related; as you said previously, it's an LSP issue. You can see it in the trace:

Traceroute on RC-E001 shows the following:

RC-E001#traceroute 8.8.8.8
  1 RC-E002 [MPLS: Label 202 Exp 0] 16 msec 20 msec 60 msec
  2 RC-RR2 [MPLS: Label 79 Exp 0] 20 msec 16 msec 20 msec
  3 RC-RR1 [AS UPSTREAM] 20 msec 16 msec 20 msec
  4 UPSTREAM [AS UPSTREAM] 20 msec 16 msec 20 msec
  5 ....

The traffic is label-switched to the next hop: neither RC-E002 nor RC-RR2 "routes" the packets; they just forward them to RC-RR1.

It's quite clear that, in order to use RC-E002 as the egress router for traffic from RC-E001, RC-E001 needs to see RC-E002 as the next hop, which means we need a BGP session between them.

The question now is how best to achieve that.

@Kishore - I don't disagree with you at all, we are just talking about different topologies :)

@Konstantin - you don't need a BGP session between E001 & E002; the fact that we are adding sessions to solve the problem proves that either

- the BGP best route selection is not being controlled and the desired route is not advertised

- the BGP topology does not equate to a full mesh (or equivalent with RRs)

How about trying this: let's simplify the network for now. Make only one router an RR; it can be any of them and will need a session to each of the other three. Now have all routers inject routes into BGP, use local preference (at the ingress) to control the best route, and then view this on the RR.

The label-switching behaviour is just a side effect of the best BGP route being the one you don't want.

Once you've got that working with one RR, you can add a second one for redundancy, as described in my earlier post.

Matt,

We can completely throw away RC-RR2, for example; it doesn't play any role here, and there is still a link between RC-E002 and RC-RR1.

Then both RC-E001 and RC-E002 have only one BGP session, with RC-RR1.

RC-E002 has an upstream link; RC-E001 doesn't.

All external prefixes on RC-E002 have an upstream router IP as the next-hop IP.

On RC-E001, all external prefixes have RC-RR1 as the next-hop IP, because there is only one BGP session.

RC-E001 should actually use RC-E002 as its egress point, but in my topology it uses RC-RR1, because MPLS forwards the traffic directly to RC-RR1 and RC-E002 doesn't make any routing decision.

Update:

Without MPLS, the traffic from RC-E001 would simply be routed by RC-E002 and sent directly to the upstream router.
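The difference between the two forwarding modes can be illustrated with a minimal Python sketch that models the original four-router ring. It assumes hypothetical IGP costs (10 on normal links, 100 on the RC-E001 <--> RC-RR1 backup link) and the BGP state described in the thread: RC-E002, RC-RR1 and RC-RR2 exit via their own upstream, while RC-E001's BGP next hop is RC-RR1. With hop-by-hop IP forwarding every router consults its own BGP table; with MPLS the ingress follows the LSP to its BGP next hop's loopback and the transit routers only swap labels.

```python
import heapq

# Hypothetical IGP link costs for the ring; the backup link is made expensive.
LINKS = {
    ("RC-E001", "RC-E002"): 10,
    ("RC-E002", "RC-RR2"): 10,
    ("RC-RR2", "RC-RR1"): 10,
    ("RC-RR1", "RC-E001"): 100,  # backup link (price/bandwidth)
}

# BGP best path for 8.8.8.8 on each router: None = exits via its own upstream,
# otherwise the iBGP next hop (a loopback), as described in the thread.
BGP_NEXT_HOP = {"RC-E001": "RC-RR1", "RC-E002": None, "RC-RR1": None, "RC-RR2": None}


def neighbors(node):
    """Yield (neighbor, cost) pairs for a router."""
    for (a, b), cost in LINKS.items():
        if a == node:
            yield b, cost
        elif b == node:
            yield a, cost


def igp_path(src, dst):
    """Dijkstra shortest path from src to dst; returns the list of routers."""
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, c in neighbors(node):
            if nxt not in seen:
                heapq.heappush(queue, (cost + c, nxt, path + [nxt]))
    raise ValueError(f"{dst} unreachable from {src}")


def forward_ip(src):
    """Hop-by-hop: every router does its own BGP lookup for the prefix."""
    path = [src]
    node = src
    while BGP_NEXT_HOP[node] is not None:
        # Move one IGP hop toward this router's BGP next hop, then look up again.
        node = igp_path(node, BGP_NEXT_HOP[node])[1]
        path.append(node)
    return path + ["UPSTREAM"]


def forward_mpls(src):
    """The ingress pushes the label for its BGP next hop; transit routers only swap."""
    egress = BGP_NEXT_HOP[src]
    if egress is None:
        return [src, "UPSTREAM"]
    return igp_path(src, egress) + ["UPSTREAM"]


print("IP  :", " -> ".join(forward_ip("RC-E001")))
print("MPLS:", " -> ".join(forward_mpls("RC-E001")))
```

The IP case exits at RC-E002 because RC-E002 makes its own lookup, while the MPLS case reproduces the traceroute path RC-E002 -> RC-RR2 -> RC-RR1: the BGP next hop chosen at the ingress decides the exit.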

Konstantin,

BGP advertises only one best path. This is true for RRs as well.

This means that an RR client receives only the best routes from the RRs, i.e. the RRs' view of the routing table.

If an RR has eBGP routes, it prefers them over iBGP routes by default, and all RR clients will use these routes.
You may find this draft useful: http://tools.ietf.org/html/draft-walton-bgp-add-paths-06.
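Sergey's point can be sketched in Python with a simplified best-path comparison: weight, then local preference, then eBGP over iBGP, then the lowest IGP metric to the next hop. The AS-path, origin and MED steps of the real decision process are assumed equal and skipped, and the attribute values below are a hypothetical view of 8.8.8.8 on RC-RR1. Because the RR (without add-paths) reflects only its single best path, RC-E002's path never reaches RC-E001 unless its attributes, e.g. a higher local preference set at ingress, make it win on the RR.

```python
from dataclasses import dataclass


@dataclass
class Path:
    next_hop: str
    weight: int = 0          # Cisco-local, never advertised
    local_pref: int = 100
    ebgp: bool = False       # learned from an eBGP peer
    igp_metric: int = 0      # IGP cost to reach next_hop


def best_path(paths):
    """Simplified best-path order: the highest sort key wins."""
    return max(paths, key=lambda p: (p.weight, p.local_pref, p.ebgp, -p.igp_metric))


def reflected_to_clients(rr_paths):
    """An RR without add-paths advertises only its single best path."""
    return best_path(rr_paths)


# Hypothetical view of 8.8.8.8 on RC-RR1.
rr1_view = [
    Path("UPSTREAM-of-RR1", ebgp=True),  # RR1's own eBGP path
    Path("RC-E002", igp_metric=20),      # learned via iBGP from client RC-E002
    Path("RC-RR2", igp_metric=10),       # learned via iBGP from RC-RR2
]
print(reflected_to_clients(rr1_view).next_hop)

# Raising local preference on RC-E002's path at ingress changes what RC-RR1 reflects.
rr1_view[1].local_pref = 200
print(reflected_to_clients(rr1_view).next_hop)
```

In this sketch RC-RR1 itself would then also exit via RC-E002, which is exactly the trade-off Matt raises later in the thread.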

Hi Sergey,

That's correct, and I know it, but the problem is not the best path. The problem is how to reach the next hop from an RR client with an MPLS backbone versus a non-MPLS backbone.

In my topology I've broken one of the RR topology best practices: never run a BGP session from an RR client to an RR through another RR client. With a non-MPLS backbone it's not so obvious, but the MPLS backbone has shown me that this rule makes sense.

Regarding your last post, you're saying:

"The problem is how  to reach the next-hop from RR-client in case of MPLS and in case of non-MPLS backbone."

What we are trying to say is that one RR client can't learn the next hop of the other RR client due to BGP best-path selection on the RR, and that this needs to be controlled so that the route with the next hop of the desired RR client is advertised. Yes, the behaviour changes when you introduce MPLS, but this is actually BGP's fault, and it only works without MPLS because IP routing is decided hop by hop. Really, the issue here is a failure in the BGP design; MPLS forwarding is just the victim protocol.

I think just about everyone here is trying to give you more or less the same solution:

- I am saying control best path selection with LP

- Sergey is saying the route isn't advertised due to eBGP > iBGP (and hence is not the best route)

- Kishore was asking you to move the RR to E002, which would just be another way to control the best path without setting LP

- Varma was talking about LP as well

How about this: prove us wrong. Post the "show ip bgp summary" and "show ip bgp " output from each router and explain how the suboptimal routing is being caused by anything other than the BGP best-path selection on the RR.

You haven't mentioned it, but if you want the RR to be either of the current ones while allowing E001 to reach the Internet via E002 and ensuring that RR1 and RR2 still use their directly connected egress, you'll need to do something a little more complicated, either:

- change the node which acts as the RR to E002 so that the only path E001 learns in a stable topology is via E002

- change the node which acts as the RR to E001 so that E001 learns all paths and decides hopefully using lowest cost IGP metric or perhaps LP (it's dangerous to let BGP decide on its own if you have your own policy in mind)

Is this why you don't want to change BGP? So that you don't influence other routers' egress?
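Matt's two alternatives can be compared with a minimal Python sketch under simplifying assumptions: a single RR, every egress router advertising its own eBGP exit with next-hop-self, all other attributes equal (so an RR prefers its own eBGP path), and hypothetical IGP costs from RC-E001 of 10 to RC-E002 over the primary link, 20 to RC-RR2 and 30 to RC-RR1.

```python
# Hypothetical IGP costs from RC-E001 to each egress router's loopback.
IGP_COST_FROM_E001 = {"RC-E002": 10, "RC-RR2": 20, "RC-RR1": 30}
EGRESS_ROUTERS = list(IGP_COST_FROM_E001)


def paths_learned_by_e001(rr):
    """BGP next hops RC-E001 hears for an Internet prefix, given a single RR."""
    if rr == "RC-E001":
        # RC-E001 is the RR: every client sends it its own best (eBGP) exit.
        return list(EGRESS_ROUTERS)
    # RC-E001 is a client: the RR prefers and reflects only its own eBGP exit.
    return [rr]


def e001_exit(rr):
    """Among otherwise equal iBGP paths, the lowest IGP metric to the next hop wins."""
    return min(paths_learned_by_e001(rr), key=lambda nh: IGP_COST_FROM_E001[nh])


for rr in ("RC-RR1", "RC-E002", "RC-E001"):
    print(f"RR = {rr}: RC-E001 exits via {e001_exit(rr)}")
```

With RC-RR1 as the RR, RC-E001 exits via RC-RR1; moving the RR role to RC-E002 or to RC-E001 makes it exit via RC-E002, while RC-RR1 and RC-RR2 keep preferring their own eBGP exits.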

Matthew,

You summarized this well!

Changing the router acting as the RR can help with eBGP routes learned by the RR, but at the same time the suboptimal routing issue will appear at another node.

If the real network is the same as in the diagram, I'd prefer a full mesh with next-hop-self and loopback source.

I echo Matt. What time did you go to bed last night? Are you at Chatswood?

BTW, Konstantin, you opened another thread for the same issue in WAN Routing and Switching. Not sure what you mean by that.

I've opened a discussion about a second level of route reflectors, just to get more opinions.

The other "cross-post" of this discussion can be ignored.

Hi Matt,

Thank you, I really appreciate your answer and everyone else's input.

Yes, the behaviour changes when you introduce MPLS, but this is actually BGP's fault, and it only works without MPLS because IP routing is decided hop by hop. Really, the issue here is a failure in the BGP design; MPLS forwarding is just the victim protocol.

I see your point and agree 100%: MPLS is forwarding the traffic exactly the way BGP wants it, which in my case is unfortunately a little bit wrong.

How about this: prove us wrong. Post the "show ip bgp summary" and "show ip bgp " output from each router and explain how the suboptimal routing is being caused by anything other than the BGP best-path selection on the RR.

"sh ip bgp summ" on all routers except RC-E001 shows two iBGP sessions, one to each RR, plus the session with the upstream router.

On RC-E001 there are only the two iBGP sessions with the RRs.

"show ip bgp 8.8.8.8" on all routers except RC-E001 shows that the best path is via the upstream router.

On RC-E001 it shows two paths, via RC-RR1 and via RC-RR2, and because the weight is higher for the BGP session with RC-RR1, it chooses that one as the best path. But because of the IGP metrics, RC-RR1 is reachable via RC-E002 and not directly via the backup connection.


You haven't mentioned it, but if you want the RR to be either of the current ones while allowing E001 to reach the Internet via E002 and ensuring that RR1 and RR2 still use their directly connected egress, you'll need to do something a little more complicated, either:

- change the node which acts as the RR to E002 so that the only path E001 learns in a stable topology is via E002

- change the node which acts as the RR to E001 so that E001 learns all paths and decides hopefully using lowest cost IGP metric or perhaps LP (it's dangerous to let BGP decide on its own if you have your own policy in mind)

Is this why you don't want to change BGP? So that you don't influence other routers' egress?

Just to clarify our topology:

Our topology looks pretty simple and straightforward: RC-RR1, RC-RR2 and RC-E002 (and 5 or 6 more routers) are our backbone routers with upstream eBGP, and at the same time they are "P" routers for our MPLS network.

As a full mesh is not really possible in our case (too many BGP sessions), RC-RR1 and RC-RR2 were chosen as route reflectors for Internet routing (MPLS route reflectors are outside the scope of this discussion) because of their location and performance. All routers have a direct physical connection to both RRs. All routers should primarily use their own upstream link for external communication.
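The "too many BGP sessions" argument is simple arithmetic. A minimal Python sketch, assuming about 10 BGP speakers (the thread mentions RC-RR1, RC-RR2, RC-E002, 5 or 6 more backbone routers, and RC-E001), that every client peers with both RRs, and that the two RRs peer with each other:

```python
def full_mesh_sessions(n):
    """Every iBGP speaker peers with every other one."""
    return n * (n - 1) // 2


def rr_sessions(n, rrs):
    """Each client peers with every RR; the RRs are fully meshed among themselves."""
    clients = n - rrs
    return rrs * clients + rrs * (rrs - 1) // 2


n = 10  # assumed router count
print("full mesh:", full_mesh_sessions(n), "sessions,", n - 1, "per router")
print("two RRs:  ", rr_sessions(n, 2), "sessions, 2 per client")
```

For 10 routers that is 45 sessions versus 17, and a new router needs a session to every existing router in a full mesh but only one per RR in the reflector design. For the four routers in the diagram alone, a full mesh is just 6 sessions, which is why Sergey's suggestion depends on the real network matching the diagram.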

Some time ago we added RC-E001 to our network, but RC-E001 doesn't have an upstream; it's more or less a stub router, but it still needs the full BGP table. It has a direct physical connection only to RC-E002 (primary link) and to RC-RR1 (secondary link, because of price and bandwidth).

I can't simply move one RR to RC-E002; that would mean reconfiguring 10 routers.

I don't think it's a good idea to put a third RR in the network; it would unnecessarily increase the amount of routing information on all routers.

I can't configure RC-E001 as a route reflector, because it's more like a "stub" router.

Thank you all again for contributing!

Hello everybody!

Thank you again for participating in the discussion.

After considering all the possibilities, I find that introducing a new level of route reflectors is not the best idea; it would unnecessarily complicate the configuration.

I find the idea of altering the next hop not so bad; the question is what the best way to alter it would be. As an incoming policy on RC-E001, setting the next hop to RC-E002, with RC-RR1 and RC-RR2 next in the list?

Or is it better to change the outgoing BGP updates on RC-RR1 and RC-RR2?

Thank you for your comments!

maayre

The problem is that if E002 goes down and the next hop doesn't change, you will black-hole traffic.

The only safe way to do it is on the advertising RR, but you would need some kind of advertise/exist map to conditionally change the next hop.
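The black-hole risk, and why some form of tracking is needed, can be shown with a minimal Python sketch. It does not model any Cisco feature: the preference list and the reachable set are hypothetical, and "reachable" stands in for whatever condition check would be used in practice (Matt suggests an advertise/exist map on the RR).

```python
# Hypothetical preference order for RC-E001's next hop, following the proposal in the thread.
PREFERRED_NEXT_HOPS = ["RC-E002", "RC-RR1", "RC-RR2"]


def static_rewrite(reachable):
    """Always set the next hop to the first choice, ignoring reachability."""
    return PREFERRED_NEXT_HOPS[0]


def tracked_rewrite(reachable):
    """Set the next hop to the first choice whose loopback is still reachable."""
    for next_hop in PREFERRED_NEXT_HOPS:
        if next_hop in reachable:
            return next_hop
    return None  # nothing reachable: no usable route


def delivers(next_hop, reachable):
    """Traffic is delivered only if the chosen next hop is reachable."""
    return next_hop is not None and next_hop in reachable


# Failure case: RC-E002 is down, RC-RR1 is still reachable via the backup link.
reachable = {"RC-RR1", "RC-RR2"}
for policy in (static_rewrite, tracked_rewrite):
    next_hop = policy(reachable)
    print(f"{policy.__name__}: next hop {next_hop}, delivered={delivers(next_hop, reachable)}")
```

The static rewrite keeps pointing at RC-E002 and drops the traffic, while the tracked version falls back to RC-RR1.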

Since you don't want to drastically alter the current design, why not add an iBGP session between E001 and E002? It's very simple and puts no stress on the RR hierarchy!
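Whether the extra session actually changes RC-E001's choice depends on the rest of its policy. A minimal Python sketch, assuming RC-E002 advertises its eBGP-learned best path to RC-E001 with next-hop-self, hypothetical IGP costs from RC-E001 (10 to RC-E002, 20 to RC-RR2, 30 to RC-RR1), and a hypothetical weight of 100 standing in for the higher weight Konstantin reports on the RC-RR1 session; all other attributes are assumed equal.

```python
from dataclasses import dataclass


@dataclass
class Path:
    via_session: str   # iBGP peer the path was learned from (also its next hop)
    weight: int        # Cisco-local per-neighbor weight
    igp_metric: int    # IGP cost from RC-E001 to that next hop


def best(paths):
    """Simplified: higher weight wins, then lower IGP metric (other attributes equal)."""
    return max(paths, key=lambda p: (p.weight, -p.igp_metric))


def e001_paths(rr1_weight):
    """Paths RC-E001 would hold for an Internet prefix once the extra session exists."""
    return [
        Path("RC-RR1", rr1_weight, 30),
        Path("RC-RR2", 0, 20),
        Path("RC-E002", 0, 10),  # new direct iBGP session to RC-E002
    ]


print(best(e001_paths(rr1_weight=100)).via_session)  # current weight on the RR1 session kept
print(best(e001_paths(rr1_weight=0)).via_session)    # weights equalised
```

In this sketch the new session only wins once the weight that currently favours the RC-RR1 session is removed or moved to the RC-E002 session; otherwise RC-E001 keeps choosing RC-RR1.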

Hi Matt,

You're right about setting the next hop with a route-map; I'd then need some kind of tracking.

I also came to the idea of iBGP between RC-E001 and RC-E002 but couldn't find any pros and cons of an iBGP session between route-reflector clients; it seems that not many people have tried this.