
BGP routes when OSPF routes are unavailable

Hello,

Here is my simplified network layout (AS and IPs are made up for privacy):

As you can see, we own one /20 and have two locations connected to two different Internet peers. Network 1 announces the whole /20 and Network 2 announces the same IP space broken up into two /21s. The more specific BGP announcement wins. Both locations are also connected via a Metro Ethernet connection, and both sides are part of OSPF area 0. Both locations also peer with each other via BGP. Routing between the two locations works fine, and so does the BGP routing to our peers. Should Internet Peer 1 fail, traffic makes it just fine to Network 1 via Internet Peer 2 and its connected Metro Ethernet.

So, basically, everything works great as long as the Metro Ethernet link between Network 1 and Network 2 is up. If the Metro Ethernet connection fails, I can no longer reach resources on the other network. The plan is that traffic should go over the Internet if the Metro Ethernet connection is down. When I do a traceroute from Network 1 to Network 2 (while the Metro Ethernet is down), the route goes to Null0. Since OSPF can't communicate with the other network, traffic destined to the other network goes to Null0, even though BGP still has a valid "default route" to the Internet. Also, please note that traffic originating from the Internet and destined to the other network reaches its destination just fine.

The question is: What do I need to do so that I can still reach the other network from each location when the Metro Ethernet connection fails?

I can post configs if needed.

Thanks for your help

JB 

Highlighted
Contributor

It appears your problem is the summary Null0 route in the RIB. I am not sure this is the best way to fix the problem, but you can remove the null route from both edge routers, which should fix it. Otherwise you will need to come up with a summary address that doesn't overlap 1.1.1.0 and 1.1.2.0. Below is the config to remove the null route:

router ospf xx
 discard-route internal 255

Highlighted

cofee@0400,

You are correct, I do have null routes in my RIB...

sho ip route | incl Null
B  1.1.1.0/21 [200/0] via 0.0.0.0, 7w0d, Null0
B  1.1.8.0/21 [200/0] via 0.0.0.0, 7w0d, Null0

sho ip route 1.1.1.0 255.255.248.0
Routing entry for 1.1.1.0/21
  Known via "bgp 1234", distance 200, metric 0, type locally generated
  Routing Descriptor Blocks:
  * directly connected, via Null0
      Route metric is 0, traffic share count is 1
      AS Hops 0

However, when I look up a specific route, I get:

sho ip route 1.1.1.100
Routing entry for 1.1.1.0/24
  Known via "connected", distance 0, metric 0 (connected, via interface)
  Redistributing via ospf 1
  Advertised by bgp 1234
  Routing Descriptor Blocks:
  * directly connected, via GigabitEthernet1/3
      Route metric is 0, traffic share count is 1

 

Is this because of the following in my config?

 aggregate-address 1.1.1.0 255.255.248.0 summary-only
 aggregate-address 1.1.8.0 255.255.248.0 summary-only

JB

Highlighted

Is the output below from network 1 or network 2?

B  1.1.1.0/21 [200/0] via 0.0.0.0, 7w0d, Null0
B  1.1.8.0/21 [200/0] via 0.0.0.0, 7w0d, Null0

- Can you also provide the Null0 static route entries from both networks that are used to advertise in BGP?

I think what happens is that when you lose the Metro Ethernet connection, you also lose the specific routes that are attached to the other network. Since you have Null0 for loop prevention, the traffic is sent there, so it never makes it to your default gateway.

Highlighted

coffee@0400,

Thanks again for your help.

I don't have any static Null routes in any of my routers. I think the Null routes are created by BGP in response to the aggregation... (see post above). The null routes become a black hole if my Metro Ethernet connection fails, because the traffic does not go out the default gateway (as intended).

Any more info?

Thanks

JB

Highlighted

But you shouldn't be able to advertise a prefix in BGP if it's not installed in the global RIB. How is the router learning that prefix?

Highlighted

The prefix is learned through OSPF and advertised because of a matching bgp network statement:

 network 1.1.4.0 mask 255.255.255.0

JB

Highlighted

Thanks. Removing summary-only won't stop the null route from being injected. The summary-only keyword only suppresses advertisement of the more specific prefixes within the aggregated space. It is the aggregate-address command itself that causes the null route injection.

Highlighted

For the sake of keeping my help request simple, I simplified my network layout quite a bit. The conceptual idea, however, is the same...

We actually have several /20 and /21 IP prefixes that we break out based on usage.

So, it's not uncommon to have one /20 broken out in anything from a /30 to a /24. Since I advertise (to my Internet provider) a minimum of a /24, I need to aggregate all my small prefixes.

Now that you have a good understanding of my problem, do you think your previous suggestion of using  "discard-route internal 255" is still valid?

Please let me know...

Thanks again

JB

Highlighted

Actually, that was for the OSPF null route. I was under the impression that the edge router was learning the summary address from OSPF and then advertising it in BGP.

What if you remove aggregate-address from BGP and use the network command to advertise those prefixes? For that you can either create a static route for the prefix that needs to be advertised, or maybe have OSPF advertise a summary route, though I'm not sure whether that would be possible without knowing the OSPF topology.

Below is what you can do:

! x.x.x.x = next hop address
ip route 1.1.1.0 255.255.248.0 x.x.x.x
router bgp xxx
 no aggregate-address 1.1.1.0 255.255.248.0
 network 1.1.1.0 mask 255.255.248.0

- Or, if you don't want to do that, maybe you can create specific static routes, pointing towards the default gateway, for the prefixes that you lose when the Metro Ethernet connection is down. You can create those static routes with a higher administrative distance than iBGP so they are only installed when you lose the connection to the other site through Metro Ethernet.
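A minimal sketch of that floating static route idea, as seen from Router 1. The next hop 192.0.2.1 (standing in for the Internet Peer 1 address) and the distance of 250 are assumptions for illustration; the prefixes are the ones Router 2 originates in this thread. Any distance above 200 (iBGP) should keep the statics out of the routing table while the Metro Ethernet is up.

```
! Router 1: floating statics for the prefixes that live behind Router 2.
! 192.0.2.1 is a hypothetical ISP next hop; 250 is an assumed distance.
! They are installed only after the OSPF (110) and iBGP (200) routes
! for the same prefixes disappear.
ip route 1.1.2.0 255.255.254.0 192.0.2.1 250
ip route 1.1.4.0 255.255.255.0 192.0.2.1 250
```

Because a /23 or /24 is more specific than the /21 aggregate pointing to Null0, these routes would win the longest-match lookup once installed. Router 2 would need the mirror-image routes for Router 1's prefixes.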

let me know if that makes sense.

Highlighted
VIP Mentor

Hello

When you lose your iBGP/IGP link, both sites are still advertising the aggregate to their ISP, so each ISP still sees the summary coming from its peered site, and you still have a null route in the RIB black-holing traffic because of it.

Why not just advertise all specific networks to both ISPs and drop the summarisation, making sure you don't become a transit for either ISP by advertising just your local routes and nothing else?
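One common way to avoid becoming a transit AS is an outbound AS-path filter that only permits locally originated routes. A sketch, assuming AS 1234 from the outputs earlier in the thread and a hypothetical upstream neighbor 192.0.2.1; the list number 1 is arbitrary:

```
! Match only routes with an empty AS path, i.e. routes originated in this AS
ip as-path access-list 1 permit ^$
!
router bgp 1234
 ! 192.0.2.1 is a placeholder for the upstream (eBGP) neighbor
 neighbor 192.0.2.1 filter-list 1 out
```

This only controls which routes are offered to the upstream. Prefixes longer than a /24 would still need aggregating, as discussed earlier in this thread.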


kind regards
Paul

Highlighted

Paul,

Thanks for your help...

I have removed "summary-only" from the aggregate-address command on both routers and (after clear ip bgp ..... soft) checked if I still have the null routes in my rib.

As you can see below, I continue to have the Null routes present within my rib.

FYI, I advertise the address space on both routers with different prefixes to have inbound routes reach the router, handling the specific IP space, directly...and to provide redundancy in case of an outage to one of my BGP peers.

Router 1:

aggregate-address 1.1.1.0 255.255.248.0

Router 1#sh ip bgp neighbors x.x.x.x advertised-routes
BGP table version is 2282, local router ID is 10.255.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 1.1.1.0/24       0.0.0.0                  0         32768 i
*> 1.1.1.0/21       0.0.0.0                            32768 i
*>i1.1.2.0/23       1.1.2.1                  0    100      0 i
*>i1.1.4.0/24       1.1.2.1                  0    100      0 i

Router 1# sho ip route | inc Null
B       1.1.1.0/21 [200/0] via 0.0.0.0, 10:38:57, Null0

----------------------------------------------------------------------------

Router 2:

aggregate-address 1.1.2.0 255.255.254.0
aggregate-address 1.1.4.0 255.255.255.0

Router 2# sho ip bgp neighbors x.x.x.x advertised-routes
BGP table version is 12625316, local router ID is 10.255.0.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*>i1.1.1.0/24       1.1.1.1                  0     90      0 i
*>i1.1.1.0/21       1.1.1.1                  0     90      0 i
*> 1.1.2.0/23       0.0.0.0                            32768 i
*> 1.1.4.0/24       0.0.0.0                            32768 i

Router 2# sho ip route | inc Null
B       1.1.4.0/24 [200/0] via 0.0.0.0, 7w0d, Null0
B       1.1.2.0/23 [200/0] via 0.0.0.0, 6w2d, Null0

These Null routes prevent traffic from using my default route when the Metro Ethernet (connecting router 1 and router 2) fails.

Any more tips?

Thanks

JB

Highlighted

JB

 

There are some things that we do not know about your network which might affect the suggestions that we would make, such as the distribution of networks/subnets between sites, whether the sites are using public IP addressing inside the site (does not use NAT) or uses private addressing inside and does NAT for Internet access. But based on what we know so far I believe that there are several aspects of your situation that we can discuss:

- the null route in the routing table. This is a normal behavior of aggregate address. If you think about it, we can understand this from several perspectives:

A) It is a basic behavior of BGP that the protocol will not advertise a route to the neighbors that is not in the routing table. So when you use aggregate address to advertise a summary (regardless of whether you use summary only) BGP needs that summary address to be in the routing table and so aggregate address creates the null route.

B) The behavior of aggregate address is very similar to the behavior of summary address in EIGRP or the behavior of summarization in OSPF. When these routing protocols create a summary address they create a null route for it.

- There is an advantage to using aggregate address in that BGP will advertise the summary only if one or more included address blocks are in the active routing table. So if there are no constituent routes, the summary is not advertised. If you want this behavior then you need to use aggregate address (and have the null route in the routing table). If the null route is causing issues (as may be the case here), then you need to look for an alternative, which would be to configure a static route for the summary. If you configure a static route for the summary, you can use the address of the next hop router as the next hop in the static route and it will not black hole traffic.

- Note that using a static route instead of aggregate address involves a trade off. You get the advantage of not black holing traffic but you lose the protection against potential routing loops. So consider the pluses and minuses and choose which technique you want to use.

- I believe that there is another issue when you advertise the /20 from site 1 and advertise two /21 from site 2. Remember that a basic principle of IP routing is that longest match wins. So site 1 will know that it should go to site 2 for an address that is not local to site 1 (site 1 has learned two /21 routes in addition to its /20). But consider what happens at site 2. When it needs to reach a destination that is not local it looks in the routing table, finding the /20 from site 1 but also finding the /21s. Longest match says that /21 wins and it would not send traffic to site 1.

- There appears to be another issue, if your example truly represents what is going on in your network. You show 1.1.1.0 in site 1 and 1.1.2.0 in site 2. If the subnets are truly intermingled like that, then determining what is a destination within the site and what is remote becomes quite challenging. OSPF running over the Metro E solves this. But what to do if Metro E is down and OSPF is not communicating? I would suggest that the solution would be to configure a GRE tunnel that runs between sites over the Internet. Use OSPF through the tunnel and manipulate the metric so that it is less attractive. This would allow you to route accurately between sites and also provides a solution to the question of null routes.
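A minimal sketch of such a tunnel on Router 1, under stated assumptions: 192.0.2.2 and 198.51.100.2 stand in for the Internet-facing addresses of Router 1 and Router 2, 10.255.255.0/30 is a made-up tunnel subnet, and the cost of 1000 is arbitrary (it only needs to be higher than the cost of the path over the Metro E). OSPF process 1 and area 0 are taken from the earlier posts.

```
interface Tunnel0
 description Backup path to site 2 over the Internet (GRE is the default tunnel mode)
 ip address 10.255.255.1 255.255.255.252
 ! Hypothetical Internet-facing addresses of Router 1 and Router 2
 tunnel source 192.0.2.2
 tunnel destination 198.51.100.2
 ! Make the tunnel less attractive than the Metro E link
 ip ospf cost 1000
!
router ospf 1
 network 10.255.255.0 0.0.0.3 area 0
```

Router 2 would mirror this with source and destination swapped and 10.255.255.2 on the tunnel. The tunnel destination has to stay reachable through the Internet path rather than through the tunnel itself, otherwise the router will take the tunnel down because of recursive routing. GRE also adds 24 bytes of overhead, so the usable MTU across the tunnel is smaller.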

 

HTH

 

Rick

Highlighted

Paul,

Thanks for your response…
Here are some answers to some of the questions you’ve asked.

 

“There are some things that we do not know about your network which might affect the suggestions that we would make, such as the distribution of networks/subnets between sites, whether the sites are using public IP addressing inside the site (does not use NAT) or uses private addressing inside and does NAT for Internet access.”

 

The two routers in my scenario do not use NAT, nor do they use private IP addressing. All IPs are public IPs (owned by us) and address assignments do not intermingle between the two sides. For example, a /24 out of the /20 is only utilized on one side of the network. Although, the /24 might be broken down further into smaller blocks (i.e. /29s), they are still only utilized on one side of the network.


You are 100% correct, the Null route is the product of the BGP aggregation statements. The aggregates are needed because I (obviously) can’t announce anything smaller than a /24 to my upstream provider. Since most of my IP space is broken up into anything from /30s to /24s, I need to aggregate the space before I can announce it to my upstream provider.


“There appears to be another issue, if your example truly represents what is going on in your network. You show 1.1.1.0 in site 1 and 1.1.2.0 in site 2. If the subnets are truly intermingled like that then determining what is a destination within the site or what is remote becomes quite challenging. OSPF running over the Metro E solves this. But what to do if Metro E is down and OSPF is not communicating?”

 

…and this is exactly what is going on… OSPF is working great, making sure that both sides can communicate with each other… but once the Metro Ethernet is down, all hell breaks loose.

I like the idea of the GRE tunnel to maintain the OSPF routing between both sides. It’s easy to configure and will hopefully address the issue.

 

Thanks

JB

Accepted Solution

JB

 

Thanks for the additional information. I certainly understand that you need to perform some aggregation because you cannot advertise anything smaller than a /24 to your upstream. What I was trying to suggest is that there is more than one way to perform that aggregation.

1) use the aggregate address command. This produces a summary route and puts a null route into the routing table to protect against routing loops but it has the possibility to black hole traffic when the Metro E is not working.

2) use a static route for the summary. This produces a summary route and does not put a null route into the routing table. So it does not have potential to black hole traffic but does have potential for routing loops.

So there are pluses and minuses for each alternative. It is a tradeoff. You should choose which to use realizing the implications of the choice and selecting the one that you think is the best fit for your environment.
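For comparison, the two alternatives might look like this on Router 2 for the 1.1.2.0/23 block. AS 1234 and the prefix come from the earlier posts; the next hop 198.51.100.1 is a hypothetical upstream address. Only one of the two would be configured at a time.

```
! Option 1: aggregate-address. BGP installs 1.1.2.0/23 -> Null0 locally
! (no loops, but traffic is dropped when the more specific routes are lost).
router bgp 1234
 aggregate-address 1.1.2.0 255.255.254.0
!
! Option 2: static summary plus a network statement. No Null0 route,
! but packets for unused parts of the /23 can loop between you and the upstream.
ip route 1.1.2.0 255.255.254.0 198.51.100.1
router bgp 1234
 network 1.1.2.0 mask 255.255.254.0
```

In option 2 the network statement is only advertised while the matching static route is in the routing table, so the summary no longer depends on any constituent route being present.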

 

I am a bit puzzled at part of your response. You say that the addresses are not intermingled and each side has a unique set of addresses. But your post says that 1.1.1.0 is on one side and 1.1.2.0 is on the other side. To make things work you need to be able to advertise the individual /24s. But you do not want BGP to advertise many /24s; you want it to aggregate. That is why I suggest that a GRE tunnel would be a good solution to your issue. It would allow the individual sites to see details from the other site while allowing BGP to advertise only the summarized routes.

 

HTH

 

Rick
