BGP failover times were VERY quick-Why?!

John Blakley
VIP Alumni

All,

I thought that the default timers for BGP were 60 seconds for keepalive and 180 seconds for holddown. I may be wrong, but shouldn't a route that falls out of the table be put back into the table after at most 180 seconds (3 min.)?

We tested our failover this weekend, and we shut down our main router to watch our block roll over to our backup router. We lost two packets. I peer with the provider using the same AS on my end (both of my routers are using bgp 1 for instance, and I peer with bgp 2). I'm wondering if this is the reason the failover happened so quickly?

Thanks,

John

HTH, John *** Please rate all useful posts ***

Giuseppe Larosa
Hall of Fame

Hello John,

>> but shouldn't a route that falls out of the table be at most put back into the table after 180 seconds (3 min.)?

This is BGP, not RIP; there is no holddown timer here.

Hope to help

Giuseppe

Giuseppe,

Can you explain what the timers option is for on the neighbor statement? Also, when I do a "sh ip bgp neighbor x.x.x.x" I get holddown timers:

BGP neighbor is 15.15.15.2, remote AS 12, external link

BGP version 4, remote router ID 209.30.236.1

BGP state = Established, up for 00:01:08

Last read 00:00:08, hold time is 180, keepalive interval is 60 seconds

Neighbor capabilities:

Route refresh: advertised and received(old & new)

Address family IPv4 Unicast: advertised and received

What is it used for, and is there another way that we can keep it from failing over so quickly?

Thanks,

John


John,

Since you shut down the router, the BGP peering broke immediately, causing the convergence. The 60-second keepalives and 180-second hold time were not involved in this process since your interface went down. To test those timers you will need to keep the interface up but not allow the keepalives to reach the peer router. You can use IP event dampening to prevent flapping interfaces from causing multiple convergences, but I am not aware of any user-defined parameters that will delay the convergence from happening.

>> since you shut down the router the BGP peer immediately broke causing the convergence.

How does the neighboring router know the interface went down without using the keepalives?

Thanks,

John


Interface goes down => routes via this interface are withdrawn => BGP session is torn down.

On the other hand, an ACL blocking TCP 179 would take a lot longer to be detected: about 3 min, as you expected.
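To put a number on the ACL case, here is a minimal Python sketch of the hold timer. It is a simplified model, not router code: it assumes the peer sends a keepalive exactly every `keepalive` seconds starting at t = 0, that each keepalive received restarts the hold timer, and that nothing else (such as an UPDATE) arrives. The function names are made up for illustration.

```python
def hold_timer_expiry(last_keepalive_at: float, hold_time: float = 180.0) -> float:
    """Time at which the hold timer fires if nothing more is received."""
    return last_keepalive_at + hold_time


def detection_delay(block_starts_at: float,
                    keepalive: float = 60.0,
                    hold_time: float = 180.0) -> float:
    """Seconds between TCP 179 being blocked and the session being declared dead.

    Assumption: keepalives arrive at t = 0, keepalive, 2 * keepalive, ...
    and any keepalive sent at or before block_starts_at still gets through.
    """
    # Last keepalive that was received before the block took effect.
    last_seen = (block_starts_at // keepalive) * keepalive
    return hold_timer_expiry(last_seen, hold_time) - block_starts_at


if __name__ == "__main__":
    # Block applied 30 seconds after the keepalive received at t = 60.
    print(detection_delay(90.0))  # 150.0
```

With the default 60/180 timers this model always lands between 120 and 180 seconds, which is where the "about 3 min" figure comes from.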

HTH

Sam

Sam,

I guess my main question is why my route failed over so quickly. If the interface goes down, how can I control the convergence time or is this impossible?

I'm really not grasping the concept of having hold timers if they're only consulted when there's an access-list blocking the port. I would think that if the peer missed a hello packet, be it blocked or a down peer, the neighboring router should still send two more hellos before it flips to the other route, meaning 3 minutes by default.

Thanks Sam,

John


Hello John,

Sam has explained the probable reason for what you see.

Have you configured bgp fast-external-fallover or its successor, neighbor x.x.x.x fall-over?

Usually people complain of the slowness of failover when it relies on default timers.

You can think of the timers as a way to detect indirect failures, such as the provider's staff shutting down the session.

Reaction to a link failure takes as long as the interface needs to detect the failure, which depends on the technology in use:

For example, if the link is a direct serial link and the provider router is also the DCE at OSI layer 1, then after the interface is shut down the other side goes down/down.

Another example is POS, where detection can take as little as 50 msec.

Hope to help

Giuseppe

John,

This behaviour is due to "bgp fast-external-fallover" being enabled by default. This command effectively bypasses the timers when the link goes down.

Negate it and retest.

Sam

PS: Good post !!

Sam,

AH! Now, if I'm peering with my ISP and I try to negate it on my end, can it be done on one of the spoke routers or does it have to be done on the multihomed router?

Thank you for the compliment on "good post." =)

John


Removing it from your end should be enough to see the session taking longer to tear down (I have never seen this being a requirement... but I can see why one would want that :-).

The other thing about timers is that they are negotiated and the lowest wins. So if your peering router is using the defaults, you can only benefit from a longer hold time if you agree with your peer that they will match or exceed yours.
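A minimal Python sketch of "lowest wins", for illustration only. The hold time rule (smaller of the two advertised values; 0 or at least 3 seconds) follows RFC 4271. The keepalive rule used here, the smaller of the locally configured keepalive and one third of the negotiated hold time, is an assumption about how the router derives it; check your platform's documentation. `negotiate_timers` is a made-up name.

```python
def negotiate_timers(local_keepalive: int, local_hold: int, peer_hold: int) -> tuple[int, int]:
    """Return (keepalive, hold_time) in seconds as used on the local side."""
    # The session uses the smaller of the two advertised hold times.
    hold = min(local_hold, peer_hold)
    if hold == 0:
        # A hold time of 0 means no keepalives and no hold timer at all.
        return (0, 0)
    if hold < 3:
        raise ValueError("hold time must be 0 or at least 3 seconds")
    # Assumption: keepalive is capped at one third of the negotiated hold time.
    keepalive = min(local_keepalive, hold // 3)
    return (keepalive, hold)


if __name__ == "__main__":
    # Long timers on one side only: the peer's default 180 still wins.
    print(negotiate_timers(4800, 14400, 180))    # (60, 180)
    # Both sides agree on the long hold time.
    print(negotiate_timers(4800, 14400, 14400))  # (4800, 14400)
```

The first call shows why raising the hold time on one end alone changes nothing while the peer keeps its defaults.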

I am not entirely sure, but I recall seeing a new feature that stops this. It is used as a security feature to prevent attackers from melting down your CPU by reducing the timers and therefore increasing BGP scans.

Sam

I found the command on Cisco's site, so now I have to ask:

How does the fast-external-fallover know that the peer went down if it's not using hello packets? Does it just see the route fall from the table, perform some kind of soft reconfig, and then fallover to the other peer?

Thanks,

John


Hello John,

fast-external-fallover tracks the state of the outgoing interface towards the eBGP peer.

If that interface is detected as down, the session is torn down too, without having to wait for the hold timer to expire.

Hope to help

Giuseppe

Many thanks for the rating !

It's link status related,

Usage Guidelines

The bgp fast-external-fallover command is used to disable or enable fast external fallover for BGP peering sessions with directly connected external peers. The session is immediately reset if link goes down. Only directly connected peering sessions are supported.

If BGP fast external fallover is disabled, the BGP routing process will wait until the default hold timer expires (3 keepalives) to reset the peering session. BGP fast external fallover can also be configured on a per-interface basis using the ip bgp fast-external-fallover interface configuration command.

http://tools.cisco.com/Support/CLILookup/cltSearchAction.do?Application_ID=CLT&IndexId=IOS&IndexOptionId=123&SearchPhrase=%22bgp%20fast-external-fallover%22&Paging=25&ActionType=getCommandList&Bookmark=True
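The usage guidelines above boil down to a simple decision, sketched here in Python. This is a toy model under stated assumptions, not how IOS is implemented: `EbgpSession` and `reset_delay` are hypothetical names, and a delay of 0.0 stands for "as soon as the interface is seen down", which in practice depends on the link technology.

```python
from dataclasses import dataclass


@dataclass
class EbgpSession:
    directly_connected: bool
    fast_external_fallover: bool = True  # enabled by default, per the posts above
    hold_time: float = 180.0


def reset_delay(session: EbgpSession, link_down: bool,
                seconds_since_last_keepalive: float) -> float:
    """Seconds until the session is reset after a failure, in this toy model."""
    if link_down and session.directly_connected and session.fast_external_fallover:
        # Link state drives the reset; the hold timer is never consulted.
        return 0.0
    # Otherwise the session survives until the hold timer runs out.
    return max(0.0, session.hold_time - seconds_since_last_keepalive)


if __name__ == "__main__":
    print(reset_delay(EbgpSession(directly_connected=True), True, 20.0))  # 0.0
    slow = EbgpSession(directly_connected=True, fast_external_fallover=False)
    print(reset_delay(slow, True, 20.0))  # 160.0
```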

Sam

So, if I wanted the peer to wait for four hours before rolling my block over, I would need to disable fast-external-fallover, and then set my timers to 4800 14400 and have the provider do the same? Or should I leave my default keepalives at 60, and then set my holdtime to 14400?

;-)
