BGP session shutdown for fast convergence

sebastien3 · ‎11-27-2021

Hello,

I am trying to set up a mechanism that tears down a BGP session quickly, so that traffic is not black-holed while BGP waits for its timers to detect that the peer is down.

I am using IP SLA to test whether my ISP router is still up. The problem is that under heavy traffic a ping may be lost, and the fall-over will then drop my BGP session even though the peer is fine...

ip sla 1  
 icmp-echo 1.2.3.4 source-interface GigabitEthernet0/0
 frequency 5
ip sla schedule 1 life forever start-time now
!
track 1 ip sla 1 reachability
!
!
router bgp XXXX
 neighbor 1.2.3.4 remote-as 6939
 neighbor 1.2.3.4 description Connected via GigabitEthernet0/0
 neighbor 1.2.3.4 fall-over route-map BGP-TRACK
!        
!
ip route 1.2.3.4 255.255.255.255 GigabitEthernet0/0 track 1
!
! 
ip prefix-list BGP-TRACK seq 5 permit 1.2.3.4/32
!
! 
route-map BGP-TRACK permit 10
 match ip address prefix-list BGP-TRACK
!

Is my configuration correct?

What configuration do you recommend when BFD is not possible?

Thanks

Georg Pauwen · ‎11-27-2021

Hello,

You could add a delay to your track object, so that the failover does not occur right away:

track 1 ip sla 1 reachability
 delay up 10 down 10
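
As a rough sketch of how the delay fits with the probe from the original post (the values are only the ones already mentioned in this thread, not tuned recommendations): with a probe every 5 seconds and a 10-second down delay, a single lost ping followed by a successful one should no longer take the track object down, because the failure has to persist for the whole delay.

```
ip sla 1
 icmp-echo 1.2.3.4 source-interface GigabitEthernet0/0
 frequency 5
ip sla schedule 1 life forever start-time now
!
! The track object reports Down only if the probe keeps failing for 10 s,
! and reports Up only after the probe has been succeeding for 10 s.
track 1 ip sla 1 reachability
 delay up 10 down 10
```

The trade-off is slower detection: a real failure is now noticed after roughly the probe interval plus the probe timeout plus the down delay. That is still well below the default BGP hold time of 180 seconds.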

Or, rather than ICMP, you could use UDP echoes:

ip sla 1
 udp-echo 1.2.3.4 3456
 threshold 10
 timeout 100
 frequency 3
ip sla schedule 1 life forever start-time now
!
! on the target router (1.2.3.4), so that it answers the UDP probes:
ip sla responder

Another possibility is to implement a QoS policy that prioritizes the ICMP traffic between the two hosts.
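
A minimal sketch of such a policy, under several assumptions: the names SLA-PROBE and WAN-OUT are hypothetical, the 64 kbps priority rate is an arbitrary illustration, and whether locally generated probes are classified by an output policy should be verified on your platform and release. It also covers only the outbound echo; the echo reply still depends on how the ISP treats the return traffic.

```
! Match the ICMP echo probes sent to the tracked peer
ip access-list extended SLA-PROBE
 permit icmp any host 1.2.3.4 echo
!
class-map match-all SLA-PROBE
 match access-group name SLA-PROBE
!
! Put the probes in a small priority queue (rate is an assumed example value)
policy-map WAN-OUT
 class SLA-PROBE
  priority 64
!
interface GigabitEthernet0/0
 service-policy output WAN-OUT
```

Queuing only helps when the congestion is on this router's own egress interface. If the circuit is rate-limited below the interface speed, the policy would likely need to sit under a parent shaper to have any effect.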

paul driver · ‎11-27-2021

Hello
BGP isn't really designed to fail over fast, because it is an external routing protocol and because of the number of prefixes it can carry.

That said, it does have its own fast-failover features that can be useful, as well as BFD, a generic failure-detection mechanism for all routing processes, and PIC. I would say the latter is the more elegant of the two, depending on your topology.

Please rate and mark as an accepted solution if you have found any of the information provided useful.
This then could assist others on these forums to find a valuable answer and broadens the community’s global network.

Kind Regards
Paul

sebastien3 · ‎11-27-2021

@Georg Pauwen:

Yes, I can try udp-echo, but I can't really figure out which of the two, UDP echo or ICMP echo, is better in this situation...

>Another possibility is to implement a QoS policy that prioritizes the ICMP traffic between the two hosts.

An idea worth exploring, but I don't know how to configure this...

@paul driver:

I have two ISPs. If one is faulty, I must be able to tear down its BGP session so as not to end up with a black hole.

This is why I use IP SLA as a reachability test. When the ping succeeds again, I wait one minute so that the router can finish receiving the full view, and then I advertise 0.0.0.0 to the other routers.

paul driver · ‎11-27-2021

Hello
Are you load sharing?
To be honest, if you know that one of your ISP connections is faulty, wouldn't it be better not to use that connection until it is fixed, and to move all your egress/ingress traffic over to the good ISP connection?

Georg Pauwen · ‎11-27-2021

Hello,

UDP is a bit more reliable; however, prioritizing ICMP traffic seems a better option. What model are your routers, and which IOS versions are they running? Please post the configs of both routers...

sebastien3 · ‎11-27-2021

I have two ISPs:

ISP1 connected to the R1

ISP2 connected to the R2

R1 and R2 are in a full mesh.

paul driver · ‎11-27-2021

Hello

If BFD and PIC aren't applicable, then fall-over with IP SLA tracking would be an alternative, and your configuration looks fine.

Edit:
You may just need to make sure your tracked host isn't reachable via ISP2 when ISP1 is unavailable; otherwise failover may not revert as expected when ISP1 becomes available again:

ip local policy route-map ipsla

access-list 100 permit icmp host (source-ip) (isp tracked ip) exho

route-map ipsla
match ip address 100
set ip next-hop (isp1)
set interface null0

pman · ‎11-28-2021

Hi,

If your ISP does not support BFD, and tuning the BGP timers to match between your ISP and your routers does not suit you, then the logic you posted seems correct.
Here's what I set up (for your reference, I have attached the response times):

R1(Gi1)<------>R2(Gi2)

track 1 ip sla 1 reachability

!

router bgp 1
bgp log-neighbor-changes
neighbor 1.2.1.2 remote-as 2
neighbor 1.2.1.2 description Connected via gi1
neighbor 1.2.1.2 fall-over route-map BGP-TRACK

!

ip prefix-list BGP-TRACK seq 5 permit 1.2.1.2/32
ip sla 1
icmp-echo 1.2.1.2 source-interface GigabitEthernet1
threshold 10
timeout 1000
frequency 3
ip sla schedule 1 start-time now

!

ip route 1.2.1.2 255.255.255.255 GigabitEthernet1 track 1

BGP peer is down

*Aug 30 19:43:13.783: %TRACK-6-STATE: 1 ip sla 1 reachability Up -> Down
*Aug 30 19:43:13.785: %BGP-5-NBR_RESET: Neighbor 1.2.1.2 reset (Route to peer lost)
*Aug 30 19:43:13.785: BGP: ses global 1.2.1.2 (0x7FCF8E3EC2D0:1) Reset (Route to peer lost).
*Aug 30 19:43:13.785: BGP: nbr_topo global 1.2.1.2 IPv4 Unicast:base (0x7FCF8E3EC2D0:1) NSF delete stale NSF not active
*Aug 30 19:43:13.785: BGP: nbr_topo global 1.2.1.2 IPv4 Unicast:base (0x7FCF8E3EC2D0:1) NSF no stale paths state is NSF not active
*Aug 30 19:43:13.785: BGP: nbr_topo global 1.2.1.2 IPv4 Unicast:base (0x7FCF8E3EC2D0:1) Resetting ALL counters.
*Aug 30 19:43:13.786: BGP: 1.2.1.2 closing
*Aug 30 19:43:13.786: BGP: ses global 1.2.1.2 (0x7FCF8E3EC2D0:1) Session close and reset neighbor 1.2.1.2 topostate
*Aug 30 19:43:13.786: BGP: nbr_topo global 1.2.1.2 IPv4 Unicast:base (0x7FCF8E3EC2D0:1) Resetting ALL counters.
*Aug 30 19:43:13.786: BGP: 1.2.1.2 went from Established to Idle
*Aug 30 19:43:13.787: %BGP-5-ADJCHANGE: neighbor 1.2.1.2 Down Route to peer lost
*Aug 30 19:43:13.787: %BGP_SESSION-5-ADJCHANGE: neighbor 1.2.1.2 IPv4 Unicast topology base removed from session Route to peer lost
*Aug 30 19:43:13.787: BGP: ses global 1.2.1.2 (0x7FCF8E3EC2D0:1) Removed topology IPv4 Unicast:base
*Aug 30 19:43:13.787: BGP: ses global 1.2.1.2 (0x7FCF8E3EC2D0:1) Removed last topology
*Aug 30 19:43:13.787: BGP: nbr global 1.2.1.2 Active open failed - route to peer is invalid
*Aug 30 19:43:13.787: BGP: nbr global 1.2.1.2 Active open failed - route to peer is invalid

BGP peer is up
*Aug 30 19:43:33.785: %TRACK-6-STATE: 1 ip sla 1 reachability Down -> Up
*Aug 30 19:43:33.787: BGP: nbr global 1.2.1.2 Open active delayed 1024ms (0ms max, 60% jitter)
*Aug 30 19:43:34.382: BGP: 1.2.1.2 active went from Idle to Active
*Aug 30 19:43:34.383: BGP: 1.2.1.2 open active, local address 1.2.1.1
*Aug 30 19:43:34.386: BGP: ses global 1.2.1.2 (0x7FCF8E3EC2D0:0) act Adding topology IPv4 Unicast:base
*Aug 30 19:43:34.386: BGP: ses global 1.2.1.2 (0x7FCF8E3EC2D0:0) act Send OPEN
*Aug 30 19:43:34.387: BGP: ses global 1.2.1.2 (0x7FCF8E3EC2D0:0) act Building Enhanced Refresh capability
*
*
*
*Aug 30 19:43:41.563: BGP: ses global 1.2.1.2 (0x7FCF3507DCB8:1) Up
*Aug 30 19:43:41.563: %BGP-5-ADJCHANGE: neighbor 1.2.1.2 Up

sebastien3 · ‎11-28-2021

@Georg Pauwen: I am using an ASR 1000 with IOS adventerprisek9.03.16.10.S.155-3.S10-ext.b

@paul driver: Yes, there is no BFD support, so fall-over is the only thing that can help me...

@pman: Yes, that's right! The fall-over with IP SLA works. My problem is that when a ping is lost, the BGP session goes down even though the peer is not dead.

My question is how to tune the ICMP probe of IP SLA to avoid false positives...

paul driver · ‎11-28-2021

Hello

I have edited my last post.

sebastien3 · ‎11-29-2021

@paul driver: What is exho in access-list 100?

paul driver · ‎11-29-2021

Hello
Typo, it should read echo.
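
With that correction applied, the earlier snippet would read roughly as follows. This is a cleaned-up sketch rather than a tested configuration: (source-ip), (isp tracked ip) and (isp1) are still the placeholders from the earlier post, and the second host keyword and the explicit permit 10 are assumptions added here to make the syntax complete.

```
! Match only the locally generated ICMP echo probes toward the tracked host
access-list 100 permit icmp host (source-ip) host (isp tracked ip) echo
!
! Send the probes toward ISP1; if that next hop cannot be used,
! drop them on Null0 instead of letting them follow another route via ISP2
route-map ipsla permit 10
 match ip address 100
 set ip next-hop (isp1)
 set interface Null0
!
! Apply the route-map to traffic originated by the router itself
ip local policy route-map ipsla
```

The idea is that the probe can only ever succeed through ISP1, so the track object reflects the state of that path alone.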
