I am setting up BGP peering between a Cisco device and a firewall from another vendor. The firewalls run in active/standby and are supposed to fail over smoothly, without dropping much traffic, with the help of the graceful-restart feature. Right now it is not that smooth... I will be troubleshooting with the firewall vendor, but I want to understand the technology better before that.
One thing that confuses me is the timer. I have the regular hello and hold timer set pretty aggressively to 3 and 15 seconds, respectively. Does it affect the graceful restart capability? Let's say the firewall takes about 30 seconds to finish the failover, which would be enough to kill the BGP peering. However, there is a "graceful restart-time" that will keep the stale routes. Does the router keep forwarding traffic despite the broken BGP peering?
It looks like the firewall will re-establish the BGP peering anyway after the failover. How does this work in the Cisco world when you have two SUP cards? When one fails, does the other SUP re-establish BGP anyway without dropping traffic?
I have the regular hello and hold timer set pretty aggressively to 3 and 15 seconds, respectively. Does it affect the graceful restart capability?
It looks like your hello and hold timers are shorter than the firewall's switchover time, so the switch will declare the peer down and flush all routes learned from that neighbor before the firewall can send a keepalive message after the failover. As I understand it, the timers need to be changed. I also doubt that your firewall really takes 30 seconds for an HA failover. Normally it is 2 to 5 seconds, though it depends on the vendor.
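What is the BGP graceful restart timer? The restart timer determines how long peer routers will wait before deleting stale routes, that is, the maximum time they keep those routes before a BGP open message is received from the restarting peer. RFC 4724 suggests a default that is less than or equal to the BGP hold time.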
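To make the timer interaction concrete, here is a minimal Python sketch of the helper router's view. It is a simplified model, not vendor behavior: it assumes the last keepalive arrives at the moment the failover starts, and that hold-timer expiry is treated as a session loss that starts the restart timer (this is implementation-dependent). The function name and the restart-time values are hypothetical; the 15 and 30 second values come from the question.

```python
def graceful_restart_outcome(hold_time, failover_time, restart_time):
    """Simplified helper-side timeline, all values in seconds.

    Assumptions (not taken from any vendor documentation):
    - the last keepalive is received at t=0, when the failover starts;
    - the helper detects the session loss when the hold timer expires;
    - stale routes are kept for restart_time seconds after that point.
    """
    if failover_time < hold_time:
        # The peer is back before the hold timer expires: the session survives.
        return {"session_dropped": False, "routes_kept": True, "stale_deadline": None}

    # The session drops at hold_time; stale routes live until this deadline.
    stale_deadline = hold_time + restart_time
    return {
        "session_dropped": True,
        "routes_kept": failover_time < stale_deadline,
        "stale_deadline": stale_deadline,
    }


# Hold time and failover time from the question, hypothetical restart times.
print(graceful_restart_outcome(15, 30, 120))
print(graceful_restart_outcome(15, 30, 10))
```

In this model, with a 15-second hold time and a 30-second failover the session always drops, and whether the stale routes survive depends only on whether the restart time covers the remainder of the failover.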
I found a good article that gives guidance on setting the BGP hello and hold timers: https://www.noction.com/blog/bgp-timers
"By using timers 10 32 rather than timers 10 30, the other side will use 32 / 3 = 10 for the keepalive interval, so we should expect keepalives after 10, 20 and 30 seconds, and after 32 seconds we should have seen all three. With timers 10 30, on the other hand, we could be tearing down the BGP session after 30 seconds just as the neighboring router is sending that third KEEPALIVE message, hence the additional two seconds."
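The arithmetic in that quote can be checked with a short Python sketch. It assumes, as the quote does, that the keepalive interval is the hold time divided by three and rounded down; the function name is made up for illustration.

```python
def keepalive_margin(hold_time, divisor=3):
    """Return (interval, expected keepalive arrival times, margin) in seconds.

    margin is the time left between the last expected keepalive and the
    expiry of the hold timer; a margin of 0 means the two coincide.
    """
    interval = hold_time // divisor  # integer division, as in "32 / 3 = 10"
    arrivals = [interval * n for n in range(1, divisor + 1)]
    margin = hold_time - arrivals[-1]
    return interval, arrivals, margin


print(keepalive_margin(32))  # (10, [10, 20, 30], 2)
print(keepalive_margin(30))  # (10, [10, 20, 30], 0)
```

With a hold time of 30 the third keepalive is due exactly when the hold timer expires, which is the race the article describes; a hold time of 32 leaves two seconds of slack.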
Thanks Deepak. The thing that confuses me is where it says "before a BGP open message is received". Does that imply the peering was already torn down? The router would not expect to receive an open message if the peering is still up, especially with a long hold timer. Plus, how does the router know when the open message will arrive, so that it can work out when to delete the routes before it? Is that not completely up to the peer? This is very confusing.