278 Views · 5 Helpful · 5 Replies
James Jun
Beginner

ASR9K Multi-hop BFD resiliency

When doing multi-hop async BFD sessions on ASR9K, the following restrictions / caveats are noted:

1.  HW-offload is not supported

2.  The LC CPU that hosts a given session is chosen at random from the line cards listed under 'multipath include location X/X/CPU0' in the bfd configuration block.  There is no way to pin an individual BFD session to a particular LC.
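For reference, the configuration block in question looks roughly like this (the LC locations are placeholders):

```
bfd
 multipath include location 0/1/CPU0
 multipath include location 0/2/CPU0
```

Each multi-hop session is then hosted on the CPU of one of the listed line cards, with no per-session control over which.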

 

So the concern here is: if the line card hosting the BFD session dies (due to HW failure, user-commanded reboot, OIR, etc.), it could tear down the BFD session, and with it the iBGP sessions, even though the underlying physical path is still carried by surviving line cards.

 

In the event of a HW failure on the LC hosting the multi-hop BFD session, I presume that software will immediately and automatically re-assign the BFD session to the next line card in the 'multipath include' list.  Is this correct?  If so, with reasonably spaced timers (e.g. 500 ms interval and multiplier 5, for a total detection time of 2.5 seconds), would the affected BFD session be re-hosted on a surviving line card before the session trips?

 

Basically, I'm trying to set up BFD for iBGP sessions to route-reflectors on a router.  The router has two backbone-facing line cards, and I want to make sure that both route-reflector iBGP sessions don't go down simultaneously when the line card hosting the BFD multipath sessions goes down for whatever reason (HW failure, etc.).
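The setup being described would look roughly as follows (AS number and RR address are placeholders); with a 500 ms minimum interval and a multiplier of 5, the detection time is 500 ms × 5 = 2.5 s:

```
router bgp 65000
 !
 ! iBGP session to route-reflector 1 (placeholder address)
 neighbor 192.0.2.11
  remote-as 65000
  update-source Loopback0
  bfd fast-detect
  bfd minimum-interval 500
  bfd multiplier 5
```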

2 ACCEPTED SOLUTIONS
smilstea
Cisco Employee

It really depends on the type of failure.  If the LC CPU hangs for an extended period, or is unable to process the BFD packets, the sessions will go down before the LC can be restarted, but will then migrate to another LC.  In most cases, when there is an immediate hardware fault or the user reloads the LC, the sessions migrate instantly.

 

Sam


Thanks for the note, this helps.

 

So it seems that on multi-line-card (chassis-based) platforms, multi-hop BFD toward iBGP RRs is more hassle than it is worth (it solves one outage-risk scenario by creating another).  FWIW, below is the conclusion I've logged into our internal case on this issue:

 

For non-chassis, single-line-card systems (ASR 9901, ASR 9902):  use BFD on iBGP RR sessions.  There is only one LC CPU for the whole shelf, so if that sole LC CPU hangs for whatever reason, it's time to reconverge away from this router ASAP.

 

For chassis-based, multi-line-card systems (ASR 9903-9922):  do not use BFD on iBGP RR sessions.  Ensure the chassis is equipped with redundant commons (RSPs, fabric cards, two backbone-facing LCs, redundant PSUs, etc.) to prevent accidental loss of the node.  When rebooting the entire chassis (for SW upgrade, maintenance, etc.), amend the MOP checklist to ensure traffic is fully drained from the router prior to reload, so that dual-homed customers are not blackholed during the reboot (use BGP Graceful Shutdown and route policies to shift traffic away from the router, then shut down all eBGP sessions once traffic is drained).
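The Graceful Shutdown step mentioned above can be sketched in IOS-XR RPL along these lines (policy and set names are hypothetical): the router being drained tags its outbound routes with the well-known GSHUT community (65535:0), and its peers depreference anything carrying it.

```
community-set GSHUT
  65535:0
end-set
!
! Outbound on the router being drained: tag everything with GSHUT
route-policy GSHUT-OUT
  set community GSHUT additive
  pass
end-policy
!
! Inbound on the peers: depreference routes carrying GSHUT
route-policy GSHUT-IN
  if community matches-any GSHUT then
    set local-preference 0
  endif
  pass
end-policy
```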

 


5 REPLIES
MHM Cisco World
Rising star

Sorry, one more question:

You have iBGP sessions to two RRs,

and each one uses a specific LC?

Why do you need BFD?  I don't get it.

iBGP RR sessions are multi-hop by nature.  If the router dies for whatever reason (accidental power loss, software upgrade, 'reload rack 0', etc.), then unless the session is culled manually, it will stay up on the RR side until the hold-down timer expires.  If you have customers dual-homed into your network and the primary edge router reboots or powers off, you will be blackholing those customers until hold-down expires; only then will traffic flip over to the secondary router.

 

OK,
so your customer has dual edge routers:
CE1
CE2
CE1 connects to PE1
CE2 connects to PE2

Both PE1 and PE2 peer with RR1.

We can simply run iBGP between PE1 and PE2: when PE1 detects the CE1 failure, it forwards traffic to PE2, and PE2 forwards the packets to CE2.
This prevents the blackhole until PE1 informs RR1 that the link is no longer available.
Check the PIC edge feature for BGP.
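A rough sketch of what BGP PIC edge looks like on IOS-XR (the policy name is hypothetical); it pre-installs a backup path in the FIB so the PE can reroute locally before the withdrawal propagates via the RR:

```
route-policy BACKUP-PATH
  set path-selection backup 1 install
end-policy
!
router bgp 65000
 address-family ipv4 unicast
  additional-paths receive
  additional-paths send
  additional-paths selection route-policy BACKUP-PATH
```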

