IOS XR OSPF subsecond timer vs BFD

budilasmono · ‎01-28-2013

Dear all,

We recently ran into a problem with subsecond OSPF timers on the CRS.

Previously, on a 7606 router, we could configure OSPF fast hellos (minimal dead-interval with a hello multiplier of 3), which means an OSPF hello is sent every 333 ms.

Unfortunately, the CRS cannot be configured with a 333 ms hello timer.
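For reference, the 333 ms figure follows from the way OSPF fast hellos work on IOS: the dead interval is fixed at 1 second and the hello multiplier sets how many hellos are sent within it. A minimal sketch of that arithmetic (the function name is only illustrative, not a Cisco API):

```python
def fast_hello_interval_ms(hello_multiplier: int, dead_interval_ms: int = 1000) -> float:
    """Spacing between hellos when `hello_multiplier` hellos fit in one dead interval.

    Assumption: the dead interval is 1 second, as with OSPF fast hellos on IOS.
    """
    if hello_multiplier < 1:
        raise ValueError("hello_multiplier must be at least 1")
    return dead_interval_ms / hello_multiplier


# A multiplier of 3 gives one hello roughly every 333 ms.
print(int(fast_hello_interval_ms(3)))
```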

Some of our TAC cases concluded that we need to configure BFD.

Is it recommended to configure BFD on top of OSPF?

Will we get the same behaviour as with 333 ms OSPF hellos when we configure BFD on top of OSPF on the CRS?

Our network also has different routers, such as the ASR9000 series, the 7600 series, and the ME6524. Do they all behave the same way when we apply the same BFD configuration and parameters?

Is configuring BFD for the routing protocol recommended as a replacement for the sub-second timer configuration?

Thanks,

Budi L

mdebraba · ‎01-28-2013

Yes. If you want OSPF to go down within less than a second of a loss of connectivity on the link, BFD is the feature you need.

OSPF will be a client of BFD, so when BFD detects a connectivity failure it will immediately notify OSPF. The protocol itself is compatible between ASR9k and IOS devices; the only difference may be the range of values available on each platform.

See:

http://www.cisco.com/en/US/docs/routers/crs/software/crs_r4.1/interfaces/configuration/guide/hc41bifw.html

for configuration guidelines on IOS-XR.

slopes · ‎07-15-2015

Does anyone have experience with BFD and OSPF on IOS, on a 1921 and an ASR1002? Our round-trip latency from one end to the other is around 300 ms. Even with the BFD parameters set to 300 ms and a multiplier of 3, BFD automatically brings the GRE tunnel running OSPF down, without any high load.

Our OSPF throttle parameters are 1000 5000 5000.

If you have come across this, or anything comes to mind, please let us know.

Thanks

Sanjeev

xthuijs · ‎07-16-2015

300 msec is a rather long RTT. If you are using BFD echo, this is very tight, so it is best to add some room/buffer on top of the 300 msec BFD timer,

OR use async mode; that way the packets only go one way, which relaxes some of the timing constraints you have.

xander

slopes · ‎07-16-2015

We made some changes to set it to 800 ms. So far it works fine, both in terms of recovery (a bad link is brought down quickly, within 2 sec) and stability with the BFD timers. We hope it keeps working as we do more testing, so that we can meet our goal of satisfying the client's requirement of no lost transactions.

Thanks
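The timer values discussed above can be compared with some simple arithmetic. This is a rough sketch, assuming both ends use the same interval, that the multiplier stayed at 3 after the change to 800 ms (the post does not say), and that the one-way delay is half the 300 ms RTT. The function names are illustrative only, and the "margin" is just a crude indicator, not a formal BFD quantity.

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Detection time when both peers use the same interval: interval x multiplier."""
    return interval_ms * multiplier


def rough_margin_ms(interval_ms: int, multiplier: int, path_delay_ms: int) -> int:
    """Detection time minus the path delay a packet has to cover (crude indicator)."""
    return bfd_detection_time_ms(interval_ms, multiplier) - path_delay_ms


RTT_MS = 300               # round-trip latency reported in the thread
ONE_WAY_MS = RTT_MS // 2   # assumption: symmetric path

for interval in (300, 800):
    detect = bfd_detection_time_ms(interval, 3)
    echo_margin = rough_margin_ms(interval, 3, RTT_MS)       # echo packets make the round trip
    async_margin = rough_margin_ms(interval, 3, ONE_WAY_MS)  # async control packets go one way
    print(f"{interval} ms x 3: detect {detect} ms, echo margin {echo_margin} ms, "
          f"async margin {async_margin} ms")
```

With these assumptions, 800 ms x 3 gives a detection time of 2400 ms, which is in line with the bad link being brought down in about 2 sec.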

amaged · ‎01-28-2013

Hi,

In a general sense, for a network with very tight convergence requirements (sub-second), choosing the 'right' methods to achieve optimum network convergence is a long exercise that depends heavily on network design and hardware/software features. So, as a start, make sure you know what your current software and hardware can and cannot support. Then look at what the outcome will be of combining different fast convergence and high availability features.

You have three building blocks for improving fast convergence and high availability:

1- LDP-IGP sync: be careful with LDP-IGP sync, as XR behaves differently from IOS (the IGP adjacency is always built, so there is no concept of a holddown timer).

2- OSPF process tuning : LSA and SPF throttle tuning

3- Link failure detection tuning : BFD

I would start with LDP-IGP sync; tuning the link failure detection can be done before or after the OSPF protocol tuning.

Also make sure that the link failure detection mechanism is compatible with your HA configuration. For example, BFD is not handled in hardware on the 7600, so you can't use it if you also want SSO/NSF on that box: the neighbor will switch to a backup path before OSPF GR has a chance to start.

You have to plan for two different kinds of convergence events here: a down convergence event (detected by BFD and protocol timers) and an up convergence event (not timer based; services are established based on what facilities are available). The state between the start of the up convergence and full convergence is a transient state where multiple things start to happen depending on the facilities available (first the interface comes up, then the IGP, maybe PIM, maybe BGP, etc.). At scale, this state is often very non-deterministic, rarely happens in the same order twice, and requires some amount of system analysis to make sure there are enough resources available to re-establish all the different services without the risk of flapping anything.

The more processes you add to the router (OSPF, BGP, PIM, and now BFD), the more critical it is to pay attention to how much time and how many resources each process requires. Reducing protocol timers can exacerbate the conditions in the transient state, so there are tradeoffs between fast convergence and scale/performance. For instance, if BGP has not fully reconverged and a neighbor flaps because the CPU is busy, the whole process may have to start all over again, and the router may never fully recover if that scenario repeats.

BFD and IGP timers are mutually exclusive; it is always recommended to turn BFD on with the IGP. Be sure not to tune OSPF too aggressively. Also keep in mind that BFD is not SSO aware (think IOS vs XR), so when there is a SUP failover, BFD will observe microflaps and you will see some packet loss. That's why you have to tune the values to an optimum level; e.g. for a MAN, a 200-300 ms interval for BFD control packets should be fine, but you have to test it.

Thanks,

Ahmed