BNG Geo-Redundancy - NO sub-second failover after power failure

Sebastiaan1978 · 08-09-2018

Hi,

We are testing a Geo Redundancy setup with IPoE subscribers. Router A and router B are used, one as master and the other as slave in hot-standby mode. If I shut down the access interface, the Internet and IPTV services switch over sub-second (no glitches or lost ping packets), which looks great. But when we power off router A to simulate a power outage, it takes about 2-3 minutes before router B takes over the subscriber services. This looks like some kind of (hold) timer, but we don't understand what's happening. The peer (= loopback) address in the subscriber redundancy config is advertised by IS-IS.

subscriber
 redundancy
  source-interface Loopback0
  group 1
   preferred-role slave
   virtual-mac 0002.0000.0003
   slave-mode hot
   peer 1.2.3.4
   peer route-disable
   access-tracking int-BE4
   state-control-route ipv4 x/24 vrf default tag 30
   revertive-timer 2 maximum 5
   interface-list
    interface Bundle-Ether4.302 id 302

Aleksandar Vidakovic · 08-20-2018

How is the state-control-route protected in the core? Do you use BFD to withdraw it quickly in case of power outage on node A?

Sebastiaan1978 · 08-28-2018

Hi Aleksandar,

How can I use BFD to protect the state-control route? We use BFD on the core uplink. Loopback0 (which is used for the SRG peer session) is configured in IS-IS as a passive interface, so even without BFD I would expect the other router to notice that this router is down after a second or so.

When I just shut the client-facing interface, the switchover is sub-second; only when I reboot the router does it take 2-3 minutes.

Regards,

Sebastiaan
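
For readers following along, the IS-IS arrangement described in this post might look roughly like the sketch below. The instance name CORE, the uplink TenGigE0/0/0/0, and the BFD timer values are assumptions for illustration only; the actual configuration was not posted in the thread.

```
router isis CORE
 ! Loopback0 is the SRG source interface; passive means it is
 ! advertised into IS-IS but forms no adjacency.
 interface Loopback0
  passive
  address-family ipv4 unicast
  !
 !
 ! Hypothetical core uplink with BFD: 50 ms x 3 gives roughly
 ! 150 ms detection of a lost neighbor.
 interface TenGigE0/0/0/0
  bfd minimum-interval 50
  bfd multiplier 3
  bfd fast-detect ipv4
  address-family ipv4 unicast
  !
 !
```

With a setup like this, the neighbor detects the loss of the uplink adjacency quickly, but the loopback prefix is only withdrawn once IS-IS reacts to that event, so whatever tracks the peer address still depends on how the failure is signalled.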

Aleksandar Vidakovic · 08-29-2018

Hi Sebastiaan,

A reboot is a special case. It is usually a planned activity, typically preceded by routing all traffic away from the node. In this case it seems the BFD session went into admin down (as opposed to peer loss due to a BFD timeout), which probably caused the BGP peer (I presume BGP based on the 180 s timeout) to keep the BGP session up until the TCP session timed out.

/Aleksandar
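
If BGP is indeed involved, as presumed above (the thread does not confirm it), one way to keep a 180 s hold time from dominating the failover is to tie the session to BFD and, optionally, shorten the BGP timers. The following is only a sketch: the AS number 65000, the neighbor address 192.0.2.1, and all timer values are placeholders, and a session between loopbacks would need multihop BFD support on the platform.

```
router bgp 65000
 neighbor 192.0.2.1
  remote-as 65000
  update-source Loopback0
  ! Bring the session down on BFD timeout (100 ms x 3)
  ! instead of waiting for the BGP hold timer.
  bfd fast-detect
  bfd minimum-interval 100
  bfd multiplier 3
  ! Optional fallback: keepalive 10 s, hold time 30 s
  ! instead of the default 60 s / 180 s.
  timers 10 30
 !
```

Note that BFD only helps when the session is declared down by timeout; if the peer signals admin down during a reboot, as described above, the shorter hold time is what bounds the delay.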