Cisco NSO HA Issue

bjronmork · ‎01-02-2020

Hi Experts,

I have deployed Cisco HA Advanced using a single VIP.

The first failover to the slave server is automatic, but reverting to the original state is a manual procedure.

I need to know:

(1) Can we make the revert procedure automatic? Once the original master server is restored, it should detect this automatically, make itself a slave, sync its database from the current master, and, once the DB is synced, take over as master again.

(2) For example, if both servers suffer a power outage at the same time, how would we know which server has the active database? Can NSO automatically recover to the state it was in before the outage?

Regards,

Bjron

lmanor · ‎01-02-2020

What versions of NSO and tailf-hcc are you using?

bjronmork · ‎01-06-2020

I am using the following versions:

NSO 4.7.3.1

HCC 4.4

lmanor · ‎01-06-2020

Hello,

In tailf-hcc (all versions to date), it is not possible to automatically fail back to the original master-slave configuration after a master->slave failover. This is by design: it ensures that the user manually determines which CDB, the slave's or the master's, is the most recent before failing back.

With tailf-hcc, there are situations in which the slave node will claim mastership (and create a VIP interface) while the original master node still thinks it is master (with its own VIP interface); yes, a split-brain situation. The slave claims mastership because its 'pings' to the master fail when it can no longer connect to or communicate with the master. If the master server dies or the NSO server crashes, this is great and the slave is the sole master. However, if only a network failure between the nodes is blocking the 'pings' (the master server and NSO are still operational), the slave claims mastership and cannot reach the master to inform it of the failover, so both nodes are masters.

In this case, if your northbound client is issuing NSO requests to the VIP, it will likely continue to update the original master; in some northbound/network designs, however, it may connect to the VIP on the failed-over slave.
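To illustrate why ping-based detection cannot tell a dead master from a network partition, here is a toy Python simulation. It is only a sketch of the general heartbeat idea, not tailf-hcc's actual implementation; the node names, the `simulate()` function and the `MISSED_PINGS_BEFORE_TAKEOVER` threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold: consecutive failed pings before the slave takes over.
MISSED_PINGS_BEFORE_TAKEOVER = 3


@dataclass
class Node:
    name: str
    role: str  # "master", "slave" or "down"
    has_vip: bool = False


def simulate(master_alive: bool, link_up: bool, rounds: int = 5) -> list[str]:
    """Return the names of the nodes that consider themselves master."""
    master = Node("node-a", "master", has_vip=True)
    slave = Node("node-b", "slave")
    if not master_alive:
        # Server or NSO crash: the old master stops serving and drops its VIP.
        master.role, master.has_vip = "down", False

    missed = 0
    for _ in range(rounds):
        # The slave's ping only succeeds if the master is up AND the link works.
        ping_ok = master_alive and link_up
        missed = 0 if ping_ok else missed + 1
        if slave.role == "slave" and missed >= MISSED_PINGS_BEFORE_TAKEOVER:
            # The slave claims mastership and brings up the VIP; it has no way
            # to tell the old master, which keeps its own role and VIP.
            slave.role, slave.has_vip = "master", True

    return [n.name for n in (master, slave) if n.role == "master"]


print(simulate(master_alive=False, link_up=True))   # master crashed
print(simulate(master_alive=True, link_up=False))   # network partition
```

From the slave's point of view both scenarios look identical (its pings fail), yet the first ends with a single master (`['node-b']`) and the second with two (`['node-a', 'node-b']`), which is exactly the split brain described above.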

You need to determine which CDB has been updated since the failover event; in today's tailf-hcc this has to be done manually.

Enhancement requests have been made for 'auto-failback'. NSO HA is currently being reworked to provide simpler configuration, including 'auto-failback', targeted, I believe, for NSO 5.4.
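As a rough sketch of that manual decision, assume the operator has collected the time of the last committed transaction on each node (for example from the NSO logs) and knows when the failover happened. The `CdbSnapshot` type and `pick_surviving_cdb()` function below are hypothetical helpers, not part of NSO or tailf-hcc, and comparing timestamps across nodes also assumes their clocks are synchronized.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CdbSnapshot:
    node: str
    # Hypothetical: time of the last committed transaction on this node,
    # gathered manually by the operator; not an NSO API.
    last_commit: datetime


def pick_surviving_cdb(a: CdbSnapshot, b: CdbSnapshot,
                       failover_at: datetime) -> Optional[str]:
    """Return the node whose CDB to keep, or None if a human must decide."""
    if a.last_commit > failover_at and b.last_commit > failover_at:
        # Split brain: both CDBs accepted writes after the failover and have
        # diverged, so neither can simply be copied over the other.
        return None
    # At most one node was written after the failover; keep the newer CDB.
    return a.node if a.last_commit >= b.last_commit else b.node


failover = datetime(2020, 1, 6, 12, 0)
print(pick_surviving_cdb(CdbSnapshot("node-a", datetime(2020, 1, 6, 11, 0)),
                         CdbSnapshot("node-b", datetime(2020, 1, 6, 13, 0)),
                         failover))  # node-b
```

The `None` case is the reason failback stays manual: once both sides have accepted changes, someone has to decide which changes to keep before the other node is re-synced as slave.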