Solved: Backup on HA configuration.

Alejandro Madurga Ainoza · ‎01-30-2019

Hi all,

I'm looking for best practices regarding backup and restore when HA is activated.

My main concern is whether the backups are interchangeable: can a backup taken on the Master node be used to recover a Slave node? Are the backups independent of HA roles?

The main issue is to have a clear picture of how to manage backups when a failover lasts for an extended period of time (more than one day).

Thanks in advance,

Alex.

joepak · ‎01-30-2019

Hi,

From what I understand, Master <> Slave = same CDB/packages/etc. They should completely mirror each other (except maybe the underlying server/OS).

I'm not sure what you mean by using a backup taken on the master to recover a slave node. If the Master is down, the Slave becomes the Master, and it has the same CDB the old master had at fail-over. Since Master and Slave generally mirror each other, I wouldn't expect the Slave (now master) to need a restore.

Let me know if you need more assistance. It's not entirely clear to me what you are asking, since I'd expect the Slave to become the new master once the previous master fails. When fail-over occurs, the Master's CDB is in sync with the slave, so the slave already has everything the master had.

Alejandro Madurga Ainoza · ‎02-01-2019

Hi, thanks for the answer!

The use case is as follows:

HA environment with a Master and a Slave.
For now, backups are scheduled only on the Master node via cron (once a day).
In case of Master failure, the Slave takes the master role.
If the failover lasts for an extended period of time, backups need to be created on the new Master.

We are thinking of taking backups on both nodes (Master and Slave) periodically, but we don't know the limitations/dependencies of those backups, i.e., in case of total disaster, can the recovery be done using any of the backups (Master or Slave), or are the backups role/VM dependent?
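One way to cover the failover case without backing up both nodes every day is to install the same cron job on both nodes and let it check the HA role first. The following is only a minimal sketch of that idea, not an NSO-provided mechanism: `get_role` is a hypothetical callable that you would have to implement for your own deployment (for example by querying the node's HA status), and the sketch assumes `ncs-backup` is on the PATH of the cron user.

```python
import subprocess
from typing import Callable, Sequence

# Assumption: "ncs-backup" is on PATH for the user running the cron job.
BACKUP_CMD: Sequence[str] = ("ncs-backup",)


def should_backup(ha_role: str) -> bool:
    """Back up only on the node that currently holds the master role."""
    return ha_role.strip().lower() == "master"


def run_backup() -> int:
    """Run the backup command and return its exit code."""
    return subprocess.run(list(BACKUP_CMD), check=False).returncode


def backup_if_master(get_role: Callable[[], str],
                     backup: Callable[[], int] = run_backup) -> str:
    """Cron entry point: skip unless this node is currently master.

    get_role is hypothetical; it must return this node's HA role as text.
    """
    role = get_role()
    if not should_backup(role):
        return f"skipped (role={role})"
    code = backup()
    return "ok" if code == 0 else f"failed (exit={code})"
```

With this on both nodes, the daily backup follows the master role automatically after a failover, so an extended failover needs no manual change to the schedule.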

Thanks in advance,

Alex.

vleijon · ‎02-01-2019

The database depends only on the schema. So as long as both nodes have the exact same packages loaded (which they should!), the database file is compatible.

In fact, you can use this for lab purposes: you can copy a database from a running NSO into your lab to perform experiments (upgrades, new features, …) on your real data.
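Since compatibility comes down to both sides having the same packages, it can be worth checking that before restoring a backup on another node or in a lab. Below is a rough sketch under stated assumptions: the inventories are plain name-to-version dictionaries that you collect yourself from each node, and comparing names and versions is used here only as a proxy for "same schema". The function names are illustrative and not part of NSO.

```python
from typing import Dict, List


def package_diff(source: Dict[str, str], target: Dict[str, str]) -> List[str]:
    """List human-readable differences between two package inventories."""
    problems = []
    for name in sorted(set(source) | set(target)):
        src, dst = source.get(name), target.get(name)
        if src is None:
            problems.append(f"{name}: only on target ({dst})")
        elif dst is None:
            problems.append(f"{name}: only on source ({src})")
        elif src != dst:
            problems.append(f"{name}: {src} on source, {dst} on target")
    return problems


def backup_is_compatible(source: Dict[str, str], target: Dict[str, str]) -> bool:
    """True when both nodes report exactly the same packages and versions."""
    return not package_diff(source, target)
```

An empty result from `package_diff` suggests the backup from one node should load on the other; any entry in the list is something to resolve before restoring.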

Alejandro Madurga Ainoza · ‎02-01-2019

Thanks a lot for the answers! Now it is crystal clear.

lmanor · ‎02-01-2019

Alejandro,

Sorry, late to this thread.

A couple of points on NSO HA and tailf-hcc package operation, some already alluded to in the responses here, but just to recap:

Out of the box, NSO HA is solely the replication of the CDB from Master->Slave. The entire CDB is replicated when first connected (which may take a while depending on CDB contents) and updated on each subsequent master CDB transaction. The transaction-id on each CDB will be identical (show ncs-state internal cdb datastore transaction-id).
Master node CDB is read/write; Slave node CDB is read-only.
In order for NSO HA to connect master-slave, NSO versions and ALL packages _must_ be identical between master and slave - essentially the CDB schema in both nodes must be identical.
Tailf-hcc is an application that helps configure and monitor the HA provided by NSO, and provides a single automated Master-slave failover. Reverting to the original Master/slave roles must be done manually. This is intentional, to ensure that the correct (most up-to-date) CDB is maintained.
The Master-Slave connection is monitored by the Slave node.
The Slave node will initiate a failover and claim the master role when communication to the master is lost. This loss may be because the Master NSO or VM/server failed, or because the communication link between them has failed. In the latter case, a dual-headed mastership may occur: the Slave will claim mastership (since it can no longer talk to the master), while the Master will think that it is still master. The NB (northbound) management will need to deal with this.
Many customers maintain a 2nd Slave instance as a Disaster Recovery node - usually geographically diverse - for an additional copy of the CDB. Slave updates are made in parallel.
The NSO HA API is available northbound if customers would like to craft more elaborate Master/slave failover mechanisms.
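To make the points about identical transaction-ids and dual-headed mastership concrete, here is a small sketch of a check a northbound monitor might run. It is only an illustration under assumptions: the role strings ("master", "slave", "none") and the `NodeState` type are hypothetical, and how you collect each node's role and transaction-id is left to your own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeState:
    name: str
    role: str            # assumed values: "master", "slave", or "none"
    transaction_id: str  # value reported by the transaction-id show command


def assess_pair(a: NodeState, b: NodeState) -> str:
    """Classify the state of a two-node HA pair from collected values."""
    roles = sorted([a.role, b.role])
    if roles == ["master", "master"]:
        # The dual-headed case: both sides believe they are master.
        return "split-brain: both nodes claim master"
    if roles != ["master", "slave"]:
        return f"degraded: roles are {a.name}={a.role}, {b.name}={b.role}"
    if a.transaction_id != b.transaction_id:
        return "replicating: transaction ids differ"
    return "in-sync"
```

A result other than "in-sync" is a hint to look at the pair before trusting that a backup from either node reflects the latest CDB.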

-Larry

joepak · ‎02-01-2019

I don't think taking backups of both would be efficient, since you're essentially backing up the same NSO instance (think of HA as a mirrored NSO server/CDB). If HA is operational, backing up the master is enough (unless the master is down, in which case back up the new master). That also saves disk space, since daily backups can consume a lot of it over time unless you plan on removing old ones.

The most recent reply from vleijon seems to be the answer you were looking for. So a backup/restore shouldn't be affected by whether it is from master or slave (due to the identical schema).
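For removing old backups over time, a simple retention script run after the daily backup is usually enough. The sketch below keeps the newest `keep` files and deletes the rest. The `*.backup.gz` pattern and the directory you pass in are assumptions, so check them against what your backups actually look like before using anything like this.

```python
from pathlib import Path
from typing import List


def prune_backups(directory: Path, keep: int,
                  pattern: str = "*.backup.gz") -> List[Path]:
    """Delete all but the `keep` most recently modified backup files.

    The file name pattern is an assumption; adjust it to your backup names.
    Returns the paths that were removed.
    """
    if keep < 0:
        raise ValueError("keep must be zero or greater")
    # Newest first, judged by file modification time.
    backups = sorted(directory.glob(pattern),
                     key=lambda p: p.stat().st_mtime, reverse=True)
    removed = []
    for old in backups[keep:]:
        old.unlink()
        removed.append(old)
    return removed
```

Files that do not match the pattern are left alone, so the script can point at a shared directory without touching anything else.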
