Solved: Re: NSO High Availability Slave on separate servers

eric.n.dunn.ctr · ‎08-23-2017

I've successfully tested the $NCS_DIR/examples.ncs/web-server-farm/ha (High-Availability) example with both nodes on one server.

master=n1

slave=n2

I cannot find any documentation that shows how to configure the slave to run on another server.

The error is: "cannot bind to internal socket".

On the other server, I edited <ncs-ipc-address> in ncs.conf to the IP address where the master runs.

I opened the firewall for ports 5757 and 5758.

Has anyone had success in configuring HA on two separate servers?

Jan Lindblad · ‎08-29-2017

"Cannot bind to internal socket" normally means the port is already in use (or is privileged, which is not the case here), so make sure nothing else is using that port. Or switch to port numbers that are free.
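
A quick way to test the "port already in use" theory is to try binding the port yourself while NSO is stopped. The following is a minimal sketch, not an NSO tool: the helper name `can_bind` is hypothetical, and it only checks plain TCP bind on one address.

```python
import errno
import socket


def can_bind(host: str, port: int) -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except OSError as exc:
        # EADDRINUSE: some other process already holds the port.
        # EACCES: the port is privileged and we lack permission.
        if exc.errno in (errno.EADDRINUSE, errno.EACCES):
            return False
        raise
    finally:
        sock.close()
```

With NSO stopped, calling `can_bind("0.0.0.0", 5757)` and `can_bind("0.0.0.0", 5758)` on each server (the two ports mentioned in the question) shows whether something else is holding them. A `False` result only says the bind failed, so a tool such as netstat is still needed to find the owning process.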

eric.n.dunn.ctr · ‎08-31-2017

Thanks. I'm now testing the tailf-hcc package, with no more success than I had with the manual-ha package.

If there is anyone who is successful with either, I'd appreciate some pointers.

Jan Lindblad · ‎09-01-2017

Does NSO start up fine in NONE-mode (i.e. without HA) on both machines? What do you see in the logs (e.g. logs/ncs.log and logs/devel.log)? If that "cannot bind to internal socket" message persists, have you tried changing to a different set of HA ports? Or back to the default?

eric.n.dunn.ctr · ‎09-05-2017

Yes, NSO starts fine in "NONE" mode on both machines. ncs.log has an error message that states: "Fxs mismatch, slave is not allowed"

devel.log does not show any errors. ncs-java-vm.log states: "failed to call HA"

Does the tailf-hcc package need a special license?

I added keys to each server to make them passwordless.

I've run the necessary nmap, nc, and netstat checks to make sure both servers can talk to each other.

I am testing NSO 4.4.2 and tailf-hcc 4.2.0 ... is there another version I should test with?

lmanor · ‎09-05-2017

This error message "Fxs mismatch, slave is not allowed" indicates that the packages in your Master and Slave NSO instances are not _identical_. Given that these packages define the CDB schema, they must be identical for NSO HA (CDB replication).
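
One way to check "identical" without relying on file names is to hash every file under the packages directory on each node and compare the results. The sketch below is an illustration only: the function names are hypothetical, and a byte-for-byte comparison is an assumption that is stricter than whatever check NSO performs internally.

```python
import hashlib
from pathlib import Path


def package_manifest(packages_dir: str) -> dict[str, str]:
    """Map each file's path (relative to packages_dir) to its SHA-256 digest."""
    root = Path(packages_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def manifest_diff(master: dict[str, str], slave: dict[str, str]) -> dict[str, list[str]]:
    """List paths missing on one side and paths whose content differs."""
    return {
        "only_on_master": sorted(master.keys() - slave.keys()),
        "only_on_slave": sorted(slave.keys() - master.keys()),
        "content_differs": sorted(
            p for p in master.keys() & slave.keys() if master[p] != slave[p]
        ),
    }
```

Run `package_manifest` on each node against its own packages directory, copy one result to the other node (for example as JSON), and pass both to `manifest_diff`. Three empty lists mean the directories match. Depending on the Python version, `rglob` may not descend into symlinked directories, so point it at the real directory if the package is symlinked in.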

eric.n.dunn.ctr · ‎09-05-2017

I reinstalled the hcc package making sure it was same on both servers.

I still get the Fxs mismatch.

Jan Lindblad · ‎09-06-2017

The complete list of packages on both machines needs to be exactly the same. Is that what you have?

eric.n.dunn.ctr · ‎09-06-2017

Both are the same.

I did the following on each server.

1. I copied the 'ncs-4.4-tailf-hcc-project-4.2.0.signed' to each server /tmp/ncs-4.4-tailf-hcc-project-4.2.0.signed

2. sh ncs-4.4-tailf-hcc-project-4.2.0.signed

3. tar -zxvf ncs-4.4-tailf-hcc-project-4.2.0.tar.gz

4. cd ncs-4.4-tailf-hcc-project-4.2.0/packages

5. tar -zxvf ncs-4.4.1.2-tailf-hcc-4.2.0.tar.gz

6. cp -r tailf-hcc /opt/ncs/packages/

7. cd /opt/ncs-instance/packages

8. ln -s /opt/ncs/packages/tailf-hcc tailf-hcc

9. start ncs

10. ncs_cli -u admin

11. request packages reload (wait for success on all installed packages)

I tried one other variation where I ran 'make all' in .../packages/tailf-hcc/src

No one has provided a working copy of any configuration files from a successful HA implementation.

Maybe no one uses HA in their Network workflows?

lmanor · ‎09-06-2017

Eric,

Actually, quite to the contrary, NSO HA and the tailf-hcc package have been widely deployed and operational in many customer networks.

I just quickly did a test that confirms that tailf-hcc version 4.2.0 works fine with NSO 4.4.2.

[lmanor@CentOS64-1 ncs-ha-4.4.2]$ cd packages/

[lmanor@CentOS64-1 packages]$ ls

ncs-4.4.2-tailf-hcc-4.2.0.tar.gz

[lmanor@CentOS64-1 ncs-ha-4.4.2]$ cd ..

[lmanor@CentOS64-1 ncs-ha-4.4.2]$ tail -n 20 ncs.conf

<.. snip ..>

<ha>

</ha>

<.. snip ..>

</ncs-config>

Master NCS:

admin@ncs> show packages
packages package tailf-hcc
 package-version 4.2.0
 description "Package for Tail-f HA Cluster Control Interface"
 ncs-min-version [ 4.1.3 4.2 ]
 directory ./state/packages-in-use/1/tailf-hcc
 component TcmCmdDp
  callback java-class-name [ com.tailf.ns.tailfHcc.TcmCmdDp ]
 component tailf-hcc
  application java-class-name com.tailf.ns.tailfHcc.TcmApp
  application start-phase phase2
 oper-status up
[ok][2017-09-06 20:18:18]

admin@ncs> show ncs-state ha

ncs-state ha mode master

ncs-state ha node-id CentOS64-1

ncs-state ha connected-slave [ CentOS64-2 ]

[ok][2017-09-06 18:51:37]

and Slave:

admin@ncs> show ncs-state ha

ncs-state ha mode slave

ncs-state ha node-id CentOS64-2

ncs-state ha master-node-id CentOS64-1

[ok][2017-09-06 18:51:31]

Using this Configuration (must be identical on both nodes):

[ok][2017-09-06 18:52:00]

admin@ncs> show configuration ha
token HA2;
vip {
    address 192.168.56.100;
}
member CentOS64-1 {
    address 192.168.56.103;
    default-ha-role master;
    vip-interface eth1;
}
member CentOS64-2 {
    address 192.168.56.101;
    default-ha-role slave;
    failover-master true;
    vip-interface eth1;
}
[ok][2017-09-06 18:53:06]

However, the tailf-hcc configuration is likely irrelevant until the cause of the "Fxs mismatch, slave is not allowed" error is determined.

From your notes, the following questions:

1) Is NSO installed as a local-install or a system-install?

For a system-install, the NSO release is installed by default at /opt/ncs/ncs-4.4.2 and the run-time directory is /var/opt/ncs, so the packages to be run are in /var/opt/ncs/packages.

The directory in use above, /opt/ncs-instance/packages, indicates either a local-install or a system-install with a custom run-time directory. Is NSO actually retrieving packages from where you think it is?

2) To Jan's note above, are there any other packages besides tailf-hcc in your <run-time-dir>/packages directory?

To simplify, remove all files from the packages directory except the lone file ncs-4.4.1.2-tailf-hcc-4.2.0.tar.gz (it does not need to be untar'd).

After reloading packages, do a 'show packages' to check that only the tailf-hcc package is present and that it loaded successfully.

3) A tactic to make sure both packages directories hold identical packages: just scp the entire <run-time-dir>/packages directory from the master node to the slave node.

4) Just a sanity check here: are both nodes using the exact same NSO version?
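
To compare what each node actually loaded, the 'show packages' output from both nodes can be saved as text and reduced to a name, version, and status per package. This is a rough sketch that assumes the output looks like the transcript in this post; the function name `parse_show_packages` is hypothetical and other NSO versions may print different fields.

```python
def parse_show_packages(text: str) -> dict[str, dict[str, str]]:
    """Extract package name, version and oper-status from 'show packages' text."""
    packages: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("packages package "):
            # A new package entry starts; the third word is its name.
            current = line.split()[2]
            packages[current] = {}
        elif current and line.startswith("package-version "):
            packages[current]["version"] = line.split(None, 1)[1]
        elif current and line.startswith("oper-status "):
            packages[current]["oper_status"] = line.split(None, 1)[1]
    return packages
```

If the dictionaries parsed from the master and the slave are not equal, the nodes are not running the same set of packages. Equal dictionaries do not prove the package contents match, only the names, versions, and load status.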

eric.n.dunn.ctr · ‎09-07-2017

PERFECT !!!!!!!!

You rescued the desperate!!!

The big issue was that I was expanding the *tar.gz packages.

Once I left them intact for tailf-hcc, cisco-ios, and juniper-junos, I was able to configure HA correctly.

I was even able to add the discovery package (somehow it works in expanded mode).

It's working great on two servers.

Keeping the packages intact as *.tar.gz files was something new to me. I may have missed it in the docs?

lmanor · ‎09-07-2017

Eric,

Glad that helped.

To be clear, tar'd and expanded packages should both work equally well.

You can, and often want to, expand the tar'd packages, especially when you are developing new features for a package.

The point to be made here is that there should only be _ONE_ instance of a given package in the packages directory, either in the tar'd form or untar'd (but not both!).

Folks often untar a package and leave the tar'd version in the packages directory. That may cause problems: NSO loads the tar'd version first, so the changes you make to the untar'd version never get loaded.
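
A packages directory can be scanned for this situation before reloading. The sketch below is a heuristic, not an NSO feature: the function name is hypothetical, and it assumes archives are named like ncs-4.4.2-tailf-hcc-4.2.0.tar.gz, with the expanded directory name (tailf-hcc) appearing between hyphens in the archive name.

```python
from pathlib import Path

ARCHIVE_SUFFIXES = (".tar.gz", ".tgz")


def find_duplicate_packages(packages_dir: str) -> list[tuple[str, str]]:
    """Return (directory, archive) pairs that look like the same package."""
    entries = sorted(Path(packages_dir).iterdir())
    dirs = [p.name for p in entries if p.is_dir()]
    archives = [
        p.name for p in entries
        if p.is_file() and p.name.endswith(ARCHIVE_SUFFIXES)
    ]
    duplicates = []
    for archive in archives:
        stem = archive
        for suffix in ARCHIVE_SUFFIXES:
            if stem.endswith(suffix):
                stem = stem[: -len(suffix)]
        for name in dirs:
            # Assumed naming: "ncs-4.4.2-tailf-hcc-4.2.0" contains "-tailf-hcc-".
            if f"-{name}-" in f"-{stem}-":
                duplicates.append((name, archive))
    return duplicates
```

An empty result suggests each package is present in only one form. Because the match is by name only, a short directory name can produce false positives, so treat any hit as something to inspect by hand rather than delete automatically.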

lmanor · ‎09-07-2017

A couple more points here regarding NSO HA and tailf-hcc HAFW:

1) There is additional licensing required for acquisition and deployment of NSO tailf-hcc package in production networks.

2) Documentation for the tailf-hcc package ships with the package itself. Untar the package and find the documentation in the tailf-hcc/docs directory.

KJ Rossavik · ‎09-07-2017

To be precise, licensing is required for an NSO Standby (HA) server in a production network, regardless of whether this is implemented by means of the Tail-f HCC FP or not. There is no licensing required for the Tail-f HCC FP itself.

Cheers,

KJ.

hpsg · ‎10-15-2018