08-23-2017 11:59 AM - edited 03-01-2019 03:58 AM
I've successfully tested the $NCS_DIR/examples.ncs/web-server-farm/ha (High-Availability) example on one server.
master=n1
slave=n2
I cannot find any documentation that shows how to configure the slave to run on another server.
The error is: "cannot bind to internal socket".
On the other server, I edited <ncs-ipc-address> in ncs.conf to point to the IP where the master runs.
I opened the firewall for ports 5757 and 5758.
Has anyone had success in configuring HA on two separate servers?
08-29-2017 02:59 AM
Cannot bind to internal socket normally means the port is in use (or is privileged, but not the case here), so ensure nothing else is using that port. Or switch port numbers to something else that's free.
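A quick way to test the "port is in use" theory is to try binding the port yourself. The following is a minimal Python sketch, not an NSO tool: the helper name `port_is_free` is hypothetical, and the address and port in the usage note are only examples taken from this thread.

```python
import socket


def port_is_free(host, port):
    """Return True if a TCP socket can be bound to (host, port) right now.

    This only checks bindability at the moment of the call; another
    process may still grab the port afterwards.
    """
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
        return True
    except OSError:
        # Typically "address already in use", "permission denied" for a
        # privileged port, or an address that is not local to this machine.
        return False
    finally:
        probe.close()
```

For example, `port_is_free("0.0.0.0", 5757)` run on the slave machine while NSO is stopped should return True if nothing else holds that port. Note that binding also fails when the address does not belong to the local machine, so check which address the socket is configured to bind to.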
08-31-2017 10:05 AM
Thanks. I'm now testing the tailf-hcc package, with no more success than I had with the manual-ha package.
If there is anyone who is successful with either, I'd appreciate some pointers.
09-01-2017 04:48 AM
Does NSO start up fine in NONE-mode (i.e. without HA) on both machines? What do you see in the logs (e.g. logs/ncs.log and logs/devel.log)? If that "cannot bind to internal socket" message persists, have you tried changing to a different set of HA ports? Or back to the default?
09-05-2017 09:27 AM
Yes, NSO starts fine in "NONE" mode on both machines. ncs.log has an error message that states: "Fxs mismatch, slave is not allowed"
devel.log does not show any errors. ncs-java-vm.log states: "failed to call HA"
Does the tailf-hcc package need a special license?
I added keys to each server to make them passwordless.
I've done the necessary nmap, nc, and netstat checks to make sure both servers are talking to each other.
I am testing NSO 4.4.2 and tailf-hcc 4.2.0 ... is there another version I should test with?
09-05-2017 10:25 AM
This error message "Fxs mismatch, slave is not allowed" indicates that the packages in your Master and Slave NSO instances are not _identical_. Given that these packages define the CDB schema, they must be identical for NSO HA (CDB replication).
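One rough way to check this from outside NSO is to hash the compiled schema (.fxs) files found under each node's packages directory and compare the results. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, it looks inside both expanded packages and *.tar.gz archives, and whether NSO compares exactly these files during the HA handshake is an assumption, not something confirmed in this thread.

```python
import hashlib
import os
import tarfile


def fxs_digests(packages_dir):
    """Map each .fxs file name under packages_dir to its SHA-256 digest."""
    digests = {}
    # Expanded packages; follow symlinks, since packages are often linked in.
    for dirpath, _dirnames, filenames in os.walk(packages_dir, followlinks=True):
        for name in filenames:
            if name.endswith(".fxs"):
                with open(os.path.join(dirpath, name), "rb") as handle:
                    digests[name] = hashlib.sha256(handle.read()).hexdigest()
    # Packages still in archive form directly in the packages directory.
    for entry in sorted(os.listdir(packages_dir)):
        if entry.endswith(".tar.gz"):
            with tarfile.open(os.path.join(packages_dir, entry), "r:gz") as tar:
                for member in tar.getmembers():
                    if member.isfile() and member.name.endswith(".fxs"):
                        data = tar.extractfile(member).read()
                        name = os.path.basename(member.name)
                        digests[name] = hashlib.sha256(data).hexdigest()
    return digests


def mismatched_modules(master, slave):
    """Return sorted .fxs names that are missing on one side or differ."""
    names = set(master) | set(slave)
    return sorted(n for n in names if master.get(n) != slave.get(n))
```

Run `fxs_digests` on each node (or on copies of both packages directories) and pass the two results to `mismatched_modules`; an empty list means the schema files found by this sketch are byte-identical.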
09-05-2017 12:41 PM
I reinstalled the hcc package making sure it was same on both servers.
I still get the Fxs mismatch.
09-06-2017 12:12 AM
The complete list of packages on both machines needs to be exactly the same. Is that what you have?
09-06-2017 05:13 AM
Both are same.
I did the following on each server.
1. I copied the 'ncs-4.4-tailf-hcc-project-4.2.0.signed' to each server /tmp/ncs-4.4-tailf-hcc-project-4.2.0.signed
2. sh ncs-4.4-tailf-hcc-project-4.2.0.signed
3. tar -zxvf ncs-4.4-tailf-hcc-project-4.2.0.tar.gz
4. cd ncs-4.4-tailf-hcc-project-4.2.0/packages
5. tar -zxvf ncs-4.4.1.2-tailf-hcc-4.2.0.tar.gz
6. cp -r tailf-hcc /opt/ncs/packages/
7. cd /opt/ncs-instance/packages
8. ln -s /opt/ncs/packages/tailf-hcc tailf-hcc
9. start ncs
10. ncs_cli -u admin
11. request packages reload (wait for success on all installed packages)
I tried one other variation where I ran 'make all' in .../packages/tailf-hcc/src
No one has provided a working copy of any configuration files from a successful HA implementation.
Maybe no one uses HA in their Network workflows?
09-06-2017 05:38 PM
Eric,
Actually, quite to the contrary, NSO HA and the tailf-hcc package have been widely deployed and operational in many customer networks.
I just quickly did a test that confirms that tailf-hcc version 4.2.0 works fine with NSO 4.4.2.
[lmanor@CentOS64-1 ncs-ha-4.4.2]$ cd packages/
[lmanor@CentOS64-1 packages]$ ls
ncs-4.4.2-tailf-hcc-4.2.0.tar.gz
[lmanor@CentOS64-1 ncs-ha-4.4.2]$ cd ..
[lmanor@CentOS64-1 ncs-ha-4.4.2]$ tail -n 20 ncs.conf
<.. snip ..>
<ha>
<enabled>true</enabled>
</ha>
<.. snip ..>
</ncs-config>
Master NCS:
admin@ncs> show packages
packages package tailf-hcc
package-version 4.2.0
description "Package for Tail-f HA Cluster Control Interface"
ncs-min-version [ 4.1.3 4.2 ]
directory ./state/packages-in-use/1/tailf-hcc
component TcmCmdDp
callback java-class-name [ com.tailf.ns.tailfHcc.TcmCmdDp ]
component tailf-hcc
application java-class-name com.tailf.ns.tailfHcc.TcmApp
application start-phase phase2
oper-status up
[ok][2017-09-06 20:18:18]
admin@ncs> show ncs-state ha
ncs-state ha mode master
ncs-state ha node-id CentOS64-1
ncs-state ha connected-slave [ CentOS64-2 ]
[ok][2017-09-06 18:51:37]
and Slave:
admin@ncs> show ncs-state ha
ncs-state ha mode slave
ncs-state ha node-id CentOS64-2
ncs-state ha master-node-id CentOS64-1
[ok][2017-09-06 18:51:31]
Using this Configuration (must be identical on both nodes):
[ok][2017-09-06 18:52:00]
admin@ncs> show configuration ha
token HA2;
vip {
    address 192.168.56.100;
}
member CentOS64-1 {
    address 192.168.56.103;
    default-ha-role master;
    vip-interface eth1;
}
member CentOS64-2 {
    address 192.168.56.101;
    default-ha-role slave;
    failover-master true;
    vip-interface eth1;
}
[ok][2017-09-06 18:53:06]
However, the tailf-hcc configuration is likely irrelevant unless the cause of the "Fxs mismatch, slave is not allowed" error can be determined.
From your notes, I have the following questions:
1) Is NSO installed as a local-install or a system-install?
For a system-install, the NSO release will by default be installed at /opt/ncs/ncs-4.4.2 and the run-time directory will be /var/opt/ncs, so the packages to be run are at /var/opt/ncs/packages.
The directories in use above - /opt/ncs-instance/packages - would indicate that you have either a local-install or a system-install with a custom run-time directory. Is NSO actually retrieving packages from where you think it is?
2) To Jan's note above, are there any other packages besides tailf-hcc in your <run-time-dir>/packages directory?
To simplify, remove all files from the packages directory except the lone file ncs-4.4.1.2-tailf-hcc-4.2.0.tar.gz (it does not need to be untar'd).
After re-loading packages, do a 'show packages' to determine if only the tailf-hcc package is present and successfully loaded.
3) A tactic to make sure both packages directories have identical packages: just scp the entire <run-time-dir>/packages directory from the Master node to the Slave node.
4) Just a sanity check here, both nodes are using the exact same NSO version?
09-07-2017 08:43 AM
PERFECT !!!!!!!!
You rescued the desperate!!!
The big issue was that I was expanding the *tar.gz packages.
Once I left them intact for tailf-hcc, cisco-ios, and juniper-junos, I was able to configure correctly.
I was even able to add the discovery package (somehow it works in expanded mode).
It's working great on two servers.
Keeping the packages intact as *.tar.gz files was something new to me. I may have missed it in the docs?
09-07-2017 09:06 AM
Eric,
Glad that helped.
To be clear, tar'd and expanded packages should both work equally well.
You can, and often want to, expand the tar'd packages - especially when you are developing new features for a package.
The point to be made here is that there should only be _ONE_ instance of a given package in the packages directory, either in the tar'd form or untar'd (but not both!).
Many times folks untar a package and leave the tar'd version in the packages directory - which may cause problems when NSO first loads the tar'd version and you are attempting to make changes to the untar'd version - and the changes never get loaded...
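A small script can flag this situation before NSO is started. The following Python sketch is an illustration only: `duplicate_packages` and `archive_top_level_names` are hypothetical helper names, and the sketch assumes that the top-level directory inside a package archive has the same name as the expanded package directory (as with tailf-hcc earlier in this thread).

```python
import tarfile
from pathlib import Path, PurePosixPath


def archive_top_level_names(archive_path):
    """Return the set of top-level entry names inside a .tar.gz archive."""
    names = set()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member_name in tar.getnames():
            parts = PurePosixPath(member_name).parts
            if parts:
                names.add(parts[0])
    return names


def duplicate_packages(packages_dir):
    """Return sorted package names present both expanded and as an archive."""
    root = Path(packages_dir)
    # is_dir() follows symlinks, so symlinked package directories count too.
    expanded = {entry.name for entry in root.iterdir() if entry.is_dir()}
    archived = set()
    for archive in root.glob("*.tar.gz"):
        archived |= archive_top_level_names(archive)
    return sorted(expanded & archived)
```

Calling `duplicate_packages` on the run-time packages directory returns an empty list when each package exists in only one form; any name it returns is a candidate for removing either the archive or the expanded directory.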
09-07-2017 10:06 AM
A couple more points here regarding NSO HA and tailf-hcc HAFW:
1) There is additional licensing required for acquisition and deployment of the NSO tailf-hcc package in production networks.
2) Documentation for the tailf-hcc package ships with the package itself. Untar the package and find the documentation in the tailf-hcc/docs directory.
09-07-2017 10:11 AM
To be precise, there is licensing required for an NSO Standby (HA) server in a production network, regardless of whether this is implemented by means of the Tail-f HCC FP or not. There is no licensing required for the Tail-f HCC FP itself.
Cheers,
KJ.
10-15-2018 01:44 PM - edited 10-15-2018 02:04 PM