Cisco Employee

NSO HA build failed

 

Hi experts,

I tried to build a two-node HA setup with the nodes nso_m and nso_s. After I finished the configuration as follows, HA didn't work: the HA status on the master node was "Error: application communication failure", while on the slave node it was "status nso_s[none]".

Configuration on master node:

[root@nso_m ~]# hostname
nso_m

[root@nso_m ~]# cat /etc/hosts
127.0.0.1   nso_m
#127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

10.75.44.24 nso_m
10.75.44.25 nso_s
10.75.44.30 nso_vip

admin@ncs# show running-config ha
ha token cisco
ha interval   4
ha failure-limit 10
ha vip address 10.75.44.30
ha member nso_m
 address         10.75.44.24
 default-ha-role master
 vip-interface   eth0
!
ha member nso_s
 address         10.75.44.25
 default-ha-role slave
 failover-master true
 vip-interface   eth0
!

admin@ncs(config)# ha commands status
Error: application communication failure

ncs-java-vm.log:

<ERROR> 13-Dec-2016::15:53:22.618 TcmApp (tailf-hcc:tailf-hcc)-Run-0: - NCS HA is likely not enabled
<ERROR> 13-Dec-2016::15:53:22.619 TcmApp (tailf-hcc:tailf-hcc)-Run-0: - Could not start new session
java.lang.RuntimeException: Fail to initalize HA
    at com.tailf.ns.tailfHcc.TcmApp.readInitTcmCfg(TcmApp.java:251)
    at com.tailf.ns.tailfHcc.TcmApp.run(TcmApp.java:535)
    at com.tailf.ncs.ctrl.ApplicationLifeCycle$1.run(ApplicationLifeCycle.java:71)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.tailf.ha.HaException: NCS HA is not enabled
    at com.tailf.ns.tailfHcc.Cluster.haInitialize(Cluster.java:1039)
    at com.tailf.ns.tailfHcc.TcmApp.readInitTcmCfg(TcmApp.java:238)
    ... 3 more
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at java.net.Socket.connect(Socket.java:538)
    at java.net.Socket.<init>(Socket.java:434)
    at java.net.Socket.<init>(Socket.java:211)
    at com.tailf.ns.tailfHcc.Cluster.haInitialize(Cluster.java:1035)
    ... 4 more
<ERROR> 13-Dec-2016::15:53:22.635 NcsMain (tailf-hcc:tailf-hcc)-Run-0: - Received exception from package 'tailf-hcc' Message: 'Failed reading configuration'

Thank you for your advice.

Best regards,
--
Zhang Xi

Cisco Employee

Re: NSO HA build failed

 

Hi Zhang,

Have you enabled HA in ncs.conf?

Best Regards,
.:|:.:|:. Michel Papiashvili

Cisco Employee

Re: NSO HA build failed

 

Yes, I enabled HA in ncs.conf.

  <ha>
    <enabled>true</enabled>
    <ip>0.0.0.0</ip>
    <port>4570</port>
    <tick-timeout>PT20S</tick-timeout>
  </ha>
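
One way to double-check that the ncs.conf actually in use carries these settings is to parse the file and print its <ha> block. The following is only a minimal sketch, not an NSO tool: the helper name read_ha_settings and the sample wrapper element are assumptions made for illustration, and elements are matched by local name so the code works whether or not the file declares an XML namespace.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_ha_settings(xml_text):
    """Return the children of the first <ha> element as a dict.

    Hypothetical helper for illustration; returns None when the
    document contains no <ha> element.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if _local(elem.tag) == "ha":
            return {_local(c.tag): (c.text or "").strip() for c in elem}
    return None


if __name__ == "__main__":
    # Sample input mirroring the <ha> block quoted in this post.
    sample = """
    <ncs-config>
      <ha>
        <enabled>true</enabled>
        <ip>0.0.0.0</ip>
        <port>4570</port>
        <tick-timeout>PT20S</tick-timeout>
      </ha>
    </ncs-config>
    """
    print(read_ha_settings(sample))
```

Running it against the real file (by reading its contents into a string first) shows at a glance whether enabled is true and which port is configured.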

 

 

My environment:

OS: CentOS 7
NSO: 4.2.1
tailf-hcc package: ncs-4.1.4-tailf-hcc-4.0.7

Best regards,
--
Zhang Xi

Cisco Employee

Re: NSO HA build failed

And you reloaded NCS? :)

Cisco Employee

Re: NSO HA build failed

 

Yang,

My immediate guess would be that on the slave you're missing an entry for your host in /etc/hosts. This causes the Java call InetAddress.getLocalHost().getHostName() to throw java.net.UnknownHostException. As odd as it may seem, the log messages you posted will be the result.
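
The hostname lookup described here can be checked outside of NSO. The sketch below is a Python approximation, not the code tailf-hcc runs: it asks the OS for the local hostname and tries to resolve it, which is roughly what InetAddress.getLocalHost() does on the Java side. The function name check_local_hostname and its parameters are made up for this example.

```python
import socket


def check_local_hostname(resolver=socket.gethostbyname, hostname=None):
    """Try to resolve this machine's hostname.

    Returns (hostname, ip) on success or (hostname, None) when the
    name cannot be resolved. The resolver is injectable so the check
    can be exercised without touching the real /etc/hosts.
    """
    name = hostname or socket.gethostname()
    try:
        return name, resolver(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return name, None


if __name__ == "__main__":
    name, ip = check_local_hostname()
    if ip is None:
        print(f"{name}: does not resolve; check /etc/hosts")
    else:
        print(f"{name} resolves to {ip}")
```

If the script reports that the name does not resolve on a node, adding that node's own hostname to /etc/hosts is the first thing to try.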

 

 

Best Regards,
/jan

 

Cisco Employee

Re: NSO HA build failed

 

Thank you for the information, Jan!

 

Cisco Employee

Re: NSO HA build failed

 

Hi Jan and experts,

Thank you for your advice. I have now set up HA successfully. I suppose the root cause was firewalld.service, which blocked the port used for HA communication. I also added the following config under <ha> in ncs.conf:

  <ha>
    <enabled>true</enabled>
    <ip>0.0.0.0</ip>
    <port>4570</port>
    <tick-timeout>PT20S</tick-timeout>
  </ha>

Which ports exactly should I add to the firewall policy to allow HA to come up?
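
The thread does not settle which ports are required; the only one it names is 4570 from the <ha> block. Whether a given port is reachable from the other node can at least be tested directly. The following is a small sketch with a made-up helper name, tcp_port_open. It only reports whether a TCP connection succeeds, so it cannot tell a firewall drop from a service that is not listening.

```python
import socket
import sys


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or unresolvable
        return False


if __name__ == "__main__":
    # Usage: python check_port.py HOST [PORT]
    # The default of 4570 is the port from the <ha> block in this thread.
    if len(sys.argv) > 1:
        host = sys.argv[1]
        port = int(sys.argv[2]) if len(sys.argv) > 2 else 4570
        state = "open" if tcp_port_open(host, port) else "blocked or closed"
        print(f"{host}:{port} {state}")
```

Run from nso_m against 10.75.44.25 and from nso_s against 10.75.44.24, this shows whether each node can reach the other on the configured port.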

 

 

Thank you!

Best regards,
--
Zhang Xi
