07-10-2023 04:10 AM
NSO generates an error when I run ha-raft create-cluster. I don't know where the 'ncsd@' prefix comes from; it gets added in front of the hostname.
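The cluster was created from nso01 with the create-cluster action, along these lines (the exact member list is an assumption based on the node configs further down):
admin@ncs# ha-raft create-cluster member [ nso02.example.com nso03.example.com ]
Output of show ha-raft on the leader (nso01) afterwards: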
admin@ncs# show ha-raft
ha-raft status role leader
ha-raft status leader nso01.example.com
ha-raft status member [ nso01.example.com nso02.example.com nso03.example.com ]
ha-raft status connected-node [ nso02.example.com nso03.example.com ]
ha-raft status local-node nso01.example.com
SERIAL NUMBER EXPIRATION DATE FILE PATH
--------------------------------------------------------------------------------------------------
xxxx 2033-07-07T10:27:52+00:00 /etc/ncs/ssl/cert/nso01.crt
SERIAL NUMBER EXPIRATION DATE FILE PATH
-----------------------------------------------------------------------------------------------
xxxx 2033-07-07T08:09:57+00:00 /etc/ncs/ssl/cert/ca.crt
ha-raft status log current-index 0
ha-raft status log applied-index 0
ha-raft status log num-entries 11
NODE STATE INDEX LAG
---------------------------------------------------
ncsd@nso02.example.com requires-snapshot 0 0
ncsd@nso03.example.com requires-snapshot 0 0
On the seed node (nso02):
admin@ncs# show ha-raft
ha-raft status role stalled
ha-raft status leader nso01.example.com
ha-raft status connected-node [ nso01.example.com nso03.example.com ]
ha-raft status local-node nso02.example.com
SERIAL NUMBER EXPIRATION DATE FILE PATH
--------------------------------------------------------------------------------------------------
xxxx 2033-07-07T10:27:39+00:00 /etc/ncs/ssl/cert/nso02.crt
SERIAL NUMBER EXPIRATION DATE FILE PATH
-----------------------------------------------------------------------------------------------
xxxx 2033-07-07T08:09:57+00:00 /etc/ncs/ssl/cert/ca.crt
ha-raft status log current-index 0
ha-raft status log applied-index 0
ha-raft status log num-entries 0
admin@ncs#
info message:
<INFO> 10-Jul-2023::10:31:45.441 9ccc7c1bdb1d ncs[152]: Leader[raft_server_ha_raft_1, term 26] append failure for follower {raft_identity,raft_server_ha_raft_1,'ncsd@nso02.example.com'}. Follower reports local log ends at 0.
<INFO> 10-Jul-2023::10:31:45.442 9ccc7c1bdb1d ncs[152]: Leader[raft_server_ha_raft_1, term 26] append failure for follower {raft_identity,raft_server_ha_raft_1,'ncsd@nso03.example.com'}. Follower reports local log ends at 0.
NSO01 leader node:
<ha-raft>
  <enabled>true</enabled>
  <cluster-name>amsterdam</cluster-name>
  <listen>
    <node-address>nso01.example.com</node-address>
  </listen>
  <seed-nodes>
    <seed-node>nso02.example.com</seed-node>
  </seed-nodes>
  <ssl>
    <ca-cert-file>${NCS_CONFIG_DIR}/ssl/cert/ca.crt</ca-cert-file>
    <cert-file>${NCS_CONFIG_DIR}/ssl/cert/nso01.crt</cert-file>
    <key-file>${NCS_CONFIG_DIR}/ssl/cert/nso01.key</key-file>
  </ssl>
</ha-raft>
NSO02 seed node:
<ha-raft>
  <enabled>true</enabled>
  <cluster-name>amsterdam</cluster-name>
  <listen>
    <node-address>nso02.example.com</node-address>
  </listen>
  <seed-nodes>
    <seed-node>nso02.example.com</seed-node>
  </seed-nodes>
  <ssl>
    <ca-cert-file>${NCS_CONFIG_DIR}/ssl/cert/ca.crt</ca-cert-file>
    <cert-file>${NCS_CONFIG_DIR}/ssl/cert/nso02.crt</cert-file>
    <key-file>${NCS_CONFIG_DIR}/ssl/cert/nso02.key</key-file>
  </ssl>
</ha-raft>
NSO03:
<ha-raft>
  <enabled>true</enabled>
  <cluster-name>amsterdam</cluster-name>
  <listen>
    <node-address>nso03.example.com</node-address>
  </listen>
  <seed-nodes>
    <seed-node>nso02.example.com</seed-node>
  </seed-nodes>
  <ssl>
    <ca-cert-file>${NCS_CONFIG_DIR}/ssl/cert/ca.crt</ca-cert-file>
    <cert-file>${NCS_CONFIG_DIR}/ssl/cert/nso03.crt</cert-file>
    <key-file>${NCS_CONFIG_DIR}/ssl/cert/nso03.key</key-file>
  </ssl>
</ha-raft>
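A quick way to sanity-check the TLS setup is to confirm that each node certificate chains to the CA and that its subject/SAN matches the node-address configured above (commands assume OpenSSL 1.1.1 or newer for the -ext option; run on each node against its own certificate):
openssl verify -CAfile /etc/ncs/ssl/cert/ca.crt /etc/ncs/ssl/cert/nso01.crt
openssl x509 -in /etc/ncs/ssl/cert/nso01.crt -noout -subject -ext subjectAltName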
07-31-2023 06:55 AM
Hello BasharAziz,
Thanks for sharing. The 'ncsd@' part is added implicitly; together with the hostname it forms the internal representation of the node name.
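For example, with the node-address values from your ncs.conf:
node-address nso02.example.com -> internal node name ncsd@nso02.example.com
node-address nso03.example.com -> internal node name ncsd@nso03.example.com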
The create-cluster issue needs to be investigated further. Could you share the raft.log and devel.log?
From what you shared, it looks like the cluster was formed but later 'nso02.example.com' got stalled for some reason.
Regards,
Erdem
07-31-2023 07:23 AM
Hi Erdem,
The problem has been solved with the help of Cisco TAC.
We simply deleted the snapshot: rm -rf /nso/run/state/raft/ha_raft.1/snapshot.0.0
and now it works again.
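For anyone else who ends up with followers stuck in requires-snapshot, a cautious sketch of the same workaround (the systemd unit name and the run directory path are assumptions that depend on how NSO was installed):
# on the affected follower, stop NSO first (use whatever stop method your install uses)
systemctl stop ncs
# remove the stale Raft snapshot; path as reported in this thread, adjust to your run directory
rm -rf /nso/run/state/raft/ha_raft.1/snapshot.0.0
systemctl start ncs
# then verify from the leader with: show ha-raft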
08-01-2023 01:31 AM
I am glad that your issue was resolved.
Cheers!