Problem setting up HA cluster within Cisco Smart on-prem satellite

Dear community,

 

We are having an issue activating the ha_cluster between two on-prem satellites running version 7.x.

Running provision_standby works fine on the standby node, but when running ha_deploy from the active node we see the errors below, and the cluster setup fails:

NOTICE: It is strongly recommended that you perform a backup of your
database before proceeding. Please see the documentation for details.

Proceed with the above configuration? Enter 'yes' to continue: yes
Adjusting firewall...
success
success
success
Stopping services...
Removed symlink /etc/systemd/system/multi-user.target.wants/satellite.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
Authenticating cluster user...
x.x.x.x: Authorized
y.y.y.y: Authorized
Setting up cluster...
74cabb7f3440f8e5b71f9d1aa9340d99996e67ae697ae50fa6009186eecded6e
Error: unable to destroy cluster
x.x.x.x: Unable to connect to x.x.x.x, try setting higher timeout in --request-timeout option (Operation timed out after 60001 milliseconds with 0 out of -1 bytes received)
y.y.y.y: Unable to connect to y.y.y.y, try setting higher timeout in --request-timeout option (Operation timed out after 60001 milliseconds with 0 out of -1 bytes received)
Destroying cluster on nodes: x.x.x.x, y.y.y.y...
x.x.x.x: Stopping Cluster (pacemaker)...
y.y.y.y: Stopping Cluster (pacemaker)...
x.x.x.x: Unable to connect to x.x.x.x, try setting higher timeout in --request-timeout option (Operation timed out after 60001 milliseconds with 0 out of -1 bytes received)
y.y.y.y: Unable to connect to y.y.y.y, try setting higher timeout in --request-timeout option (Operation timed out after 60001 milliseconds with 0 out of -1 bytes received)
Configuring cluster...
Error: unable to get cib

 

After sifting through the logs and reviewing the steps in deploy_ha.sh, I found multiple lines in /var/log/pcsd/pcsd.log reporting:

Cannot read config 'corosync.conf' from '/etc/corosync/corosync.conf': No such file

I only have:

-rw-r--r--. 1 root root 2881 Oct 30 2018 corosync.conf.example
-rw-r--r--. 1 root root 767 Oct 30 2018 corosync.conf.example.udpu
-rw-r--r--. 1 root root 3278 Oct 30 2018 corosync.xml.example

 

I don't think I can solve this within the current release. The only thing I could do is extend the timeout as suggested in the output (the --request-timeout option), but that won't create an /etc/corosync/corosync.conf file either :)

 

Has anyone succeeded in installing an HA cluster on the 7.x version?

Re: Problem setting up HA cluster within Cisco Smart on-prem satellite

For any interested reader, I have managed to get past this: the pcsd process was listening only on port 2224 of the IPv6 stack. After editing the pcsd service file to make it IPv4 only, netstat -ln showed that pcsd was listening on ipv4:2224.
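For anyone who wants to verify this from the peer node before re-running ha_deploy, here is a minimal Python sketch that tries a plain TCP connection to port 2224 over IPv4 and IPv6 separately. The function names can_connect and pcsd_reachability are my own, not part of any Cisco or pcs tooling, and a successful connect only shows that something accepts connections on that port, not that pcsd authentication works.

```python
import socket


def can_connect(host, port, family, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds over `family`.

    `family` is socket.AF_INET (IPv4) or socket.AF_INET6 (IPv6).
    """
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # The host has no address in this family (or cannot be resolved).
        return False
    for fam, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            # Refused, timed out, or address family unavailable: try the next one.
            continue
    return False


def pcsd_reachability(host, port=2224):
    """Check the pcsd port (2224 in this thread) over both IP stacks."""
    return {
        "ipv4": can_connect(host, port, socket.AF_INET),
        "ipv6": can_connect(host, port, socket.AF_INET6),
    }
```

Calling pcsd_reachability("x.x.x.x") from the other node, with the real node address substituted, returns a dict such as {"ipv4": ..., "ipv6": ...}. If the IPv4 entry is False while the nodes authenticate each other by IPv4 address, that would be consistent with the timeouts in the first post.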

 

So I ran ha_deploy again, and the process successfully got past the cluster configuration step. Now it gets stuck on sending the configs to the individual nodes:

 

Finished running: /usr/bin/ruby -I/usr/lib/pcsd/ /usr/lib/pcsd/pcsd-cli.rb send_local_configs
Return value: 0
--Debug Stdout Start--
{
"status": "ok",
"data": {
"x": {
"status": "error"
},
"y": {
"status": "error"
}
},
"log": [
"I, [2019-11-14T16:32:19.481255 #27247] INFO -- : PCSD Debugging enabled\n",
"D, [2019-11-14T16:32:19.481527 #27247] DEBUG -- : Did not detect RHEL 6\n",
"D, [2019-11-14T16:32:19.481559 #27247] DEBUG -- : Detected systemd is in use\n",
"I, [2019-11-14T16:32:19.558296 #27247] INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name\n",
"I, [2019-11-14T16:32:19.558358 #27247] INFO -- : CIB USER: hacluster, groups: \n",
"D, [2019-11-14T16:32:19.563694 #27247] DEBUG -- : []\n",
"D, [2019-11-14T16:32:19.563754 #27247] DEBUG -- : [\"Failed to initialize the cmap API. Error CS_ERR_LIBRARY\\n\"]\n",
"D, [2019-11-14T16:32:19.563824 #27247] DEBUG -- : Duration: 0.005317439s\n",
"I, [2019-11-14T16:32:19.563892 #27247] INFO -- : Return Value: 1\n",
"W, [2019-11-14T16:32:19.563935 #27247] WARN -- : Cannot read config 'corosync.conf' from '/etc/corosync/corosync.conf': No such file\n",
"W, [2019-11-14T16:32:19.563982 #27247] WARN -- : Cannot read config 'corosync.conf' from '/etc/corosync/corosync.conf': No such file or directory - /etc/corosync/corosync.conf\n",
"I, [2019-11-14T16:32:19.564656 #27247] INFO -- : Sending config 'pcs_settings.conf' version 0 36dfa9387571c4c5bc22d008a40c4fe3089e2dd0 to nodes: x, y\n",
"I, [2019-11-14T16:32:19.564720 #27247] INFO -- : Sending config 'tokens' version 1 fc751450f01eb34b40267df78fd0e42f4d62633b to nodes: x, y\n",
"I, [2019-11-14T16:32:19.565176 #27247] INFO -- : SRWT Node: x Request: set_configs\n",
"I, [2019-11-14T16:32:19.567929 #27247] INFO -- : SRWT Node: y Request: set_configs\n",
"I, [2019-11-14T16:32:49.568694 #27247] INFO -- : No response from: 212.84.0.x request: set_configs, error: operation_timedout\n",
"I, [2019-11-14T16:32:49.570542 #27247] INFO -- : No response from: 212.84.0.y request: set_configs, error: operation_timedout\n",
"I, [2019-11-14T16:32:49.570837 #27247] INFO -- : Sending config response from x: {\"status\"=>\"error\"}\n",
"I, [2019-11-14T16:32:49.570947 #27247] INFO -- : Sending config response from y: {\"status\"=>\"error\"}\n"
]
}
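Since this debug output is plain JSON, it can be summarized quickly instead of read line by line. The sketch below is only an illustration: summarize_send_local_configs is a name I made up, the sample is an abbreviated copy of the output above, and treating any per-node status other than "ok" as a failure is my assumption rather than documented pcsd behaviour. It expects only the JSON part, without the "--Debug Stdout Start--" wrapper line.

```python
import json

# Abbreviated copy of the send_local_configs debug output shown above.
SAMPLE = r'''
{
  "status": "ok",
  "data": {"x": {"status": "error"}, "y": {"status": "error"}},
  "log": [
    "W, [2019-11-14T16:32:19.563935 #27247] WARN -- : Cannot read config 'corosync.conf' from '/etc/corosync/corosync.conf': No such file\n",
    "I, [2019-11-14T16:32:49.568694 #27247] INFO -- : No response from: 212.84.0.x request: set_configs, error: operation_timedout\n",
    "I, [2019-11-14T16:32:49.570542 #27247] INFO -- : No response from: 212.84.0.y request: set_configs, error: operation_timedout\n"
  ]
}
'''


def summarize_send_local_configs(raw):
    """Pull the failed nodes and the timeout lines out of the JSON debug output."""
    result = json.loads(raw)
    log_lines = [line.strip() for line in result.get("log", [])]
    return {
        # Assumption: a node counts as failed unless its status is "ok".
        "failed_nodes": sorted(
            node
            for node, info in result.get("data", {}).items()
            if info.get("status") != "ok"
        ),
        "timeouts": [line for line in log_lines if "operation_timedout" in line],
        "missing_corosync_conf": any(
            "Cannot read config 'corosync.conf'" in line for line in log_lines
        ),
    }


summary = summarize_send_local_configs(SAMPLE)
print(summary["failed_nodes"], len(summary["timeouts"]), summary["missing_corosync_conf"])
```

In the output above, both nodes fail on the set_configs request with operation_timedout 30 seconds after the request is sent, so the failure is again a connection timeout between the nodes, while the overall "status" of the command is still reported as "ok".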
