Creating an HA cluster between on-prem Satellite nodes, version 7

diondohmen · 11-13-2019

Hi there,

Has anyone succeeded in configuring an HA cluster between two on-prem Satellites, version 7.x?

We have successfully installed two of these and tried to configure HA as explained in the install_guide.

The first step, running ha_provision_standby on the standby node, completes without problems:

!!! DO NOT ABORT THIS PROCESS AFTER PROCEEDING !!!

Proceed with the above configuration? Enter 'yes' to continue: yes
Adjusting firewall...
success
success
success
Stopping services...
Removed symlink /etc/systemd/system/multi-user.target.wants/satellite.service.
Starting cluster...
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
Setting up for data replication... (active node: x.x.x.x)
71440aea3f5a15d23004ab91ddd1601dcc2f14849099e15f120723bc78717e49

Standby provisioning is complete!
You may now proceed with HA deployment from the active node.

>>

The second step is running ha_deploy on the primary node, which for some reason ends in a failed attempt to destroy the cluster:

!!! DO NOT ABORT THIS PROCESS AFTER PROCEEDING !!!

NOTICE: It is strongly recommended that you perform a backup of your
database before proceeding. Please see the documentation for details.

Proceed with the above configuration? Enter 'yes' to continue: yes
Adjusting firewall...
success
success
success
Stopping services...
Removed symlink /etc/systemd/system/multi-user.target.wants/satellite.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
Authenticating cluster user...
x.x.x.x: Authorized
y.y.y.y: Authorized
Setting up cluster...
74cabb7f3440f8e5b71f9d1aa9340d99996e67ae697ae50fa6009186eecded6e
Error: unable to destroy cluster
x.x.x.x: Unable to connect to x.x.x.x, try setting higher timeout in --request-timeout option (Operation timed out after 60001 milliseconds with 0 out of -1 bytes received)
y.y.y.y: Unable to connect to y.y.y.y, try setting higher timeout in --request-timeout option (Operation timed out after 60001 milliseconds with 0 out of -1 bytes received)
Destroying cluster on nodes: x.x.x.x, y.y.y.y...
x.x.x.x: Stopping Cluster (pacemaker)...
y.y.y.y: Stopping Cluster (pacemaker)...
x.x.x.x: Unable to connect to x.x.x.x, try setting higher timeout in --request-timeout option (Operation timed out after 60001 milliseconds with 0 out of -1 bytes received)
y.y.y.y: Unable to connect to y.y.y.y, try setting higher timeout in --request-timeout option (Operation timed out after 60001 milliseconds with 0 out of -1 bytes received)
Configuring cluster...
Error: unable to get cib
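
Since pcs reports that it cannot connect to either node, one thing worth checking from each node is whether the pcsd port is reachable at all. Below is a minimal Python sketch, assuming pcsd listens on its default TCP port 2224 (whether the Satellite HA scripts keep that default is not confirmed here). The function name port_reachable is made up for illustration.

```python
import socket

# pcsd's default port; assumption: the Satellite HA scripts do not change it.
PCSD_PORT = 2224


def port_reachable(host: str, port: int = PCSD_PORT, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling port_reachable with the real address of each node (masked as x.x.x.x and y.y.y.y above), once from the primary and once from the standby, would show whether the timeout is a plain network or firewall problem or something inside pcsd itself.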

The only pointer I have is that several iptables commands fail, as logged in /var/log/messages, for example:

Nov 13 16:12:31 SSM01A firewalld[7640]: WARNING: COMMAND_FAILED: '/usr/sbin/iptables -w2 -t nat -C POSTROUTING -s 172.16.2.0/24 ! -o atlantis0 -j MASQUERADE' failed: iptables: No chain/target/match by that name.

There are more entries like this one.
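
To get an overview of which iptables commands fail and how often, the log can be filtered for firewalld's COMMAND_FAILED warnings. This is a rough Python sketch. The regular expression is inferred from the single line quoted above, so it is an assumption that the other entries have the same shape, and the name failed_commands is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the one log line quoted above (assumption).
FAILED_RE = re.compile(
    r"firewalld\[\d+\]: WARNING: COMMAND_FAILED: "
    r"'(?P<cmd>[^']+)' failed: (?P<reason>.*)$"
)


def failed_commands(lines):
    """Count (command, reason) pairs among firewalld COMMAND_FAILED warnings."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line.rstrip("\n"))
        if match:
            counts[(match.group("cmd"), match.group("reason"))] += 1
    return counts
```

Passing an open file handle for /var/log/messages to failed_commands and iterating over the result's most_common() lists the distinct failing commands with their counts, which makes it easier to tell a single repeated warning from many different failures.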

Can anyone point me in the right direction, or tell me whether they have managed to install an HA cluster of on-prem Satellites?

I already tried running ha_deploy a second time, after reinstalling the second Satellite, with a timeout parameter (ha_deploy --request-timeout=3000), but it does not seem to accept this option.

Another lead may be the following entry in pcsd.log on both machines:

Cannot read config 'corosync.conf' from '/etc/corosync/corosync.conf': No such file

Thanks!