Cisco Meeting Server 3.6 Database Cluster: What's New About Certificates

Meddane · 01-31-2023

Database Cluster Configuration between cms1, cms2 and cms3

For the server certificate, you can use a multi-SAN certificate containing the FQDNs of all database servers in the SAN attribute. This reduces certificate management to a single certificate for all nodes and all services.

For the client certificate, the Common Name (CN) must be set to postgres. When a database server receives a client certificate, it checks that the CN field is equal to postgres to validate the authentication.

For the database, we need to generate two CSRs with their corresponding private keys: one for the client and one for the server.

Use one CMS to generate these two CSRs. Once you get the server and client certificates from your CA, copy the two certificates with their private keys to all nodes using WinSCP.

For the server certificate, use the following command and give it a name, for example dbcert. It is important to set the CN to the FQDN of the master database cms1 and to put the FQDNs of the slaves in the SAN. For example: cms1.lab.local in the Common Name, and cms2.lab.local and cms3.lab.local in the SAN.

cms1>pki csr dbcert CN:cms1.lab.local OU:CCNP O:Collaboration L:lab ST:local C:US subjectAltName:cms2.lab.local,cms3.lab.local

For the client certificate, use the following command and give it a name, for example dbclt.

cms1>pki csr dbclt CN:postgres

On CMS1.

Configure the database client and server certificates created previously and named dbcert and dbclt. The Root-CA certificate is added to verify the validity of the client/server certificates. Specify which interface to use for the database clustering and initialize the master database.

cms1>database cluster certs dbcert.key dbcert.cer dbclt.key dbclt.cer Root-CA.cer

cms1>database cluster localnode a

cms1>database cluster initialize

On CMS2 and CMS3.

Configure the database client and server certificates created previously and named dbcert and dbclt. The Root-CA certificate is added to verify the validity of the client/server certificates. Specify the interface to use, then connect cms2 and cms3 to the master database cms1.

cmsx>database cluster certs dbcert.key dbcert.cer dbclt.key dbclt.cer Root-CA.cer

cmsx>database cluster localnode a

cmsx>database cluster join cms1.lab.local

On cms1, cms2 and cms3, verify the status of the database cluster.

On CMS2 and CMS3, the database status shows ERROR Cannot find primary node in cluster.

CMS2 and CMS3 fail to connect to the primary node CMS1. So what’s wrong here? Let’s check the logs using the syslog follow command on CMS2 and CMS3.

The output of CMS2’s log indicates the message CMS1: error could not translate host name “cms1” to address.

CMS2 tries to resolve the server’s name CMS1 but cannot find the IP address 10.1.5.61.

The output of CMS3’s log indicates the message CMS1: error could not translate host name “cms1” to address.

CMS3 tries to resolve the server’s name CMS1 but cannot find the IP address 10.1.5.61.

To solve the issue, we need a DNS A record to resolve the name CMS1 to the IP address 10.1.5.61. Fortunately, Cisco Meeting Server lets you create a DNS RR record to resolve the server’s name locally.

On CMS2 and CMS3, add a DNS RR record using the following commands.

Now let’s verify the database status on CMS2 and CMS3.

The output shows that both are in the Connected status with the primary node CMS1 (10.1.5.61).

However, CMS2 and CMS3 fail to connect to each other.

Let’s verify the logs on CMS2 and CMS3.

CMS2 cannot find the IP address 10.1.5.63 of the hostname CMS3.

CMS3 cannot find the IP address 10.1.5.62 of the hostname CMS2.

On the primary node CMS1, verify the database status. We can see that the primary node fails to connect to CMS2 and CMS3.

Let’s check the logs on CMS1. The same issue is displayed: CMS1 fails to resolve the hostnames CMS2 and CMS3 to their IP addresses.

To solve the issue, we need to add two DNS RR records on CMS1 to resolve the hostnames of CMS2 and CMS3. Configure the following commands as shown below.

On CMS2, configure one DNS RR Record to resolve the server’s name of CMS3.

On CMS3, configure one DNS RR Record to resolve the server’s name of CMS2.

On CMS1, verify the DNS record entries. It must have two DNS RR records to resolve the hostnames of CMS2 and CMS3.

On CMS2, verify the DNS record entries. It must have two DNS RR records to resolve the hostnames of CMS1 and CMS3.

On CMS3, verify the DNS record entries. It must have two DNS RR records to resolve the hostnames of CMS1 and CMS2.
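The rule behind these entries is that every node needs a local record for each of its peers. A minimal Python sketch of that rule follows. It is only an illustration: the helper name `peer_records` is hypothetical, and the addresses are the lab addresses that appear in this post.

```python
# Hostname-to-address map for the three lab nodes used in this post.
NODES = {"cms1": "10.1.5.61", "cms2": "10.1.5.62", "cms3": "10.1.5.63"}


def peer_records(nodes):
    """Return, for each node, the A records it needs for the other nodes.

    Hypothetical helper for illustration; it does not talk to a Meeting Server.
    """
    return {
        name: {peer: ip for peer, ip in nodes.items() if peer != name}
        for name in nodes
    }


if __name__ == "__main__":
    for node, records in peer_records(NODES).items():
        for peer, ip in records.items():
            print(f"{node} needs an A record: {peer} -> {ip}")
```

With three nodes, each one ends up with exactly two records, which matches the verification steps above.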

Finally, the database status on CMS1, CMS2 and CMS3 shows Connected for all nodes.

Note: From version 3.5, Cisco Meeting Server can perform additional validations (hostname/IP address). If you are deploying or upgrading to version 3.6 (or 3.5), the database cluster nodes might fail to connect to each other. The reason is that each server tries to resolve the hostname of each node in the cluster, which was not the case in versions prior to 3.5.

The database cluster verifymode <full/ca> command allows you to configure these validations. If the command is set to full, the Meeting Server validates the certificates and also verifies that the server identity name (hostname) matches a name stored in the server certificate. If the command is set to ca, the Meeting Server validates only the Certificate Authority.
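To make the difference between the two modes concrete, here is a minimal Python sketch. It is a simplified model and not the Meeting Server implementation: the function name `accepts_server` and its parameters are hypothetical, and the name check is a plain exact match against the names listed in the certificate.

```python
def accepts_server(mode, ca_trusted, cert_names, connect_name):
    """Simplified model of the two verify modes (illustration only).

    mode         -- "ca" or "full"
    ca_trusted   -- True if the certificate chains to the trusted CA
    cert_names   -- names in the server certificate (CN and SAN entries)
    connect_name -- hostname or IP address used to reach the server
    """
    if mode not in ("ca", "full"):
        raise ValueError(f"unknown verify mode: {mode}")
    if not ca_trusted:
        return False  # both modes require a certificate from the trusted CA
    if mode == "ca":
        return True  # "ca" stops after validating the Certificate Authority
    # "full" additionally requires the name used for the connection
    # to appear in the certificate.
    return connect_name.lower() in {n.lower() for n in cert_names}


if __name__ == "__main__":
    names = ["cms1.lab.local", "cms2.lab.local", "cms3.lab.local"]
    print(accepts_server("ca", True, names, "cms1"))              # True
    print(accepts_server("full", True, names, "cms1"))            # False
    print(accepts_server("full", True, names, "cms1.lab.local"))  # True
```

In this model a certificate that only lists FQDNs passes in ca mode but is rejected in full mode as soon as a node connects using the short hostname.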

By default, the verifymode is set to ca as shown below.

To enable the verifymode full, the nodes need to be removed from the cluster database.

On all CMS, execute the database cluster remove command.

On all CMS, enable the verifymode full using the database cluster verifymode full command.


On the primary node CMS1, run the database cluster initialize command.

On CMS2 and CMS3, run the database cluster join cms1.lab.local command.

On CMS1, verify the database status. The verifymode full is now enabled, but the slaves CMS2 and CMS3 are not yet connected to the primary node.

On CMS2 and CMS3, verify the database status. The verifymode full is now enabled.

The output of the database cluster status command displays ERROR: Cannot find primary in cluster.

To identify the problem, let’s execute the syslog follow command on CMS2 and CMS3.

On CMS2, the logs tell us that the server certificate for cms1.lab.local does not match the hostname “cms1”. In other words, the hostname cms1 of the primary node is missing from the Subject Alternative Name (SAN) of the server certificate.

cms2>

Sep 26 13:49:07.774 user.info cms2 sfpool: Health check cms1: error (up = 1): server certificate for "cms1.lab.local" (and 4 other names) does not match host name "cms1"|

Sep 26 13:49:10.128 user.info cms2 sfpool: Failover Monitor: Unexpected roll call discrepancy; failover saw 0 connected node(s) (), 1 node(s) up (10.1.5.61)

Sep 26 13:49:12.871 user.info cms2 sfpool: Health check cms1: error (up = 1): server certificate for "cms1.lab.local" (and 4 other names) does not match host name "cms1"|

Sep 26 13:49:13.138 user.info cms2 sfpool: Failover Monitor: Unexpected roll call discrepancy; failover saw 0 connected node(s) (), 1 node(s) up (10.1.5.61)

Sep 26 13:49:16.150 user.info cms2 sfpool: Failover Monitor: Unexpected roll call discrepancy; failover saw 0 connected node(s) (), 1 node(s) up (10.1.5.61)

Sep 26 13:49:17.961 user.info cms2 sfpool: Health check cms1: error (up = 1): server certificate for "cms1.lab.local" (and 4 other names) does not match host name "cms1"|

cms2>

The same error message is displayed on CMS3.

cms3>

Sep 26 13:51:49.978 user.info cms3 sfpool: Health check cms1: error (up = 1): server certificate for "cms1.lab.local" (and 4 other names) does not match host name "cms1"|

Sep 26 13:51:50.887 user.info cms3 sfpool: Failover Monitor: Unexpected roll call discrepancy; failover saw 0 connected node(s) (), 1 node(s) up (10.1.5.61)

Sep 26 13:51:53.913 user.info cms3 sfpool: Failover Monitor: Unexpected roll call discrepancy; failover saw 0 connected node(s) (), 1 node(s) up (10.1.5.61)

Sep 26 13:51:55.097 user.info cms3 sfpool: Health check cms1: error (up = 1): server certificate for "cms1.lab.local" (and 4 other names) does not match host name "cms1"|

cms3>
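When the syslog output is long, the names that failed matching can be pulled out with a small script. The following Python sketch assumes the log has been saved as text lines; the helper name `unmatched_names` is hypothetical, and the regular expression is written against the message format shown in the excerpts above.

```python
import re

# Matches the certificate mismatch message seen in the syslog output above.
MISMATCH = re.compile(
    r'server certificate for "(?P<subject>[^"]+)".*?'
    r'does not match host name "(?P<host>[^"]+)"'
)


def unmatched_names(log_lines):
    """Return the set of names that failed certificate matching."""
    found = set()
    for line in log_lines:
        match = MISMATCH.search(line)
        if match:
            found.add(match.group("host"))
    return found


if __name__ == "__main__":
    sample = [
        'Sep 26 13:49:07.774 user.info cms2 sfpool: Health check cms1: error '
        '(up = 1): server certificate for "cms1.lab.local" (and 4 other names) '
        'does not match host name "cms1"|',
        "Sep 26 13:49:10.128 user.info cms2 sfpool: Failover Monitor: "
        "Unexpected roll call discrepancy; failover saw 0 connected node(s) (), "
        "1 node(s) up (10.1.5.61)",
    ]
    print(unmatched_names(sample))  # {'cms1'}
```

Every name returned by this helper is a candidate for the SAN of the next server certificate.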

To solve the issue, we need to generate another server certificate that includes the FQDNs of all nodes (cmsx.lab.local) and the hostnames of all nodes: cms1, cms2 and cms3.

cms1>pki csr dbcert36 CN:cms1.lab.local OU:CCNP O:Collaboration L:lab ST:local C:US subjectAltName:cms2.lab.local,cms3.lab.local,cms1,cms2,cms3

cms1>

Copy the CSR and the private key to your PC and generate a new server certificate named, for example, dbcert36.cer.

Verify that the new dbcert36.cer certificate includes the hostnames cms1, cms2 and cms3 along with the FQDNs. Ensure that dbcert36.cer and the corresponding key are copied to all nodes.

Before reconfiguring the cluster database with the new certificate, we need to remove all nodes from the cluster. Execute the database cluster remove command on all CMS.

cmsx> database cluster remove

On CMS1, configure the database server and client certificates named dbcert36 and dbclt. Run the database cluster initialize command.

cms1>database cluster certs dbcert36.key dbcert36.cer dbclt.key dbclt.cer Root-CA.cer

cms1>database cluster initialize

On CMS2 and CMS3, configure the database server and client certificates named dbcert36 and dbclt. Connect CMS2 and CMS3 to the master database CMS1 using the database cluster join cms1.lab.local command.

cmsx>database cluster certs dbcert36.key dbcert36.cer dbclt.key dbclt.cer Root-CA.cer

cmsx>database cluster join cms1.lab.local


Verify the database status on CMS1. The nodes CMS2 and CMS3 are still not connected to the primary node CMS1.

On CMS2 and CMS3, the database status displays ERROR: postgresql has failed to start.

On CMS2 and CMS3, let’s execute the syslog follow command.

The output tells us that the IP address 10.1.5.61 of the primary node CMS1 is missing from the server certificate. In other words, the IP address is not found in the SAN of the server certificate.

Sep 26 14:45:20.119 local0.err cms2 postgres[72323]: [6-1] 2022-09-26 14:45:20 UTC [local] FATAL: the database system is starting up

Sep 26 14:45:20.119 user.warning cms2 host:server: WARNING : database connection failure (FATAL: the database system is starting up)

Sep 26 14:45:20.337 local0.err cms2 postgres[72326]: [6-1] 2022-09-26 14:45:20 UTC FATAL: could not connect to the primary server: server certificate for "cms1.lab.local" (and 5 other names) does not match host name "10.1.5.61"

Sep 26 14:45:21.120 local0.err cms2 postgres[72337]: [6-1] 2022-09-26 14:45:21 UTC [local] FATAL: the database system is starting up

Sep 26 14:45:21.120 user.warning cms2 host:server: WARNING : database connection failure (FATAL: the database system is starting up)

Sep 26 14:46:11.124 local0.err cms3 postgres[53171]: [6-1] 2022-09-26 14:46:11 UTC [local] FATAL: the database system is starting up

Sep 26 14:46:11.124 user.warning cms3 host:server: WARNING : database connection failure (FATAL: the database system is starting up)

Sep 26 14:46:12.125 local0.err cms3 postgres[53174]: [6-1] 2022-09-26 14:46:12 UTC [local] FATAL: the database system is starting up

Sep 26 14:46:12.125 user.warning cms3 host:server: WARNING : database connection failure (FATAL: the database system is starting up)

Sep 26 14:46:12.240 local0.err cms3 postgres[53175]: [6-1] 2022-09-26 14:46:12 UTC FATAL: could not connect to the primary server: server certificate for "cms1.lab.local" (and 5 other names) does not match host name "10.1.5.61"

To solve the issue, we need to generate another server certificate that includes the FQDNs of all nodes (cmsx.lab.local), their hostnames, and the IP address 10.1.5.61 of the primary node CMS1.

cms1>pki csr dbcert3636 CN:cms1.lab.local OU:CCNP O:Collaboration L:lab ST:local C:US subjectAltName:cms2.lab.local,cms3.lab.local,cms1,cms2,cms3,10.1.5.61

Copy the CSR and the private key to your PC and generate a new server certificate named, for example, dbcert3636.cer.

Verify that the new dbcert3636.cer certificate includes the IP address 10.1.5.61 along with the FQDNs. Ensure that dbcert3636.cer and the corresponding key are copied to all nodes.
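A quick way to reason about this check is to compare the names the nodes actually connect with against the names requested in each CSR. The Python sketch below does that comparison. The helper `missing_names` is hypothetical, and the lists are copied by hand from the commands and logs in this post rather than read from a real certificate.

```python
def missing_names(cert_names, required):
    """Return required names absent from the certificate, in input order."""
    present = {n.lower() for n in cert_names}
    return [n for n in required if n.lower() not in present]


# Names that appear in the logs as connection targets for the primary node.
REQUIRED = ["cms1.lab.local", "cms1", "10.1.5.61"]

# CN plus SAN entries requested in each pki csr command in this post.
DBCERT = ["cms1.lab.local", "cms2.lab.local", "cms3.lab.local"]
DBCERT36 = DBCERT + ["cms1", "cms2", "cms3"]
DBCERT3636 = DBCERT36 + ["10.1.5.61"]

if __name__ == "__main__":
    for label, names in [
        ("dbcert", DBCERT),
        ("dbcert36", DBCERT36),
        ("dbcert3636", DBCERT3636),
    ]:
        print(label, "missing:", missing_names(names, REQUIRED))
```

Only the last name list covers the FQDN, the short hostname and the IP address of the primary node, which is consistent with the two failures seen with the earlier certificates.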

Before reconfiguring the cluster database with the new certificate, we need to remove all nodes from the cluster. Execute the database cluster remove command on all CMS.

cmsx> database cluster remove

On CMS1, configure the database server and client certificates named dbcert3636 and dbclt. Run the database cluster initialize command.

cms1>database cluster certs dbcert3636.key dbcert3636.cer dbclt.key dbclt.cer Root-CA.cer

cms1>database cluster initialize

On CMS2 and CMS3, configure the database server and client certificates named dbcert3636 and dbclt. Connect CMS2 and CMS3 to the master database CMS1 using the database cluster join cms1.lab.local command.

cmsx>database cluster certs dbcert3636.key dbcert3636.cer dbclt.key dbclt.cer Root-CA.cer

cmsx>database cluster join cms1.lab.local

On CMS1, CMS2 and CMS3, verify the database status. Replication is now working.