Cluster Database Replication problems

mmendonca
Level 1

Hello,

I have two new C200 M2 servers with ESXi 5 installed. I created the publisher (8.6.2.20000-2) on one server and then had problems creating the subscriber on the other: the sub wouldn't verify connectivity to the pub. I finally got past that, but when I checked the replication status it was all messed up. After trying what I could find on the forums I opened a TAC case. Three days later we were able to get replication working. When I checked it from the CLI (utils dbreplication runtimestate) both nodes had a status of 2. When I checked via the GUI, however, the replication status was good but everything else wasn't. The TAC engineer stated that this was 'cosmetic'. We tried several things to reset the status but none worked. He said he would research this, but I haven't heard back from him yet.
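For readers unfamiliar with the numeric status, here is a minimal Python sketch of how the state codes could be interpreted. The thread itself only confirms that a state of 2 on both nodes corresponded to working replication; the meanings of the other codes are my recollection of Cisco's documentation and should be treated as assumptions to verify. The function names and node labels are hypothetical.

```python
# Assumed meanings of the replication state codes. Only "2 = good" is
# confirmed by this thread; check the others against Cisco's documentation.
REPLICATION_STATES = {
    0: "initializing",
    1: "number of replicates is incorrect",
    2: "good",
    3: "mismatched tables",
    4: "setup failed or dropped",
}


def describe_state(code: int) -> str:
    """Return a human-readable label for a replication state code."""
    return REPLICATION_STATES.get(code, "unknown state")


def cluster_healthy(states_by_node: dict) -> bool:
    """True only if at least one node is given and every node reports 2."""
    return bool(states_by_node) and all(s == 2 for s in states_by_node.values())


if __name__ == "__main__":
    # Hypothetical node labels; the values are the states read off the CLI.
    observed = {"pub": 2, "sub": 2}
    for node, code in observed.items():
        print(node, code, describe_state(code))
    print("healthy:", cluster_healthy(observed))
```

The state codes have to be read off the CLI output by hand; this sketch deliberately does not try to parse that output.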

Has anyone seen this issue?  If so what was done to fix it?

Below is a screen shot of the DB replication status report. 

dbreport.png

Hi Mark,

Thanks for the update. Hope that the install goes well for you this time. Two things to be mindful of: ensure you add the server to the pub prior to installing the subscriber, and ensure that you have an accessible NTP server during the installation.

Regards
Allan

Sent from Cisco Technical Support iPad App

Roger that thanks Allan!

I'm running into the same issue: initially the sub will not verify connectivity with the pub.  I added the server to the pub, started the services and restarted it.  When I check the hosts file on the pub (show tech network hosts) the subscriber isn't listed.  So even though I've added the sub through the GUI, it's not making it into the /etc/hosts file.
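If you save the hosts portion of that CLI output to a text file, a check like the following can confirm whether the subscriber is listed. This is only a sketch in Python: it assumes the text follows the usual /etc/hosts layout (address, then names, with # comments), and the hostnames and addresses in the sample are hypothetical placeholders, not values from this cluster.

```python
def host_listed(hosts_text: str, name_or_ip: str) -> bool:
    """Return True if name_or_ip appears as a whole field on a non-comment line."""
    for line in hosts_text.splitlines():
        # Drop anything after a '#' so commented-out entries are ignored.
        active = line.split("#", 1)[0]
        if name_or_ip in active.split():
            return True
    return False


if __name__ == "__main__":
    # Hypothetical sample; paste the real hosts content here instead.
    sample = (
        "127.0.0.1 localhost\n"
        "10.0.0.10 cucm-pub.example.local cucm-pub\n"
    )
    for name in ("cucm-pub", "cucm-sub"):
        status = "listed" if host_listed(sample, name) else "MISSING"
        print(name, status)
```

Matching on whole fields rather than substrings avoids a false positive when one hostname is a prefix of another.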

Hi Mark,

It's always necessary to restart both nodes after the installation of the subscriber. Following the restart, the hosts files should ordinarily be updated. Have you restarted them? Have you checked whether the UCS hosts are running the latest firmware pack? Verify the BIOS and firmware settings on both hosts through the CIMC. Below is a link to the C200 release notes; it is good practice to ensure that you have the latest package installed:

http://www.cisco.com/en/US/docs/unified_computing/ucs/release/notes/OL-26648-01.html#wp347582

How are the Pub and Sub communicating? I understand you are using shared LOM; are they on the same IP subnet and local switch? Are you using both NICs? Have you verified the LAN port settings and checked whether there are any errors on the associated interfaces?

Mark, can you confirm whether you added the Subscriber server to the Publisher before you commenced the subscriber installation, or just prior to the Publisher connectivity part of the Sub installation? Incidentally, did TAC ever research the issue they referred to as cosmetic and come back to you?

Regards
Allan

Allan,

I've previously upgraded both servers to the latest BIOS.  Both servers are on the same switch and subnet.  Both are using both NICs.  I verified connectivity from the PUB to the SUB, with no errors on the interfaces.  I added the subscriber to the Publisher before commencing the sub install.  Wouldn't it be added to the pub hosts file at that time?

Mark

Hi Mark,

I would expect to see it in the Publisher's hosts file; however, I ordinarily find that the Subscriber's hosts, rhosts and cdr entries are inconsistent until after a cluster reset. What I recommend is that you upgrade the Publisher to the latest SU3 first instead of staying on the base release, then proceed with the install of the sub and initiate the upgrade to SU3 during the installation. In the meantime I will take a look to see if there are any known caveats for this base release version.

Regards
Allan.

mmendonca
Level 1


I don't see 8.6.2 SU3 on the d/l page?  I have SU2 loaded.

Hi Mark,

My apologies, I got my software platforms mixed up: it is UCCX 8.5 that has the latest SU3, whereas for CUCM the latest is SU2 (8.6.2aSU2 22900-9). Have you attempted to install the sub on SU2? If not, then proceed with the upgrade as before; at least it will be on the latest general release. Unfortunately I have not been able to find any publicly known caveat with symptoms such as yours. I have installed CUCM 8.6.2aSU2 across three UCS C210s for a customer without any of the problems that you have come across. What is concerning is the TAC engineer's remark that the issue is cosmetic, as it clearly isn't, as you have found. Have you received any further feedback from TAC regarding this?

If there are still replication issues following the upgrade to SU2, then I would definitely flag this to the SE dealing with your existing case. Let us know how the upgrade goes. Is the concern that the Sub initially fails when it attempts to communicate with the Pub and you have to retry? We will probably have to look in the logs to determine what is happening. Has Cisco pulled any logs?

Regards
Allan

Hello,

It's been mad crazy around here with Sandy passing through.  We had a power failure in our data center and the relay didn't fail to the generator so we were DOA.  All of our sites in the NY/NJ area also are STILL OOB.  I'm sure many of you all are facing the same issues. 

I looked at the logs on the publisher and found this:

pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.241.18.211  user=sftpuser
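For anyone sifting through a lot of these entries, the key=value fields in that line can be pulled out with a short script. This is a minimal Python sketch, assuming the entries follow the same layout as the line above; the function name is my own and nothing here is a Cisco tool.

```python
import re

# The log entry quoted above, verbatim.
LINE = (
    "pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 "
    "tty=ssh ruser= rhost=10.241.18.211  user=sftpuser"
)


def parse_pam_failure(line: str):
    """Return the key=value fields of a PAM authentication-failure line.

    Returns None if the line is not an authentication failure.
    Fields with no value (for example "ruser=") map to an empty string.
    """
    if "authentication failure" not in line:
        return None
    return dict(re.findall(r"(\w+)=(\S*)", line))


if __name__ == "__main__":
    fields = parse_pam_failure(LINE)
    # rhost is the address the failed login came from; user is the account tried.
    print(fields["rhost"], fields["user"])
```

Here rhost is the machine that attempted the login and user is the account it tried, which is what points at an authentication problem rather than a network one.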

I also did some packet captures.  The sub is definitely reaching out to the pub but, as the log entry indicates, it is failing authentication.  So I tried "set password user security" on the pub.  It asks for the old pw, then the new.  I entered the same pw for the new that I knew was the old.  It came back and said that the new pw has to be different.  So unless that command isn't checking against what is actually the old pw and is only comparing the 2 that I entered, I know I have the correct pw.  Does anyone know if it actually checks the current pw or if it only compares the 2 that are entered?

Also would the licensing play a part in this?  I'm going to have to rehost the license, which I haven't done yet.

DOH

I answered my own question; I tried resetting the pw to something different.  It came back saying that the original pw didn't match.  So I tried putting my caps lock on for it and presto chango it worked. Cluster auth passed now on to db replication!

Sometimes I'm my own worst enemy!

Hi Mark,

Sorry to hear that you are being affected by what Mother Nature has to offer. As you've already established, the cluster security password has to be the same across all nodes, and when you change the password the new one has to be different from the old one. So once you know what the current passphrase is, make sure the same one is used on the Sub.

Regards
Allan

After several TAC tickets (none of which solved the issue) and creating and recreating clusters more times than I want to admit, I finally got both cluster authentication and dbrep to work! The only thing I did differently was in deploying the OVA template: when it asks for the deployment configuration I had been choosing 1000 user node – c200 (inc BE6K). I changed it to 7500 user node, recreated both the pub and sub, and everything worked!

Is the 1000 user node – c200 (inc BE6K) a standalone node by default?

Hi Mark,

I find that quite surprising. The OVA sizing options are restricted on the C200 M2: the 7500 user deployment is certainly not supported on this hardware, and only the OVA templates supported for this hardware should be used, which for CUCM is the 1000 users per node template. The exception would be if the specification of the server does not match the Tested Reference Configuration listed, in other words a CPU with a higher core speed. Is Cisco aware of the latest development? I think they would be surprised that it resolved your DB issues.

My concern is that you are now using an unsupported configuration. Can you post the server information from CIMC so we can determine the exact spec? Incidentally, which version of the OVA template did you use, 7 or 8?

As an example, here are the stipulations around the C200 M2:

Follow these rules for UCS C200M2 TRC#1 with E5506 / 2.13 GHz CPU:
The only supported virtual machine OVA templates are:
CUCM - Unified Communications Manager 1000 users
CER - Emergency Responder 12,000 users
CUC or UCxn - Unity Connection 500 users, 1000 users and 5000 users
CUP - Unified Presence 1000 users
CUCCX - Unified Contact Center Express 100 agents
CUxAC - Unified Department, Business and Enterprise Attendant Consoles
Other unlisted OVA templates are not supported on C200 M2 TRC #1.
Otherwise at this time the following co-residency scenarios are supported:
Any combination allowed by Unified Communications Manager Business Edition 6000.
Any other combination provided you follow the General Rules for Co-residency and Physical/Virtual Hardware Sizing, and only use the above OVA templates supported for UCS C200 M2 TRC#1.
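The template rules quoted above can be expressed as a simple lookup, which makes it easy to see why a 7500 user CUCM deployment falls outside the supported set. This is only a sketch in Python that transcribes the list as quoted in this post; the dictionary and function names are mine, the size strings are just labels copied from the list, and the attendant console entry is omitted because no size is given for it.

```python
# Supported OVA templates on UCS C200 M2 TRC#1, as quoted in this post.
# Keys and size labels are transcribed from the list above, not from an API.
SUPPORTED_ON_C200_M2_TRC1 = {
    "CUCM": {"1000 users"},
    "CER": {"12,000 users"},
    "CUC": {"500 users", "1000 users", "5000 users"},
    "CUP": {"1000 users"},
    "CUCCX": {"100 agents"},
}


def is_supported(app: str, size: str) -> bool:
    """Return True if the app/size pair appears in the quoted support list."""
    return size in SUPPORTED_ON_C200_M2_TRC1.get(app, set())


if __name__ == "__main__":
    for size in ("1000 users", "7500 users"):
        verdict = "supported" if is_supported("CUCM", size) else "NOT supported"
        print("CUCM", size, verdict)
```

Anything not in the table is treated as unsupported, which mirrors the "other unlisted OVA templates are not supported" rule in the quoted text.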

Regards
Allan

Hi Allan,

Good points.  I'll go ahead and try the 1000 node setup today.  I'm thinking it will work also?  Any idea what the c200 be6000 build is and why it wouldn't work?

Did you get the data you posted from the Wiki pages?  If you could include a link to it I'd appreciate it very much.  I did send a note off to the TAC with this info in it but I haven't heard back from them yet.

Thanks

Mark

I have 2 TAC cases open on this issue; 623774061 on Subscriber cluster authentication failure and 623433823 on database replication failure.  I posted notes from this discussion in both of them.  I have not heard back on either as of yet.  Also the OVA that I used was:  cucm_8.6_vmv7_v1.5.ova.

Does anyone have any info on what the c200 BE6000 deployment is or does?