CUCM DB Replication Issues

akinsconsult
Level 1

I have a CUCM 8.0.3.22900-5 publisher that has been live for a few months.  I recently added a subscriber but am unable to activate callmanager services on it because replication is failing.

When I run the DB replication report on the publisher, I get the following errors. On the subscriber I get the opposite.

Unified CM Database Access
For every server, shows if you can read from the local and publisher databases.
Error: Source has failed due to source on 10.244.44.11 timing out
Error: The publisher database could not be reached from 10.244.44.11.
Error: The local database could not be reached from 10.244.44.11.

Server        Publisher DB Reachable                                      Local DB Reachable
10.244.44.10  true                                                        true
10.244.44.11  Source has failed due to source on 10.244.44.11 timing out  Source has failed due to source on 10.244.44.11 timing out

I have connectivity between both and that same report also shows this:

RTMT Counter Information
Good: All servers have a replication count of 519.
Good: All servers have a good replication status.

Lastly, I have tried using utils dbreplication:

Ran utils dbreplication stop on both Sub and then Pub

Ran utils dbreplication dropadmindb on both Sub and then Pub

Ran utils dbreplication clusterreset

Ran utils dbreplication reset all on Pub

Rebooted Subscriber (Not Publisher).

The only other item is that I don't have DNS set on the publisher or subscriber.

Help please!

1 Accepted Solution

I suggest you open a TAC case.

19 Replies

mudmathu
Cisco Employee

Hi ..

First of all, I would suggest not running the replication commands on the cluster yourself; it is better to involve Cisco TAC for DB replication issues.

But here are a few things you can check:

1) Go to the CUCM CLI, run "utils service list", and make sure all key services are running.

Example: A Cisco DB / A Cisco DB Replicator / Cluster Manager / Tomcat / TFTP, etc.

2) Check the services on all the nodes in the cluster.

3) If all services are good, generate the "Unified Database Report" from Unified Reporting.

4) Check the syscdr status for all the nodes in the report. It must show entries for all the nodes in the cluster.

5) Make sure the pub's syscdr file has entries for both the pub and the sub, and the sub's has entries for both the sub and the pub.

6) If those files are missing or empty, let me know and we can devise an action plan from there.

thanks so much for the response.

for 1. utils service list shows all services started

on those reports from the subscriber, i keep getting: Source has failed due to source on 10.244.44.10 (publisher) timing out

if i run it from the publisher, i get: Source has failed due to source on 10.244.44.11 (subscriber) timing out


I can ping between both servers.  Seems like a networking issue?
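
A side note on the "I can ping, so is it networking?" question: ping only exercises ICMP, so it can succeed while the TCP ports used by the database layer are filtered or simply not answering. One rough way to test that from a workstation on the same network is a plain TCP connect, sketched below in Python. `check_ports` is a hypothetical helper name, the port list is whatever you pass in, and because this runs from a workstation rather than from the CUCM node itself it only approximates the node-to-node path.

```python
import socket


def check_ports(host, ports, timeout=3.0):
    """Try a TCP connect to each port; return {port: True if the connect succeeded}."""
    results = {}
    for port in ports:
        try:
            # create_connection completes the TCP handshake or raises OSError
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # refused, timed out, or unreachable all count as "not reachable"
            results[port] = False
    return results
```

For example, `check_ports("10.244.44.11", [1500, 1515])` would report whether those two ports accept a connection. The numbers 1500 and 1515 are an assumption about which ports carry database traffic, not something stated in this thread, so check them against the Cisco port usage guide for your version.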

There are two commands to check for connectivity issues. From the CUCM CLI:

1) Run "utils network connectivity" on both nodes.

2) Run "utils diagnose test" to check for any DNS or NTP issues.

It seems to me that the Tomcat service on the pub is not working properly.


- Restart the Tomcat service and the AMC service on the pub, then check the report to see whether you still get the same error.

PS .. Please rate useful posts ..!!

utils network connectivity passed on pub and sub

utils diagnose test passed on pub and sub

restarted tomcat and amc on pub and sub

still getting "Source has failed due to source on 10.244.44.11 timing out"

Go to the CUCM CLI of the pub:

type "utils dbreplication runtimestate".

Paste the output over here.

Also please attach the .xml report from the Unified Database Report so that I can check the errors.

Thanks

Mudit

admin:utils dbreplication runtimestate

DB and Replication Services: ALL RUNNING

Cluster Replication State: Replication repair command started at: 2011-05-20-01-13
     Replication repair command COMPLETED 519 tables processed out of 519
     No Errors or Mismatches found.

     Use 'file view activelog cm/trace/dbl/sdi/ReplicationRepair.2011_05_20_01_13_22.out' to see the details

DB Version: ccm8_0_3_22900_5
Number of replicated tables: 519

Cluster Detailed View from PUB (2 Servers):

                                  PING          REPLICATION  REPL.  DBver&  REPL.  REPLICATION SETUP
SERVER-NAME         IP ADDRESS    (msec)  RPC?  STATUS       QUEUE  TABLES  LOOP?  (RTMT) & details
-----------         ------------  ------  ----  -----------  -----  ------- -----  -----------------
Akins-CUCM-Publish  10.244.44.10  0.042   Yes   Connected    0      match   N/A    (2) PUB Setup Completed
AkinsCUCM02         10.244.44.11  0.235   Yes   Connected    402    match   N/A    (2) Not Setup
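
For anyone comparing this output across several nodes, the server rows can be pulled apart with a small script. The sketch below is only an illustration: `parse_rows` and `rows_to_review` are hypothetical names, the regular expression assumes the single-word column values seen in the rows pasted above, and the rule "non-empty queue or setup not completed" is an assumed rule of thumb rather than Cisco guidance.

```python
import re

# One data row of the "Cluster Detailed View" table, anchored on the IPv4 address.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+(?P<ping>[\d.]+)\s+"
    r"(?P<rpc>\S+)\s+(?P<status>\S+)\s+(?P<queue>\d+)\s+(?P<tables>\S+)\s+"
    r"(?P<loop>\S+)\s+\((?P<state>\d+)\)\s+(?P<detail>.+)$"
)


def parse_rows(text):
    """Return one dict per server row found in the pasted runtimestate output."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            row = m.groupdict()
            row["queue"] = int(row["queue"])
            row["state"] = int(row["state"])
            rows.append(row)
    return rows


def rows_to_review(rows):
    """Assumed rule of thumb: flag a non-empty queue or a setup that is not completed."""
    return [r["name"] for r in rows
            if r["queue"] > 0 or "Setup Completed" not in r["detail"]]


# The two server rows from the output above.
SAMPLE = """\
Akins-CUCM-Publish      10.244.44.10    0.042   Yes     Connected       0       match   N/A     (2) PUB Setup Completed
AkinsCUCM02     10.244.44.11    0.235   Yes     Connected       402     match   N/A     (2) Not Setup
"""

print(rows_to_review(parse_rows(SAMPLE)))  # prints ['AkinsCUCM02']
```

On the output in this thread, the subscriber is the only row that stands out: a replication queue of 402 and "Not Setup" in the last column.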

I have checked the report.

Replication is good and is happening, but it looks like some of the ports in your cluster have been blocked. Can you check whether a firewall is doing that?

Also run "utils firewall list" and check whether any ports show as blocked.

Replication is GOOD in your cluster, so don't worry about it.

Thanks

Mudit

utils firewall list doesn't show anything blocking.

when i try to activate the callmanager service on the subscriber, i get: Cisco CallManager Service cannot be Activated or Deactivated due to Database  Update Failure.

If replication is working, why can't i activate the callmanager service?  i want the IP phones to register to the subscriber.

I recommend opening a ticket with Cisco TAC and having an engineer look into your network.

Also, it looks more like a connectivity issue, but we need to dig in further to find the cause of the failure.

In the meantime, you can go to the subscriber's CLI and start the CallManager service from there:

utils service start Cisco CallManager

Then you can point your phones to the subscriber.

it does seem to be a networking issue.  The publisher is a physical MCS server.  the subscriber is on a UCS vmware server.

i installed another subscriber on the UCS vmware server and the two VMs can communicate properly in the database status report but the physical MCS doesn't.  If i run the report from the physical box, it communicates but the other 2 VMs don't.  all 3 can ping each other

I suggest you open a TAC case.

this problem is now resolved thanks to TAC

for anyone interested, the problem ended up being because we reset the security password.  We did this PRIOR to adding the subscribers but it still caused dbreplication issues.  Cisco TAC had to get root access and run a script to correct the password inconsistencies.