CUCM 8.6 db sync problem

ronin2k8cronus
Level 1

Hi guys,

I have a problem with my CUCM cluster. It is an 8.6 cluster with a pub and one sub, clustered over the WAN (a VPN between the two locations). Until a couple of days ago everything was fine. Then the router that was taking care of the connection between the two locations stopped working and was replaced about an hour later with another one.

Since then the cluster has had serious problems. The two servers can ping each other, but it seems that the Pub can no longer connect to the Sub DB.

In the attached doc you can find more info. I have extracted some of the data below:

PUBLISHER:

admin:

admin:show tech network hosts

-------------------- show platform network --------------------

/etc/hosts File:

#This file was generated by the /etc/hosts cluster manager.

#It is automatically updated as nodes are added, changed, removed from the cluster.

127.0.0.1 localhost

::1 localhost

10.1.55.5 CMP

10.1.157.6 CMS

admin:utils diagnose module validate_network

Log file: platform/log/diag2.log

Starting diagnostic test(s)

===========================

test - validate_network   : Passed                    

Diagnostics Completed

admin:show network cluster

10.1.55.5 cmp Publisher not authenticated - INITIATOR since Wed Jul 25 15:47:18 2012

10.1.157.6 cms Subscriber authenticated
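
For anyone who wants to check this output on several nodes at once, here is a minimal Python sketch that picks out the nodes reported as "not authenticated". It only assumes the line shape seen in this thread (IP, one or more names, the role, then the status text); the function and class names are made up for illustration and are not part of any Cisco tool.

```python
import re
from typing import List, NamedTuple, Tuple


class ClusterNode(NamedTuple):
    ip: str
    names: Tuple[str, ...]
    role: str
    authenticated: bool


# Assumed line shape (taken from the output in this thread):
# <ip> <name> [<alias> ...] <Publisher|Subscriber> <status text>
LINE_RE = re.compile(
    r"^(?P<ip>\d+\.\d+\.\d+\.\d+)\s+(?P<names>.+?)\s+"
    r"(?P<role>Publisher|Subscriber)\s+(?P<status>.+)$"
)


def parse_show_network_cluster(text: str) -> List[ClusterNode]:
    """Parse pasted 'show network cluster' output into one record per node."""
    nodes = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything that is not a node line
        status = match.group("status")
        nodes.append(
            ClusterNode(
                ip=match.group("ip"),
                names=tuple(match.group("names").split()),
                role=match.group("role"),
                authenticated=not status.startswith("not authenticated"),
            )
        )
    return nodes


# The two lines reported by the subscriber in this post.
SAMPLE = """\
10.1.55.5 cmp Publisher not authenticated - INITIATOR since Wed Jul 25 15:47:18 2012
10.1.157.6 cms Subscriber authenticated
"""

if __name__ == "__main__":
    for node in parse_show_network_cluster(SAMPLE):
        if not node.authenticated:
            print(f"{node.role} {node.ip} is not authenticated")
```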

admin:utils dbreplication runtimestate

DB and Replication Services: ALL RUNNING

Cluster Replication State: Only available on the PUB

DB Version: ccm8_6_1_20000_1

Number of replicated tables: 0

Cluster Detailed View from SUB (2 Servers):

                                PING            REPLICATION     REPL.   DBver&  REPL.   REPLICATION SETUP
SERVER-NAME     IP ADDRESS      (msec)  RPC?    STATUS          QUEUE   TABLES  LOOP?   (RTMT)
-----------     ------------    ------  ----    -----------     -----   ------- -----   -----------------
CMP             10.1.55.5       1.35    No      Off-Line        N/A     ?       No      (?)
CMS             10.1.157.6      0.048   Yes     Off-Line        N/A     match   Yes     (0)
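
The detailed view above is easier to compare between nodes once it is split into fields. The following is a rough Python sketch, assuming the whitespace-separated column order shown in this thread (name, IP, ping, RPC, status, queue, tables, loop, RTMT state, optional details). The function name and dictionary keys are my own choice for illustration.

```python
import ipaddress
from typing import Dict, List


def parse_runtimestate_rows(text: str) -> List[Dict[str, object]]:
    """Extract the per-server rows from pasted 'utils dbreplication runtimestate' output."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 9:
            continue  # too short to be a server row
        try:
            # A server row is recognised by an IPv4 address in the second column.
            ipaddress.IPv4Address(parts[1])
        except ValueError:
            continue  # header or separator line
        rows.append(
            {
                "name": parts[0],
                "ip": parts[1],
                "ping_ms": parts[2],
                "rpc": parts[3] == "Yes",
                "status": parts[4],
                "queue": parts[5],
                "tables": parts[6],
                "loop": parts[7] == "Yes",
                "rtmt": parts[8].strip("()"),
                "details": " ".join(parts[9:]),
            }
        )
    return rows


# Rows as seen from the subscriber in this post.
SAMPLE = """\
SERVER-NAME     IP ADDRESS      (msec)  RPC?    STATUS          QUEUE   TABLES  LOOP?   (RTMT)
-----------     ------------    ------  ----    -----------     -----   ------- -----   -----------------
CMP             10.1.55.5       1.35    No      Off-Line        N/A     ?       No      (?)
CMS             10.1.157.6      0.048   Yes     Off-Line        N/A     match   Yes     (0)
"""

if __name__ == "__main__":
    for row in parse_runtimestate_rows(SAMPLE):
        if not row["rpc"]:
            print(f"{row['name']} ({row['ip']}): ping answers but RPC does not")
```

A row where ping succeeds but RPC is "No" is the pattern in this post: basic reachability is there, but the database connection is not.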

System Reports :

Unified CM Database Access

Local and publisher databases accessible.

Server        Publisher DB Reachable   Local DB Reachable
10.1.55.5     true                     true
10.1.157.6    true                     true


Unified CM Database Status


RTMT Counter Information

Connection to RTMT on 10.1.157.6 could not be established

All servers have a replication count of 541.

Not all servers have a good replication status. See the details.

Server        Number of Replicates Created   Replicate_State
10.1.55.5     541                            3 - bad
10.1.157.6    N/A                            N/A

See also Database Summary Screen in RTMT.

Run CLI command (show tech dbstateinfo) for more detail.

Replication Server List (cdr list serv) from every server for debugging purposes only.

Server        cdr list serv

10.1.55.5
SERVER                   ID STATE    STATUS     QUEUE  CONNECTION CHANGED
-----------------------------------------------------------------------
g_cmp_ccm8_6_1_20000_1    2 Active   Local           0

10.1.157.6



Replication Server Template (cdr list template) from every server for debugging purposes only.

Database Prefs File

Unified CM Hosts

All servers have equivalent host files

Server        Host Information

10.1.55.5
#This file was generated by the /etc/hosts cluster manager.
#It is automatically updated as nodes are added, changed, removed from the cluster.
127.0.0.1 localhost
::1 localhost
10.1.55.5 CMP
10.1.157.6 CMS

10.1.157.6
#This file was generated by the /etc/hosts cluster manager.
#It is automatically updated as nodes are added, changed, removed from the cluster.
127.0.0.1 localhost
::1 localhost
10.1.55.5 CMP
10.1.157.6 CMS


Unified CM Rhosts

All servers have equivalent rhosts files.

Server        rhosts File

10.1.55.5
localhost
CMS
CMP

10.1.157.6
localhost
CMP
CMS


Unified CM Sqlhosts

All servers have equivalent sqlhosts files.

Server        sqlhosts File

10.1.55.5
g_hdr     group     -     -     i=1
g_cmp_ccm8_6_1_20000_1     group     -     -     i=2
cmp_ccm8_6_1_20000_1     onsoctcp     10.1.55.5     cmp_ccm8_6_1_20000_1     g=g_cmp_ccm8_6_1_20000_1 b=32767,rto=300
g_cms_ccm8_6_1_20000_1     group     -     -     i=3
cms_ccm8_6_1_20000_1     onsoctcp     10.1.157.6     cms_ccm8_6_1_20000_1     g=g_cms_ccm8_6_1_20000_1 b=32767,rto=300
###NOTE: Need to use ipv4 address in host column of sqlhosts file and not hostname
cmp_car8_6_1_20000_1     onsoctcp     10.1.55.5     cmp_car8_6_1_20000_1     b=32767

10.1.157.6
g_hdr     group     -     -     i=1
g_cmp_ccm8_6_1_20000_1     group     -     -     i=2
cmp_ccm8_6_1_20000_1     onsoctcp     10.1.55.5     cmp_ccm8_6_1_20000_1     g=g_cmp_ccm8_6_1_20000_1 b=32767,rto=300
g_cms_ccm8_6_1_20000_1     group     -     -     i=3
cms_ccm8_6_1_20000_1     onsoctcp     10.1.157.6     cms_ccm8_6_1_20000_1     g=g_cms_ccm8_6_1_20000_1 b=32767,rto=300
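
The "###NOTE" line in the sqlhosts output is a comment inside the file rather than an error: it says the host column should hold an IPv4 address and not a hostname. A quick way to confirm that a pasted sqlhosts dump follows that rule is sketched below in Python. It assumes the usual sqlhosts column order (server name, protocol, host, service, options) and skips comment lines and "group" lines; the function name is hypothetical.

```python
import ipaddress
from typing import List, Tuple


def non_ipv4_hosts(sqlhosts_text: str) -> List[Tuple[str, str]]:
    """Return (server, host) pairs whose host column is not an IPv4 address."""
    bad = []
    for line in sqlhosts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#") or fields[1] == "group":
            continue  # blank, comment, or group definition: no host column to check
        server, host = fields[0], fields[2]
        try:
            ipaddress.IPv4Address(host)
        except ValueError:
            bad.append((server, host))
    return bad


# Entries copied from the publisher's sqlhosts file in this post.
SAMPLE = """\
g_hdr   group   -       -       i=1
g_cmp_ccm8_6_1_20000_1 group   -       -       i=2
cmp_ccm8_6_1_20000_1   onsoctcp       10.1.55.5      cmp_ccm8_6_1_20000_1        g=g_cmp_ccm8_6_1_20000_1   b=32767,rto=300
g_cms_ccm8_6_1_20000_1 group   -       -       i=3
cms_ccm8_6_1_20000_1   onsoctcp       10.1.157.6     cms_ccm8_6_1_20000_1        g=g_cms_ccm8_6_1_20000_1   b=32767,rto=300
###NOTE: Need to use ipv4 address in host column of   sqlhosts file and not hostname
cmp_car8_6_1_20000_1   onsoctcp       10.1.55.5      cmp_car8_6_1_20000_1   b=32767
"""

if __name__ == "__main__":
    problems = non_ipv4_hosts(SAMPLE)
    print("hostnames found in host column:", problems)
```

For the file in this post the list comes back empty, so the host column is already in the form the note asks for.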

Please help. I read a lot about the subject, but could not reach a solution.

Thank you.

6 Replies

Chris Deren
Hall of Fame

I don't think splitting a cluster across a VPN is supported, so I would definitely try to mitigate that.

Here is a great doc on troubleshooting and resolving DB replication issues:

https://supportforums.cisco.com/docs/DOC-13672

HTH,

Chris

Hi,

I read the document and tried the solutions suggested in there but they did not fix my problem.

In my case the Pub can't connect to the Sub, although it seems to know it exists and can ping it.

Pub cli:

admin:utils dbreplication runtimestate

DB and Replication Services: ALL RUNNING

Cluster Replication State: REPLICATION RESET all Started at 2012-07-25-16-41

DB Version: ccm8_6_1_20000_1

Number of replicated tables: 541

Cluster Detailed View from PUB (2 Servers):

                                PING            REPLICATION     REPL.   DBver&  REPL.   REPLICATION SETUP
SERVER-NAME     IP ADDRESS      (msec)  RPC?    STATUS          QUEUE   TABLES  LOOP?   (RTMT) & details
-----------     ------------    ------  ----    -----------     -----   ------- -----   -----------------
CMP             10.1.55.5       0.044   Yes     Connected       0       match   Yes     (3) PUB
CMS             10.1.157.6      1.28    No      Off-Line        N/A     ?       No      (?) N/A

admin:show network cluster

10.1.157.6 cms Subscriber not authenticated - INITIATOR since Wed Jul 25 15:51:16 2012

10.1.55.5 cmp Publisher authenticated

admin:run sql select name,node id from process node

The specified table (process) is not in the database.

Similar output on the Sub.

It used to work over the WAN connection (which is actually a tunnel with crypto maps). Could this be the problem? Could the Pub be unable to connect to the Sub because of some network problem, even though utils diagnose module validate_network says it is OK?

I am thinking about deleting the subscriber and then adding it again. Is there a procedure for that?

Thank you for your help.

Silviu.

Hi ronin,

Were you able to resolve this issue? I am having the same problem in the same scenario: my WAN connections go through an IPsec VPN (ASA 8.2.5 to ASA 8.2.5), and my CUCM cluster shows the same errors in the replication runtimestate command and all the others.

Please, I need urgent help.

As I recall I did nothing to the CUCMs. The problem was the network.

The problem is that the packets sent by the CUCM are too large (close to 1500 bytes), so once the IPsec overhead is added the network needs to fragment them, but you probably do not have the DF bit set to 0, so they cannot be fragmented.
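
To make the reasoning concrete, here is a simplified Python model of what can happen to a single packet at the tunnel entrance. It is only a sketch: the 60-byte overhead is a made-up illustrative figure (real IPsec overhead depends on the mode and ciphers in use), 1500 is the standard Ethernet MTU, and real devices have more options than this three-way outcome.

```python
def tunnel_outcome(packet_len: int, df_set: bool, tunnel_overhead: int,
                   path_mtu: int = 1500) -> str:
    """Classify what happens to one IP packet that has to enter a tunnel.

    packet_len      -- size of the original IP packet in bytes
    df_set          -- True if the Don't Fragment bit is set on the packet
    tunnel_overhead -- bytes added by encapsulation (assumed value, see below)
    path_mtu        -- largest packet the path can carry without fragmenting
    """
    if packet_len + tunnel_overhead <= path_mtu:
        return "forwarded"
    # Too big once encapsulated: the DF bit decides between the two outcomes.
    return "dropped" if df_set else "fragmented"


if __name__ == "__main__":
    ASSUMED_OVERHEAD = 60  # hypothetical value, for illustration only
    for size, df in [(100, True), (1500, True), (1500, False)]:
        print(size, df, tunnel_outcome(size, df, ASSUMED_OVERHEAD))
```

Small packets such as pings fit either way, which would explain why the servers can ping each other while the large database packets never arrive.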

I see three solutions to your problem (try to see if one matches your requirements):

1) Stop using IPsec. Without the added overhead the packets will no longer need to be fragmented, so the CUCMs will see each other fine.

2) Fragment the packets and adjust the MSS to a lower value:

route-map DF-BIT permit 10
 match ip address 1455
 set ip df 0

interface GigabitEthernet0/1.157
 description == The Subinterface on which you can find the Sub which is in a different location than the Pub ==
 encapsulation dot1Q 157
 ip address x y
 ip tcp adjust-mss 1300
 ip policy route-map DF-BIT

3) When you install a CUCM, the installer will ask you at a certain point to set the MTU, so set it to a lower value (so you must reinstall the sub that is causing problems). I think this is the recommended path.
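
As a rough check of the numbers in option 2, the arithmetic behind an MSS of 1300 can be sketched in Python. The header sizes are the standard 20 bytes each for IPv4 and TCP without IP options; the 60-byte tunnel overhead used in the example is an assumed figure for illustration, not a measured one.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, fixed TCP header


def ip_packet_size(mss: int) -> int:
    """Largest IP packet a TCP session produces for a given MSS."""
    return mss + IPV4_HEADER + TCP_HEADER


def headroom(mss: int, path_mtu: int = 1500) -> int:
    """Bytes left for tunnel encapsulation before the packet exceeds the MTU."""
    return path_mtu - ip_packet_size(mss)


def max_mss(path_mtu: int, tunnel_overhead: int) -> int:
    """Largest MSS that still fits the path once the tunnel overhead is added."""
    return path_mtu - tunnel_overhead - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    print("IP packet size at MSS 1300:", ip_packet_size(1300))
    print("headroom at MSS 1300:", headroom(1300))
    print("max MSS with an assumed 60-byte overhead:", max_mss(1500, 60))
```

With an MSS of 1300 the TCP packets are at most 1340 bytes, which leaves 160 bytes for encapsulation on a 1500-byte path. The MSS clamp only affects TCP, which is presumably why the route-map clearing the DF bit is applied alongside it.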

This sounds like trial and error, so my recommendation is to involve TAC: I am sure they have hit this problem many times and will probably give you a fast and correct answer.
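
If you do end up testing by hand, the trial and error can at least be made systematic with a binary search over packet sizes. The Python sketch below assumes you supply a `probe(size)` function that returns True when a packet of that total size gets through with the DF bit set (for example by wrapping a ping that sets DF). The probe here is a stand-in with a pretend limit of 1438 bytes, which is not a value from this thread.

```python
from typing import Callable, Optional


def find_path_mtu(probe: Callable[[int], bool],
                  low: int = 576, high: int = 1500) -> Optional[int]:
    """Return the largest size in [low, high] for which probe(size) is True.

    Assumes probe is monotonic: if a size passes, every smaller size passes too.
    Returns None if even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid       # mid fits, the answer is mid or larger
        else:
            high = mid - 1  # mid is too big, the answer is smaller
    return low


def fake_probe(size: int) -> bool:
    """Stand-in for a real DF-bit ping; pretends the path carries up to 1438 bytes."""
    return size <= 1438


if __name__ == "__main__":
    print("largest size that passes:", find_path_mtu(fake_probe))
```

This needs about ten probes to cover the 576 to 1500 range, and the result gives a starting point for the MTU or MSS value to configure.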

Hope this helps,

Silviu.

Harmit Singh
Cisco Employee

Hello all,

Please take a look at the blog and video I posted on dbreplication runtimestate:

Blog url: https://supportforums.cisco.com/community/netpro/collaboration-voice-video/ip-telephony/blog/2012/10/26/understanding-cucm-dbreplication-runtimestate

Please feel free to provide your feedback and any additional questions you may have on this topic.

Thanks,

Harmit.

Hey guys,

I have faced the same problem in my cluster. The only difference is that replication looks good in the reports.

But the following message appears in the sqlhosts file:

###NOTE: Need to use ipv4 address in host column of sqlhosts file and not hostname

admin:show tech network hosts

-------------------- show platform network --------------------

/etc/hosts File:

#This file was generated by the /etc/hosts cluster manager.

#It is automatically updated as nodes are added, changed, removed from the cluster.

127.0.0.1 localhost

::1 localhost

10.0.55.194  BRDC1CUP0003

10.0.55.196 BRDC1CUP0001.name BRDC1CUP0001

10.0.55.197  BRDC1CUP0002

10.0.55.195  BRDC1CUP0004

admin:show network cluster

10.0.55.194 brdc1cup0003  Publisher authenticated

10.0.55.196 brdc1cup0001.name brdc1cup0001 Subscriber authenticated using UDP since Tue May 28 09:43:14 2013

10.0.55.197 brdc1cup0002  Subscriber authenticated using UDP since Tue May 28 09:43:14 2013

10.0.55.195 brdc1cup0004  Subscriber authenticated using TCP since Tue May 28 09:49:28 2013

admin:utils dbreplication runtimestate

DB and Replication Services: ALL RUNNING

Cluster Replication State: Replication status command started at: 2013-05-28-10-11

     Replication status command COMPLETED 541 tables checked out of 541

     No Errors or Mismatches found.

     Use 'file view activelog cm/trace/dbl/sdi/ReplicationStatus.2013_05_28_10_11_29.out' to see the details

DB Version: ccm8_6_2_20000_2

Number of replicated tables: 541

Cluster Detailed View from PUB (2 Servers):

                                PING            REPLICATION     REPL.   DBver&  REPL.   REPLICATION SETUP

SERVER-NAME     IP ADDRESS      (msec)  RPC?    STATUS          QUEUE   TABLES  LOOP?   (RTMT) & details

-----------     ------------    ------  ----    -----------     -----   ------- -----   -----------------

BRDC1CUP0003    10.0.55.194     0.075   Yes     Connected       0       match   Yes     (2) PUB Setup Completed

BRDC1CUP0004    10.0.55.195     0.269   Yes     Connected       0       match   Yes     (2) Setup Completed

Server        cdr list serv
10.0.55.194
SERVER                 ID STATE    STATUS     QUEUE  CONNECTION CHANGED
-----------------------------------------------------------------------
g_brdc1cup0003_ccm8_6_2_20000_2    2 Active   Local           0                
g_brdc1cup0004_ccm8_6_2_20000_2    3 Active   Connected       0 May 28 09:49:30
10.0.55.195
SERVER                 ID STATE    STATUS     QUEUE  CONNECTION CHANGED
-----------------------------------------------------------------------
g_brdc1cup0003_ccm8_6_2_20000_2    2 Active   Connected       0 May 28 09:49:30
g_brdc1cup0004_ccm8_6_2_20000_2    3 Active   Local           0                

Server        sqlhosts File
10.0.55.194
g_hdr     group     -     -     i=1
g_brdc1cup0003_ccm8_6_2_20000_2     group     -     -     i=2
brdc1cup0003_ccm8_6_2_20000_2     onsoctcp     10.0.55.194     brdc1cup0003_ccm8_6_2_20000_2     g=g_brdc1cup0003_ccm8_6_2_20000_2 b=32767,rto=300
g_brdc1cup0004_ccm8_6_2_20000_2     group     -     -     i=3
brdc1cup0004_ccm8_6_2_20000_2     onsoctcp     10.0.55.195     brdc1cup0004_ccm8_6_2_20000_2     g=g_brdc1cup0004_ccm8_6_2_20000_2 b=32767,rto=300
###NOTE: Need to use ipv4 address in host column of sqlhosts file and not hostname
brdc1cup0003_car8_6_2_20000_2     onsoctcp     10.0.55.194     brdc1cup0003_car8_6_2_20000_2     b=32767
10.0.55.195
g_hdr     group     -     -     i=1
g_brdc1cup0003_ccm8_6_2_20000_2     group     -     -     i=2
brdc1cup0003_ccm8_6_2_20000_2     onsoctcp     10.0.55.194     brdc1cup0003_ccm8_6_2_20000_2     g=g_brdc1cup0003_ccm8_6_2_20000_2 b=32767,rto=300
g_brdc1cup0004_ccm8_6_2_20000_2     group     -     -     i=3
brdc1cup0004_ccm8_6_2_20000_2     onsoctcp     10.0.55.195     brdc1cup0004_ccm8_6_2_20000_2     g=g_brdc1cup0004_ccm8_6_2_20000_2 b=32767,rto=300
Daniel Sobrinho