
samhopealpha · ‎06-08-2015

Hi Everybody,

Here is the environment
CUCM 10.5 - PUB 10.10.242.51
CUCM 10.5 - SUB 10.10.242.52

I am going to re-host the CUCM pub, the following is my procedure

1. DRS backup CUCM PUB (on OLD UCS)
2. DRS restore CUCM PUB (on NEW UCS) (on a separated network)
3. Remove the CUCM-PUB(OLD) from the network
4. Plug the CUCM-PUB(NEW) to the network

When I type "utils dbreplication runtimestate",
it shows the Pub as DB Active-Dropped:

admin:utils dbreplication runtimestate

Server Time: Mon Jun 8 17:25:59 HKT 2015

Cluster Replication State: Only available on the PUB

DB Version: ccm10_5_1_10000_7
Repltimeout set to: 300s
PROCESS option set to: 1

Cluster Detailed View from CUCM2015b (2 Servers):

                                PING      DB/RPC/   REPL.    Replication    REPLICATION SETUP
SERVER-NAME     IP ADDRESS      (msec)    DbMon?    QUEUE    Group ID       (RTMT) & DB Status
-----------     ----------      ------    -------   -----    -----------    ------------------
CUCM2015b       10.10.242.52    0.061     Y/Y/Y     0        (g_3)          (2) Setup Completed
cucm2015        10.10.242.51    0.100     Y/Y/Y     14926    (g_2)          (-) DB Active-Dropped

then I tried
1. utils dbreplication stop (CUCM-SUB)
2. utils dbreplication stop (CUCM-PUB)
3. utils dbreplication reset all (CUCM-PUB)

but it shows Replication Not Setup

admin:utils dbreplication runtimestate

Server Time: Mon Jun 8 18:34:56 HKT 2015

Cluster Replication State: dbmonpreflightcheck file is not found.

DB Version: ccm10_5_1_10000_7
Repltimeout set to: 300s
PROCESS option set to: 1

Cluster Detailed View from CUCM2015b (2 Servers):

                                PING      DB/RPC/   REPL.    Replication    REPLICATION SETUP
SERVER-NAME     IP ADDRESS      (msec)    DbMon?    QUEUE    Group ID       (RTMT) & DB Status
-----------     ----------      ------    -------   -----    -----------    ------------------
CUCM2015b       10.10.242.52    0.017     Y/Y/Y     --       (-)            (-) Replication Not Setup
cucm2015        10.10.242.51    0.096     Y/Y/Y     --       (-)            (-) Replication Not Setup
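
For anyone comparing the pasted runtimestate outputs, here is a minimal Python sketch that pulls the per-node rows out of the text and lists the nodes whose (RTMT) state is not 2. The function names and the regular expression are illustrative assumptions, not a Cisco tool, and the column layout is taken only from the output pasted in this thread.

```python
import re

# One row of the "Cluster Detailed View" table as pasted in this thread:
# name, IP, ping, DB/RPC/DbMon flags, queue, (group id), (RTMT state), status text.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+(?P<ping>\S+)\s+"
    r"(?P<flags>\S+)\s+(?P<queue>\S+)\s+\((?P<group>[^)]*)\)\s+"
    r"\((?P<state>[^)]*)\)\s+(?P<status>.+?)\s*$"
)

def parse_runtimestate(text):
    """Return one dict per node row found in the pasted output."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows

def nodes_not_in_state_2(text):
    """Names of nodes whose (RTMT) state column is anything other than 2."""
    return [row["name"] for row in parse_runtimestate(text) if row["state"] != "2"]

# Rows copied from the first output in this thread.
sample = """\
CUCM2015b 10.10.242.52 0.061 Y/Y/Y 0 (g_3) (2) Setup Completed
cucm2015 10.10.242.51 0.100 Y/Y/Y 14926 (g_2) (-) DB Active-Dropped
"""
print(nodes_not_in_state_2(sample))
```

Header and separator lines are skipped automatically because their second column is not an IP address.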

Does anybody know what mistake I have made? How can I bring DB replication back to a normal state?

Thanks in advance

Sam

Aman Soi · ‎06-08-2015

Hi Sam,

The output shows "dbmonpreflightcheck file is not found", which could be related to the monitoring status of DB replication.

Can you try the commands below and then check the status?

1. Issue the command "utils dbreplication stop" starting with the SUB's (all Subscribers) and then on the PUB. Wait for it to finish on one node before moving to the next.

2. Issue the command "utils dbreplication dropadmindb" starting with the SUB's (all Subscribers) and then on the PUB. Wait for it to finish on one node before moving to the next.

3. Issue the command "utils dbreplication reset all" on the PUB.
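
The ordering in the three steps above is easy to get wrong on a larger cluster, so here is a minimal Python sketch that only prints the sequence for a given publisher and list of subscribers. It does not connect to any node or run anything, and the function name is a hypothetical helper, not part of CUCM. Each command still has to finish on one node before the next one is issued.

```python
def replication_reset_plan(publisher, subscribers):
    """Return (node, command) pairs in the order given in the steps above."""
    plan = []
    # Step 1: stop replication on every subscriber, then on the publisher.
    for node in list(subscribers) + [publisher]:
        plan.append((node, "utils dbreplication stop"))
    # Step 2: drop the admin DB, subscribers first, publisher last.
    for node in list(subscribers) + [publisher]:
        plan.append((node, "utils dbreplication dropadmindb"))
    # Step 3: reset replication from the publisher only.
    plan.append((publisher, "utils dbreplication reset all"))
    return plan

# Node names taken from this thread.
for node, command in replication_reset_plan("cucm2015", ["CUCM2015b"]):
    print(f"{node}: {command}")
```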

regds,

aman

samhopealpha · ‎06-08-2015

It seems that doesn't work,

and the system returns:

Use CLI to see detail: 'file view activelog cm/trace/db1/sdi/db1_repl_output_util.log'

It looks like there are some errors/warnings in the log.

Here is the log:

Tue Jun 9 10:10:57 2015 replutil DEBUG: -->
Tue Jun 9 10:10:57 2015 replutil DEBUG: task to do [teardowndrf]
Tue Jun 9 10:10:57 2015 replutil DEBUG: Inside task == teardowndrf
Tue Jun 9 10:10:57 2015 replutil DEBUG: Hostname is None
Tue Jun 9 10:10:57 2015 replutil.getList DEBUG: -->
Tue Jun 9 10:10:57 2015 replutil.getList DEBUG: Inside getList
Tue Jun 9 10:10:57 2015 replutil.getList DEBUG: myhost is [cucm2015]
Tue Jun 9 10:11:03 2015 replutil.getList DEBUG: groupname [g_2_ccm10_5_1_10000_7]
Tue Jun 9 10:11:03 2015 replutil.getList DEBUG: publisher [g_2_ccm10_5_1_10000_7]
Tue Jun 9 10:11:03 2015 replutil.getList DEBUG: groupname [g_3_ccm10_5_1_10000_7]
Tue Jun 9 10:11:03 2015 replutil.getList DEBUG: All subscribers are reachable, teardowndrf will continue
Tue Jun 9 10:11:03 2015 replutil.getList DEBUG: <--
Tue Jun 9 10:11:03 2015 replutil DEBUG: Starting replication reset all
Tue Jun 9 10:11:04 2015 replutil.cdrDelete DEBUG: -->
Tue Jun 9 10:11:04 2015 replutil.cdrDelete DEBUG:
Inside cdrDelete()
Tue Jun 9 10:11:04 2015 replutil.cdrDelete DEBUG: length of groupnameslist is [1]
Tue Jun 9 10:11:04 2015 dbllib.cdrdeleteserver DEBUG: -->
Tue Jun 9 10:11:10 2015 dbllib.cdrdeleteserver DEBUG: Executing su - informix -c "source /usr/local/cm/db/informix/local/ids.env; cdr delete server -f --connect=g_3_ccm10_5_1_10000_7 g_3_ccm10_5_1_10000_7"
Tue Jun 9 10:11:10 2015 dbllib.cdrdeleteserver WARNING: Failed to execute [su - informix -c "source /usr/local/cm/db/informix/local/ids.env; cdr delete server -f --connect=g_3_ccm10_5_1_10000_7 g_3_ccm10_5_1_10000_7"], rc[5], msg[connect to g_3_ccm10_5_1_10000_7 failed
Server g_3_ccm10_5_1_10000_7 is not listed as a dbserver name in sqlhosts.
(-25555)
command failed -- unable to connect to server specified (5)], retrying...
Tue Jun 9 10:11:10 2015 dbllib.cdrdeleteserver DEBUG: Executing su - informix -c "source /usr/local/cm/db/informix/local/ids.env; cdr delete server -f --connect=g_3_ccm10_5_1_10000_7 g_3_ccm10_5_1_10000_7"
Tue Jun 9 10:11:10 2015 dbllib.cdrdeleteserver ERROR: Failed to execute [su - informix -c "source /usr/local/cm/db/informix/local/ids.env; cdr delete server -f --connect=g_3_ccm10_5_1_10000_7 g_3_ccm10_5_1_10000_7"], rc[5], msg[connect to g_3_ccm10_5_1_10000_7 failed
Server g_3_ccm10_5_1_10000_7 is not listed as a dbserver name in sqlhosts.
(-25555)
command failed -- unable to connect to server specified (5)]
Tue Jun 9 10:11:15 2015 dbllib.cdrdeleteserver DEBUG: groupname: [g_3_ccm10_5_1_10000_7] hostname: [CUCM2015b]
Tue Jun 9 10:11:16 2015 dbllib.cdrdeleteserver DEBUG: Dropadmindb executed successfully
Tue Jun 9 10:11:16 2015 dbllib.cdrdeleteserver DEBUG: Executing su - informix -c "source /usr/local/cm/db/informix/local/ids.env; cdr delete server -f g_3_ccm10_5_1_10000_7"
Tue Jun 9 10:11:16 2015 dbllib.cdrdeleteserver WARNING: Ignorable error, ER is not active locally, rc[62], msg[The syscdr database is missing!
, sqlcode=-329
ISAM error -111:
command failed -- Enterprise Replication not active (62)]
Tue Jun 9 10:11:16 2015 dbllib.cdrdeleteserver DEBUG: <--
Tue Jun 9 10:11:16 2015 replutil.cdrDelete DEBUG: Successfully deleted the g_3_ccm10_5_1_10000_7 server from replication network
Tue Jun 9 10:12:16 2015 replutil.cdrDelete DEBUG: isDrf is not None
Tue Jun 9 10:12:16 2015 dbllib.cdrdeleteserver DEBUG: -->
Tue Jun 9 10:12:25 2015 dbllib.cdrdeleteserver DEBUG: Executing su - informix -c "source /usr/local/cm/db/informix/local/ids.env; cdr delete server -f g_2_ccm10_5_1_10000_7"
Tue Jun 9 10:12:25 2015 dbllib.cdrdeleteserver WARNING: Ignorable error, ER is not active locally, rc[62], msg[The syscdr database is missing!
, sqlcode=-329
ISAM error -111:
command failed -- Enterprise Replication not active (62)]
Tue Jun 9 10:12:25 2015 dbllib.cdrdeleteserver DEBUG: <--
Tue Jun 9 10:12:25 2015 replutil.cdrDelete DEBUG: Successfully deleted the g_2_ccm10_5_1_10000_7 server from replication network
Tue Jun 9 10:12:25 2015 replutil.cdrDelete DEBUG: <--
Tue Jun 9 10:12:25 2015 replutil DEBUG: <--
Tue Jun 9 10:12:25 2015 replutil DEBUG: -->
Tue Jun 9 10:12:25 2015 replutil DEBUG: task to do [setup]
Tue Jun 9 10:12:25 2015 replutil DEBUG: Inside task == setup

Tue Jun 9 10:12:25 2015 replutil.getList DEBUG: -->
Tue Jun 9 10:12:25 2015 replutil.getList DEBUG: Inside getList
Tue Jun 9 10:12:25 2015 replutil.getList DEBUG: myhost is [cucm2015]
Tue Jun 9 10:12:31 2015 replutil.getList DEBUG: groupname [g_2_ccm10_5_1_10000_7]
Tue Jun 9 10:12:31 2015 replutil.getList DEBUG: publisher [g_2_ccm10_5_1_10000_7]
Tue Jun 9 10:12:31 2015 replutil.getList DEBUG: groupname [g_3_ccm10_5_1_10000_7]
Tue Jun 9 10:12:31 2015 replutil.getList DEBUG: All subscribers are reachable, setup will continue
Tue Jun 9 10:12:31 2015 replutil.getList DEBUG: <--
Tue Jun 9 10:12:31 2015 replutil.cdrDefine DEBUG: -->
Tue Jun 9 10:12:31 2015 replutil.cdrDefine DEBUG: Inside cdrDefine method
Tue Jun 9 10:12:31 2015 replutil.cdrDefine DEBUG: Inside cdrDefine
Tue Jun 9 10:12:31 2015 replutil.cdrDefine DEBUG: val is g_3_ccm10_5_1_10000_7
Tue Jun 9 10:12:31 2015 replutil.cdrDefine DEBUG: cmd is [/usr/local/cm/bin/dbl mkrepl --delsub g_3_ccm10_5_1_10000_7]
Tue Jun 9 10:12:37 2015 replutil.cdrDefine DEBUG: <--
Tue Jun 9 10:12:37 2015 replutil DEBUG: Reset command completed successfully on:
Tue Jun 9 10:12:37 2015 replutil DEBUG: CUCM2015b
Tue Jun 9 10:12:37 2015 replutil DEBUG: Reset completed: 1 Failed: 0
Tue Jun 9 10:12:37 2015 replutil DEBUG: <--

end of the file reached

Thanks in advance

Sam

Aman Soi · ‎06-08-2015

Sam,

suggest opening TAC case.

regds,

aman

Jitender Bhandari · ‎06-09-2015

Sam,

Can you provide the output of the commands below.

"show network cluster"

"Run sql select * from processnode"

"show tech network hosts"

JB

samhopealpha · ‎06-09-2015

Hi JB,

Thanks for the help, here is the output:

admin:utils dbreplication runtimestate

Server Time: Wed Jun 10 11:45:11 HKT 2015

Cluster Replication State: dbmonpreflightcheck file is not found.

DB Version: ccm10_5_1_10000_7
Repltimeout set to: 300s
PROCESS option set to: 1

Cluster Detailed View from CUCM2015b (2 Servers):

                                PING      DB/RPC/   REPL.    Replication    REPLICATION SETUP
SERVER-NAME     IP ADDRESS      (msec)    DbMon?    QUEUE    Group ID       (RTMT) & DB Status
-----------     ----------      ------    -------   -----    -----------    ------------------
CUCM2015b       10.10.242.52    0.015     Y/Y/Y     --       (-)            (-) Replication Not Setup
cucm2015        10.10.242.51    0.142     Y/Y/Y     --       (-)            (-) Replication Not Setup

admin:show network cluster
10.10.242.52 CUCM2015b Subscriber callmanager DBSub authenticated
10.10.242.51 cucm2015 Publisher callmanager DBPub authenticated using TCP since Wed Jun 10 10:56:43 2015

admin:run sql select * from processnode
pkid name mac systemnode description isactive nodeid tknodeusage ipv6name fklbmhubgroup tkprocessnoderole tkssomode
==================================== ================== === ========== ============== ======== ====== =========== ======== ============= ================= =========
00000000-1111-0000-0000-000000000000 EnterpriseWideData t t 1 1 NULL 1 0
9db7b992-639a-4aac-b80d-be65c39fbdbe 10.10.242.51 f t 2 0 NULL 1 0
8814ca40-0a11-502c-462e-0965c5446e30 10.10.242.52 f cucm2015b t 3 1 NULL 1 0
dabae774-46df-bd14-360c-f8ea3bbd9740 10.10.242.54 f domain lab.com t 6 0 NULL 2 0

admin:show tech network hosts
-------------------- show platform network --------------------

/etc/hosts File:
#This file was generated by the /etc/hosts cluster manager.
#It is automatically updated as nodes are added, changed, removed from the cluster.

127.0.0.1 localhost
::1 localhost
10.10.242.52 CUCM2015b
10.10.242.51 cucm2015
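
The three outputs above can be cross-checked against each other. Here is a minimal Python sketch that does so with plain set differences. The function name is hypothetical and the lists are typed in by hand from the output above. It only reports differences between the views and does not say whether a difference is the cause of the replication problem.

```python
def compare_node_views(cluster_ips, processnode_names, hosts_ips):
    """Report entries that appear in one view of the cluster but not another."""
    cluster = set(cluster_ips)
    process = set(processnode_names)
    hosts = set(hosts_ips)
    return {
        "in_processnode_not_in_cluster": sorted(process - cluster),
        "in_cluster_not_in_hosts": sorted(cluster - hosts),
        "in_hosts_not_in_cluster": sorted(hosts - cluster),
    }

# "show network cluster" addresses.
cluster_ips = ["10.10.242.52", "10.10.242.51"]
# processnode "name" column; the EnterpriseWideData row (systemnode = t) is left out.
processnode_names = ["10.10.242.51", "10.10.242.52", "10.10.242.54"]
# /etc/hosts entries, without the localhost lines.
hosts_ips = ["10.10.242.52", "10.10.242.51"]

report = compare_node_views(cluster_ips, processnode_names, hosts_ips)
for label, items in report.items():
    print(label, items)
```

With the values above, the only difference reported is 10.10.242.54, which is listed in processnode but not in "show network cluster" or the hosts file.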

Any Ideas?

Sam

Jitender Bhandari · ‎06-10-2015

Hi Sam,

Can you provide me output for below

"utils diagnose test"

"utils ntp status"

JB

samhopealpha · ‎06-10-2015

Hi JB,

Here is the output

admin:utils diagnose test

Log file: platform/log/diag2.log

Starting diagnostic test(s)
===========================
test - disk_space        : Passed (available: 1814 MB, used: 12341 MB)
skip - disk_files        : This module must be run directly and off hours
test - service_manager   : Passed
test - tomcat            : Passed
test - tomcat_deadlocks  : Passed
test - tomcat_keystore   : Passed
test - tomcat_connectors : Passed
test - tomcat_threads    : Passed
test - tomcat_memory     : Passed
test - tomcat_sessions   : Passed
skip - tomcat_heapdump   : This module must be run directly and off hours
test - validate_network  : Reverse DNS lookup failed
test - raid              : Passed
test - system_info       : Passed (Collected system information in diagnostic log)
test - ntp_reachability  : Passed
test - ntp_clock_drift   : Passed
test - ntp_stratum       : Passed
skip - sdl_fragmentation : This module must be run directly and off hours
skip - sdi_fragmentation : This module must be run directly and off hours

Diagnostics Completed

The final output will be in Log file: platform/log/diag2.log

Please use 'file view activelog platform/log/diag2.log' command to see the output

admin:utils ntp status

ntpd (pid 8019) is running...

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.10.242.51    10.10.242.29     2 u  251 1024  377    0.918    5.578   0.469

synchronised to NTP server (10.10.242.51) at stratum 3
time correct to within 39 ms
polling server every 1024 s

Current time in UTC is : Thu Jun 11 06:24:29 UTC 2015
Current time in Asia/Hong_Kong is : Thu Jun 11 14:24:29 HKT 2015
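
The NTP output above can also be read mechanically. Here is a minimal Python sketch, assuming the peer table follows the usual ntpq layout, where the "reach" column is an octal 8-bit register and 377 means the last eight polls were all answered. The function names are hypothetical, and the sample line is copied from the output above.

```python
import re

def parse_ntp_status(text):
    """Extract the sync target and stratum from pasted 'utils ntp status' text."""
    match = re.search(r"synchronised to NTP server \(([^)]+)\) at stratum (\d+)", text)
    if not match:
        return None
    return {"server": match.group(1), "stratum": int(match.group(2))}

def reach_all_ok(reach_field):
    """True when the octal 'reach' register shows the last 8 polls all succeeded."""
    return int(reach_field, 8) == 0o377

sample = "synchronised to NTP server (10.10.242.51) at stratum 3"
print(parse_ntp_status(sample), reach_all_ok("377"))
```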

Thanks for the help

Sam

Jitender Bhandari · ‎06-11-2015

Hi Sam,

With the commands below, I wanted to verify cluster integrity and see what the hosts files look like.

"show network cluster"

"Run sql select * from processnode"

"show tech network hosts"

The commands below confirm that there is no NTP issue, which can cause DB replication problems.

"utils diagnose test"

"utils ntp status"

At this point I would like you to try the commands below exactly as written and see if it makes any difference; you might have tried this before. If that does not help, we will have to get into root to verify the configuration, so I would ask you to contact TAC.

(1) Stop replication on all the nodes by running "utils dbreplication stop" (first on the subs, wait for the admin prompt to return, and then on the pub).

(2) Run "utils dbreplication dropadmindb" on all the nodes (first on the pub, wait for the admin prompt to return, and then on the subs).

(3) Run "utils dbreplication reset all" on the publisher.

JB

Juan Gerardo Hernandez · ‎02-26-2017

This helped me a lot. I changed my CUCM cluster IP addresses and hostnames following this guide:

https://networkingnerd.net/2010/12/29/changing-callmanagers-ip-address/

After that I had a lot of DB replication issues. Basically, the Pub didn't want to sync with the Sub, and the cluster threw a lot of strange errors,

something about Informix and a hostname not found.

I tried a lot of DB repairs, cluster resets, restarts, and stops; none of that worked.

This did the trick; now all my nodes are in state 2.

BTW, it took a while to sync, about 45 minutes. It is important that NTP is in sync on all the nodes.

Juan.

Jitender Bhandari · ‎02-26-2017

Hi Juan,

It's an old post, but I'm happy that it helped you resolve the issue. NTP is very important: if your NTP is not in sync, you are bound to see issues with DB replication.

JB

Roberto Alvarez Perez · ‎07-04-2018

Thanks!

Stefan Pashov · ‎11-25-2019

Hi, I use these steps (1-3) to fix issues with DB replication in a Call Manager cluster of three servers, ver. 12.5.1.11900-146. Thanks

Marcelo Morais · ‎06-11-2015

Hi Sam,

Beyond what Jitender already said below, I would like to understand the steps that you used for the re-host.

You said that you did the following:

1. DRS backup CUCM PUB (on OLD UCS)
2. DRS restore CUCM PUB (on NEW UCS) (on a separated network)
3. Remove the CUCM-PUB(OLD) from the network
4. Plug the CUCM-PUB(NEW) to the network

Why didn't you do the following?

1. DRS backup CUCM PUB (on OLD UCS)
2. Shutdown the CUCM PUB (on OLD UCS)
3. Install and Restore the CUCM PUB (on NEW UCS) (in the same network)
4. Restart CUCM PUB (on NEW UCS)
5. Restart CUCM SUB (for database replication)

Hope this helps.

samhopealpha · ‎06-09-2015

Or is there any official documentation for the procedure to re-host the CUCM Publisher?

Re-host CUCM publisher, DB replication problem