Re-host CUCM publisher, DB replication problem

Hi Everybody, 


Here is the environment
CUCM 10.5 - PUB 10.10.242.51
CUCM 10.5 - SUB 10.10.242.52

I am re-hosting the CUCM publisher. Here is my procedure:

1. DRS backup CUCM PUB (on OLD UCS) 
2. DRS restore CUCM PUB (on NEW UCS) (on a separated network)
3. Remove the CUCM-PUB(OLD) from the network
4. Plug the CUCM-PUB(NEW) to the network

When I run "utils dbreplication runtimestate", it shows the PUB as DB Active-Dropped:

admin:utils dbreplication runtimestate 

Server Time: Mon Jun  8 17:25:59 HKT 2015

Cluster Replication State: Only available on the PUB


DB Version: ccm10_5_1_10000_7
Repltimeout set to: 300s
PROCESS option set to: 1

Cluster Detailed View from CUCM2015b (2 Servers):

                                      PING      DB/RPC/   REPL.    Replication    REPLICATION SETUP
SERVER-NAME         IP ADDRESS        (msec)    DbMon?    QUEUE    Group ID       (RTMT) & DB Status
-----------         ----------        ------    -------   -----    -----------    ------------------
CUCM2015b           10.10.242.52      0.061     Y/Y/Y     0        (g_3)          (2) Setup Completed
cucm2015            10.10.242.51      0.100     Y/Y/Y     14926    (g_2)          (-) DB Active-Dropped

 
Then I tried:
1. utils dbreplication stop (CUCM-SUB)
2. utils dbreplication stop (CUCM-PUB)
3. utils dbreplication reset all (CUCM-PUB)

but now both nodes show Replication Not Setup:

admin:utils dbreplication runtimestate 

Server Time: Mon Jun  8 18:34:56 HKT 2015

Cluster Replication State: dbmonpreflightcheck file is not found.

DB Version: ccm10_5_1_10000_7
Repltimeout set to: 300s
PROCESS option set to: 1

Cluster Detailed View from CUCM2015b (2 Servers):

                                      PING      DB/RPC/   REPL.    Replication    REPLICATION SETUP
SERVER-NAME         IP ADDRESS        (msec)    DbMon?    QUEUE    Group ID       (RTMT) & DB Status
-----------         ----------        ------    -------   -----    -----------    ------------------
CUCM2015b           10.10.242.52      0.017     Y/Y/Y     --       (-)            (-) Replication Not Setup
cucm2015            10.10.242.51      0.096     Y/Y/Y     --       (-)            (-) Replication Not Setup
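
For anyone comparing several runs of this output, here is a minimal Python sketch that pulls the node rows out of a pasted "utils dbreplication runtimestate" table and lists the nodes that are not in state 2 (Setup Completed). The function names and the regular expression are assumptions written against the layout shown in this post, not a Cisco tool, and other versions may format the table differently.

```python
import re

# Matches one node row of the "Cluster Detailed View" table as pasted above.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<ip>\d+\.\d+\.\d+\.\d+)\s+(?P<ping>\S+)\s+"
    r"(?P<dbmon>\S+)\s+(?P<queue>\S+)\s+\((?P<group>[^)]*)\)\s+"
    r"\((?P<state>[^)]*)\)\s+(?P<status>.+?)\s*$"
)


def parse_runtimestate(text):
    """Return one dict per node row found in the pasted output."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append(m.groupdict())
    return rows


def unhealthy(rows):
    """Names of nodes whose state column is anything other than 2."""
    return [r["name"] for r in rows if r["state"] != "2"]


# The two node rows from the first output in this post.
sample = """\
CUCM2015b           10.10.242.52      0.061     Y/Y/Y     0        (g_3)          (2) Setup Completed
cucm2015            10.10.242.51      0.100     Y/Y/Y     14926    (g_2)          (-) DB Active-Dropped
"""

print(unhealthy(parse_runtimestate(sample)))
```

Run against the second output, where both rows show "(-) Replication Not Setup", the same function would list both nodes.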

 

 

Does anybody know what mistake I made? How can I get DB replication back to a normal state?

Thanks in advance

Sam
 

 

 


Hi Sam,

The output shows "dbmonpreflightcheck file is not found", which could be related to the monitoring status of DB replication.

Can you try the commands below and then check the status?

 

1. Issue the command "utils dbreplication stop" starting with the SUB's (all Subscribers) and then on the PUB. Wait for it to finish on one node before moving to the next.

2. Issue the command "utils dbreplication dropadmindb" starting with the SUB's (all Subscribers) and then on the PUB. Wait for it to finish on one node before moving to the next.

3. Issue the command "utils dbreplication reset all" on the PUB.
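
The ordering in the three steps above is easy to get wrong on a larger cluster, so here is a small Python sketch that only prints the sequence as a checklist. It does not connect to or run anything on CUCM. The function name `replication_reset_plan` is hypothetical, and the node names are the ones from this thread.

```python
def replication_reset_plan(publisher, subscribers):
    """Ordered (node, command) steps following the sequence in this reply.

    stop and dropadmindb run on every subscriber first and the publisher
    last; reset all runs on the publisher only.
    """
    steps = []
    for cmd in ("utils dbreplication stop", "utils dbreplication dropadmindb"):
        for node in list(subscribers) + [publisher]:
            steps.append((node, cmd))
    steps.append((publisher, "utils dbreplication reset all"))
    return steps


for node, cmd in replication_reset_plan("cucm2015", ["CUCM2015b"]):
    # Wait for each command to finish on one node before moving to the next.
    print(f"{node}: {cmd}")
```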

 

 

regds,

aman


It seems that doesn't work.

The system returns:

Use CLI to see detail: 'file view activelog cm/trace/db1/sdi/db1_repl_output_util.log'

There appear to be some errors/warnings in the log. Here is the log:

Tue Jun  9 10:10:57 2015 replutil  DEBUG:  -->
Tue Jun  9 10:10:57 2015 replutil  DEBUG:  task to do [teardowndrf] 
Tue Jun  9 10:10:57 2015 replutil  DEBUG:   Inside task == teardowndrf 
Tue Jun  9 10:10:57 2015 replutil  DEBUG:  Hostname is None 
Tue Jun  9 10:10:57 2015 replutil.getList  DEBUG:  -->
Tue Jun  9 10:10:57 2015 replutil.getList  DEBUG:  Inside getList
Tue Jun  9 10:10:57 2015 replutil.getList  DEBUG:  myhost is [cucm2015]
Tue Jun  9 10:11:03 2015 replutil.getList  DEBUG:  groupname [g_2_ccm10_5_1_10000_7]
Tue Jun  9 10:11:03 2015 replutil.getList  DEBUG:  publisher [g_2_ccm10_5_1_10000_7]
Tue Jun  9 10:11:03 2015 replutil.getList  DEBUG:  groupname [g_3_ccm10_5_1_10000_7]
Tue Jun  9 10:11:03 2015 replutil.getList  DEBUG:  All subscribers are reachable, teardowndrf will continue
Tue Jun  9 10:11:03 2015 replutil.getList  DEBUG:  <--
Tue Jun  9 10:11:03 2015 replutil  DEBUG:  Starting replication reset all
Tue Jun  9 10:11:04 2015 replutil.cdrDelete  DEBUG:  -->
Tue Jun  9 10:11:04 2015 replutil.cdrDelete  DEBUG:  
Inside cdrDelete()
Tue Jun  9 10:11:04 2015 replutil.cdrDelete  DEBUG:  length of groupnameslist is [1] 
Tue Jun  9 10:11:04 2015 dbllib.cdrdeleteserver  DEBUG:  -->
Tue Jun  9 10:11:10 2015 dbllib.cdrdeleteserver  DEBUG:  Executing su - informix -c "source /usr/local/cm/db/informix/local/ids.env; cdr delete server -f --connect=g_3_ccm10_5_1_10000_7 g_3_ccm10_5_1_10000_7"
Tue Jun  9 10:11:10 2015 dbllib.cdrdeleteserver  WARNING:  Failed to execute [su - informix -c "source /usr/local/cm/db/informix/local/ids.env; cdr delete server -f --connect=g_3_ccm10_5_1_10000_7 g_3_ccm10_5_1_10000_7"], rc[5], msg[connect to g_3_ccm10_5_1_10000_7 failed 
Server g_3_ccm10_5_1_10000_7 is not listed as a dbserver name in sqlhosts.
 (-25555)
command failed -- unable to connect to server specified  (5)], retrying...
Tue Jun  9 10:11:10 2015 dbllib.cdrdeleteserver  DEBUG:  Executing su - informix -c "source /usr/local/cm/db/informix/local/ids.env; cdr delete server -f --connect=g_3_ccm10_5_1_10000_7 g_3_ccm10_5_1_10000_7"
Tue Jun  9 10:11:10 2015 dbllib.cdrdeleteserver  ERROR:  Failed to execute [su - informix -c "source /usr/local/cm/db/informix/local/ids.env; cdr delete server -f --connect=g_3_ccm10_5_1_10000_7 g_3_ccm10_5_1_10000_7"], rc[5], msg[connect to g_3_ccm10_5_1_10000_7 failed 
Server g_3_ccm10_5_1_10000_7 is not listed as a dbserver name in sqlhosts.
 (-25555)
command failed -- unable to connect to server specified  (5)]
Tue Jun  9 10:11:15 2015 dbllib.cdrdeleteserver  DEBUG:  groupname: [g_3_ccm10_5_1_10000_7] hostname: [CUCM2015b]
Tue Jun  9 10:11:16 2015 dbllib.cdrdeleteserver  DEBUG:  Dropadmindb executed successfully
Tue Jun  9 10:11:16 2015 dbllib.cdrdeleteserver  DEBUG:  Executing su - informix -c "source /usr/local/cm/db/informix/local/ids.env; cdr delete server -f g_3_ccm10_5_1_10000_7"
Tue Jun  9 10:11:16 2015 dbllib.cdrdeleteserver  WARNING:  Ignorable error, ER is not active locally, rc[62], msg[The syscdr database is missing!
, sqlcode=-329
ISAM error -111: 
command failed -- Enterprise Replication not active  (62)]
Tue Jun  9 10:11:16 2015 dbllib.cdrdeleteserver  DEBUG:  <--
Tue Jun  9 10:11:16 2015 replutil.cdrDelete  DEBUG:  Successfully deleted the g_3_ccm10_5_1_10000_7 server from replication network
Tue Jun  9 10:12:16 2015 replutil.cdrDelete  DEBUG:  isDrf is not None
Tue Jun  9 10:12:16 2015 dbllib.cdrdeleteserver  DEBUG:  -->
Tue Jun  9 10:12:25 2015 dbllib.cdrdeleteserver  DEBUG:  Executing su - informix -c "source /usr/local/cm/db/informix/local/ids.env; cdr delete server -f g_2_ccm10_5_1_10000_7"
Tue Jun  9 10:12:25 2015 dbllib.cdrdeleteserver  WARNING:  Ignorable error, ER is not active locally, rc[62], msg[The syscdr database is missing!
, sqlcode=-329
ISAM error -111: 
command failed -- Enterprise Replication not active  (62)]
Tue Jun  9 10:12:25 2015 dbllib.cdrdeleteserver  DEBUG:  <--
Tue Jun  9 10:12:25 2015 replutil.cdrDelete  DEBUG:  Successfully deleted the g_2_ccm10_5_1_10000_7 server from replication network
Tue Jun  9 10:12:25 2015 replutil.cdrDelete  DEBUG:  <--
Tue Jun  9 10:12:25 2015 replutil  DEBUG:  <--
Tue Jun  9 10:12:25 2015 replutil  DEBUG:  -->
Tue Jun  9 10:12:25 2015 replutil  DEBUG:  task to do [setup] 
Tue Jun  9 10:12:25 2015 replutil  DEBUG:   Inside task == setup 

Tue Jun  9 10:12:25 2015 replutil.getList  DEBUG:  -->
Tue Jun  9 10:12:25 2015 replutil.getList  DEBUG:  Inside getList
Tue Jun  9 10:12:25 2015 replutil.getList  DEBUG:  myhost is [cucm2015]
Tue Jun  9 10:12:31 2015 replutil.getList  DEBUG:  groupname [g_2_ccm10_5_1_10000_7]
Tue Jun  9 10:12:31 2015 replutil.getList  DEBUG:  publisher [g_2_ccm10_5_1_10000_7]
Tue Jun  9 10:12:31 2015 replutil.getList  DEBUG:  groupname [g_3_ccm10_5_1_10000_7]
Tue Jun  9 10:12:31 2015 replutil.getList  DEBUG:  All subscribers are reachable, setup will continue
Tue Jun  9 10:12:31 2015 replutil.getList  DEBUG:  <--
Tue Jun  9 10:12:31 2015 replutil.cdrDefine  DEBUG:  -->
Tue Jun  9 10:12:31 2015 replutil.cdrDefine  DEBUG:  Inside cdrDefine method 
Tue Jun  9 10:12:31 2015 replutil.cdrDefine  DEBUG:  Inside cdrDefine
Tue Jun  9 10:12:31 2015 replutil.cdrDefine  DEBUG:  val is g_3_ccm10_5_1_10000_7
Tue Jun  9 10:12:31 2015 replutil.cdrDefine  DEBUG:  cmd is [/usr/local/cm/bin/dbl mkrepl --delsub g_3_ccm10_5_1_10000_7] 
Tue Jun  9 10:12:37 2015 replutil.cdrDefine  DEBUG:  <--
Tue Jun  9 10:12:37 2015 replutil  DEBUG:  Reset command completed successfully on:
Tue Jun  9 10:12:37 2015 replutil  DEBUG:  CUCM2015b
Tue Jun  9 10:12:37 2015 replutil  DEBUG:  Reset completed: 1     Failed: 0
Tue Jun  9 10:12:37 2015 replutil  DEBUG:  <--

end of the file reached
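
Most of this log is DEBUG noise, so a short Python sketch for pulling out only the WARNING and ERROR entries may help when reading it. The timestamp pattern and the function name `problems` are assumptions based on the lines pasted above. Continuation lines (the ones without a timestamp) are joined onto the entry they belong to. The sample is the "ER is not active locally" warning copied from the log.

```python
import re

# Example: "Tue Jun  9 10:11:16 2015 dbllib.cdrdeleteserver  WARNING:  message"
ENTRY = re.compile(
    r"^\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4} (?P<source>\S+)\s+"
    r"(?P<level>DEBUG|WARNING|ERROR):\s*(?P<msg>.*)$"
)


def problems(log_text):
    """Collect WARNING/ERROR entries, joining their continuation lines."""
    found = []
    current = None
    for line in log_text.splitlines():
        m = ENTRY.match(line)
        if m:
            current = None
            if m["level"] != "DEBUG":
                current = {
                    "source": m["source"],
                    "level": m["level"],
                    "msg": m["msg"].strip(),
                }
                found.append(current)
        elif current is not None and line.strip():
            # A line without a timestamp continues the previous entry.
            current["msg"] += " " + line.strip()
    return found


sample = """\
Tue Jun  9 10:11:16 2015 dbllib.cdrdeleteserver  WARNING:  Ignorable error, ER is not active locally, rc[62], msg[The syscdr database is missing!
, sqlcode=-329
ISAM error -111: 
command failed -- Enterprise Replication not active  (62)]
Tue Jun  9 10:11:16 2015 dbllib.cdrdeleteserver  DEBUG:  <--
"""

for entry in problems(sample):
    print(entry["level"], entry["source"], entry["msg"])
```

On the full log this would also surface the two "cdr delete server" failures that report g_3_ccm10_5_1_10000_7 as not listed in sqlhosts (-25555).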

 

Thanks in advance

 

Sam


Sam,

 

I suggest opening a TAC case.

 

regds,

aman


Sam,

 

Can you provide the output of the commands below?

 

"show network cluster"

"run sql select * from processnode"

"show tech network hosts"

 

JB


Hi JB, 

Thanks for the help. Here is the output:

 

admin:utils dbreplication runtimestate 

Server Time: Wed Jun 10 11:45:11 HKT 2015

Cluster Replication State: dbmonpreflightcheck file is not found.

DB Version: ccm10_5_1_10000_7
Repltimeout set to: 300s
PROCESS option set to: 1

Cluster Detailed View from CUCM2015b (2 Servers):

                                      PING      DB/RPC/   REPL.    Replication    REPLICATION SETUP
SERVER-NAME         IP ADDRESS        (msec)    DbMon?    QUEUE    Group ID       (RTMT) & DB Status
-----------         ----------        ------    -------   -----    -----------    ------------------
CUCM2015b           10.10.242.52      0.015     Y/Y/Y     --       (-)            (-) Replication Not Setup
cucm2015            10.10.242.51      0.142     Y/Y/Y     --       (-)            (-) Replication Not Setup

 


 
admin:show network cluster
10.10.242.52 CUCM2015b  Subscriber callmanager DBSub authenticated
10.10.242.51 cucm2015  Publisher callmanager DBPub authenticated using TCP since Wed Jun 10 10:56:43 2015

admin:run sql select * from processnode
pkid                                 name               mac systemnode description    isactive nodeid tknodeusage ipv6name fklbmhubgroup tkprocessnoderole tkssomode 
==================================== ================== === ========== ============== ======== ====== =========== ======== ============= ================= ========= 
00000000-1111-0000-0000-000000000000 EnterpriseWideData     t                         t        1      1                    NULL          1                 0         
9db7b992-639a-4aac-b80d-be65c39fbdbe 10.10.242.51           f                         t        2      0                    NULL          1                 0         
8814ca40-0a11-502c-462e-0965c5446e30 10.10.242.52           f          cucm2015b      t        3      1                    NULL          1                 0         
dabae774-46df-bd14-360c-f8ea3bbd9740 10.10.242.54           f          domain lab.com t        6      0                    NULL          2                 0         

admin:show tech network hosts
 -------------------- show platform network -------------------- 

 /etc/hosts File: 
#This file was generated by the /etc/hosts cluster manager.
#It is automatically updated as nodes are added, changed, removed from the cluster.

127.0.0.1 localhost
::1 localhost
10.10.242.52  CUCM2015b
10.10.242.51  cucm2015
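
One way to compare the two outputs above is to check which processnode names have no matching address or name in the hosts file. This is a minimal Python sketch under stated assumptions: the names are typed in by hand from the "name" column, the helper names are hypothetical, and a name missing from /etc/hosts is only something to look at. This thread does not establish whether it matters for the replication problem.

```python
def parse_hosts(text):
    """Map every address and name in /etc/hosts-style text to its address."""
    known = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        addr, *names = line.split()
        known[addr] = addr
        for name in names:
            known[name.lower()] = addr
    return known


def missing_from_hosts(node_names, hosts_text, ignore=("EnterpriseWideData",)):
    """processnode names that appear neither as an address nor a hostname."""
    known = parse_hosts(hosts_text)
    return [n for n in node_names if n not in ignore and n.lower() not in known]


hosts = """\
#This file was generated by the /etc/hosts cluster manager.
127.0.0.1 localhost
::1 localhost
10.10.242.52  CUCM2015b
10.10.242.51  cucm2015
"""

# The "name" column from the processnode output above.
nodes = ["EnterpriseWideData", "10.10.242.51", "10.10.242.52", "10.10.242.54"]

print(missing_from_hosts(nodes, hosts))
```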

 

Any Ideas?

Sam

 


Hi Sam,

 

Can you provide the output of the commands below?

 

"utils diagnose test"

"utils ntp status"

 

JB


Hi JB, 

Here is the output 

 

admin:utils diagnose test
 
Log file: platform/log/diag2.log
 
Starting diagnostic test(s)
===========================
 
test - disk_space          : Passed (available: 1814 MB, used: 12341 MB)
skip - disk_files          : This module must be run directly and off hours
test - service_manager     : Passed                                                   
test - tomcat              : Passed      
test - tomcat_deadlocks    : Passed  
test - tomcat_keystore     : Passed  
test - tomcat_connectors   : Passed                                                   
test - tomcat_threads      : Passed      
test - tomcat_memory       : Passed      
test - tomcat_sessions     : Passed  
skip - tomcat_heapdump     : This module must be run directly and off hours
test - validate_network    : Reverse DNS lookup failed
test - raid                : Passed  
test - system_info         : Passed (Collected system information in diagnostic log)
test - ntp_reachability    : Passed
test - ntp_clock_drift     : Passed
test - ntp_stratum         : Passed
skip - sdl_fragmentation   : This module must be run directly and off hours
skip - sdi_fragmentation   : This module must be run directly and off hours
 
Diagnostics Completed
 
 
 The final output will be in Log file: platform/log/diag2.log
 
 
 Please use 'file view activelog platform/log/diag2.log' command to see the output
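
In a long "utils diagnose test" listing, a single non-passing line is easy to miss. The Python sketch below filters the pasted output down to tests whose result does not start with "Passed", ignoring skipped modules. The regular expression and the name `failed_tests` are assumptions based on the format shown here. On the output above it picks up the validate_network line; whether that result is related to the replication problem is not established in this thread.

```python
import re

# Example: "test - validate_network    : Reverse DNS lookup failed"
LINE = re.compile(
    r"^(?P<kind>test|skip) - (?P<module>\S+)\s*:\s*(?P<result>.*?)\s*$"
)


def failed_tests(text):
    """Return (module, result) for every executed test that did not pass."""
    out = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m and m["kind"] == "test" and not m["result"].startswith("Passed"):
            out.append((m["module"], m["result"]))
    return out


# Four lines copied from the output above.
sample = """\
test - tomcat_sessions     : Passed  
skip - tomcat_heapdump     : This module must be run directly and off hours
test - validate_network    : Reverse DNS lookup failed
test - raid                : Passed  
"""

print(failed_tests(sample))
```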
 
admin:utils ntp status
ntpd (pid 8019) is running...
 
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.10.242.51    10.10.242.29     2 u  251 1024  377    0.918    5.578   0.469
 
 
synchronised to NTP server (10.10.242.51) at stratum 3 
   time correct to within 39 ms
   polling server every 1024 s
 
Current time in UTC is : Thu Jun 11 06:24:29 UTC 2015
Current time in Asia/Hong_Kong is : Thu Jun 11 14:24:29 HKT 2015
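
To compare NTP state across nodes, the summary lines of "utils ntp status" can be reduced to a few fields. This is a minimal Python sketch; the function name `ntp_summary` is hypothetical and the patterns are taken from the wording of the output above, which may differ between versions.

```python
import re


def ntp_summary(text):
    """Extract the sync server, stratum and accuracy from the summary lines."""
    m = re.search(
        r"synchronised to NTP server \((?P<server>[^)]+)\) at stratum (?P<stratum>\d+)",
        text,
    )
    if not m:
        return None  # the node did not report being synchronised
    acc = re.search(r"time correct to within (\d+) ms", text)
    return {
        "server": m["server"],
        "stratum": int(m["stratum"]),
        "within_ms": int(acc.group(1)) if acc else None,
    }


# Summary lines copied from the output above.
sample = """\
synchronised to NTP server (10.10.242.51) at stratum 3 
   time correct to within 39 ms
   polling server every 1024 s
"""

print(ntp_summary(sample))
```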
 
Thanks for the help 
 
Sam

Hi Sam,

 

With the commands below I wanted to verify cluster integrity and see what the hosts files look like.

"show network cluster"

"run sql select * from processnode"

"show tech network hosts"

The commands below confirm there is no NTP issue, which can cause DB replication problems.

"utils diagnose test"

"utils ntp status"

At this point I would like you to try the commands below exactly as written and see if it makes any difference; you may have tried this before. If this does not help, we will have to get into root to verify the configuration, so I would ask you to contact TAC.

(1) Stop replication on all the nodes by running the command "utils dbreplication stop" (first on the subs, wait for the admin prompt to return, and then on the pub).

(2) Run the command "utils dbreplication dropadmindb" on all the nodes (first on the pub, wait for the admin prompt to return, and then on the subs).

(3) Run the command "utils dbreplication reset all" on the publisher.

 

JB

 


This helped me a lot. I changed my CUCM cluster IP addresses and hostnames following this guide:

https://networkingnerd.net/2010/12/29/changing-callmanagers-ip-address/

After that I had a lot of DB replication issues; basically the Pub didn't want to sync with the Sub, and the cluster threw a lot of strange errors, something about Informix and hostname not found.

I tried a lot of DB repairs, cluster resets, restarts and stops; none of that worked.

This did the trick; now all my nodes are in state 2.

BTW it took a while to sync, about 45 minutes. It is important that NTP is in sync on all the nodes.

Juan.


Hi Juan,

It's an old post, but I'm happy that it helped you resolve the issue. NTP is very important: if your NTP is not in sync, you are bound to see issues with DB replication.

JB


Thanks!


Hi, I used these steps (1-3) to fix DB replication issues in a Call Manager cluster of three servers, ver. 12.5.1.11900-146. Thanks

Hi Sam,

Beyond what Jitender already said below, I would like to understand the steps that you used for the re-host.

You said that you did the following:

1. DRS backup CUCM PUB (on OLD UCS)
2. DRS restore CUCM PUB (on NEW UCS) (on a separated network)
3. Remove the CUCM-PUB(OLD) from the network
4. Plug the CUCM-PUB(NEW) to the network

Why didn't you do the following?

1. DRS backup CUCM PUB (on OLD UCS)
2. Shutdown the CUCM PUB (on OLD UCS)
3. Install and Restore the CUCM PUB (on NEW UCS) (in the same network)
4. Restart CUCM PUB (on NEW UCS)
5. Restart CUCM SUB (for database replication)

 

Hope this helps.


Or is there any official documentation on the procedure for re-hosting the CUCM Publisher?
