CUCM DB Replication Troubleshooting – Things to Verify
CUCM uses Informix as its database, and each node holds a copy that is replicated across the cluster in a full mesh topology. Understanding the concepts and diagnosing replication issues can be challenging. This document brings the diagnostic commands, verification checks, and troubleshooting steps related to the database together in one place.
Contents
Verifying DB replication from Admin CLI
Verifying DB replication using RTMT
Verifying DB replication from Cisco Unified Reporting
Logs To Collect For Deep Dive Analysis
Verifying DB replication from Admin CLI
utils dbreplication runtimestate
Ping:
RTT between the servers. This value has to be within 80 ms for the network to comply with the SRND.
DB/RPC/DbMon
DB = A Cisco DB
RPC = A Cisco DB Replicator (remote procedure call for the database service)
DbMon = Cisco Database Layer Monitor
Make sure all three values are in the “Y” state for every node.
Replication Queue
The value represents the amount of data queued for a particular node. Check whether any node shows a value that is abnormal compared to the other nodes.
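The three checks above (RTT, DB/RPC/DbMon flags, and replication queue) can be sketched in Python. This is only an illustration: it assumes the runtimestate values have already been transcribed into records by hand, and the `NodeStatus` class, its field names, and the queue heuristic (flag a node whose queue exceeds the cluster median by more than `queue_margin`) are assumptions, not part of any Cisco tool. Only the 80 ms limit comes from the text.

```python
from dataclasses import dataclass
from statistics import median

MAX_RTT_MS = 80.0  # RTT limit quoted from the SRND in the text above


@dataclass
class NodeStatus:
    """One node's runtimestate values, transcribed by hand (hypothetical structure)."""
    name: str
    ping_ms: float
    db: str      # "Y" or "N"
    rpc: str     # "Y" or "N"
    dbmon: str   # "Y" or "N"
    queue: int


def find_problems(nodes, queue_margin=100):
    """Return a list of findings; queue_margin is an assumed, tunable threshold."""
    problems = []
    if not nodes:
        return problems
    typical_queue = median(n.queue for n in nodes)
    for n in nodes:
        if n.ping_ms > MAX_RTT_MS:
            problems.append(f"{n.name}: RTT {n.ping_ms} ms exceeds {MAX_RTT_MS} ms")
        for label, flag in (("DB", n.db), ("RPC", n.rpc), ("DbMon", n.dbmon)):
            if flag != "Y":
                problems.append(f"{n.name}: {label} is {flag}, expected Y")
        if n.queue - typical_queue > queue_margin:
            problems.append(
                f"{n.name}: replication queue {n.queue} is well above the cluster median"
            )
    return problems
```

What counts as an abnormal queue depends on the cluster, so the margin should be adjusted to what is normal in your environment.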
Verifying the replication status of the node using Perf Query Command
show perf query class "Number of Replicates Created and State of Replication"
Run the above command on each node in question, because it displays the result only for the node on which it is run.
Verifying the process node entries are populated correctly
run sql select * from processnode
Running Diagnose test on the system
utils diagnose test
Verifying the network configuration of the system
show network eth0 all
Verify that DNS resolution for the hosts in question is working
utils network host <FQDN/hostname>
Verify the cluster communication is in good shape
show network cluster
Make sure the DB replication port is open and in the listening state
show open ports regexp 1515
Make sure the critical services are up and running
utils service list
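As a convenience, the CLI checks in this section can be assembled into an ordered checklist. The following Python sketch only builds and returns the command strings, taken from this section and written in lowercase; it does not connect to or run anything on a CUCM node. The function name and the example hostnames are hypothetical.

```python
def build_checklist(hostnames):
    """Return the CLI commands from this section in order, with one DNS check per host."""
    commands = [
        "utils dbreplication runtimestate",
        'show perf query class "Number of Replicates Created and State of Replication"',
        "run sql select * from processnode",
        "utils diagnose test",
        "show network eth0 all",
    ]
    # One DNS resolution check for every host supplied by the caller.
    commands += [f"utils network host {h}" for h in hostnames]
    commands += [
        "show network cluster",
        "show open ports regexp 1515",
        "utils service list",
    ]
    return commands


if __name__ == "__main__":
    # Hypothetical hostnames, used only to show the output shape.
    for cmd in build_checklist(["cucm-pub.example.com", "cucm-sub1.example.com"]):
        print(cmd)
```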
Database Replication Setup States
State | Meaning | Description
0 | Initialization State | This state indicates that replication is in the process of being set up. Remaining in this state for more than an hour could indicate a setup failure.
1 | Number of Replicates is not correct | This state is rarely seen in 6.x and 7.x, but in 5.x it can indicate that setup is still in progress. Remaining in this state for more than an hour could indicate a setup failure.
2 | Replication is good | Logical connections have been established and the tables match those on the other servers in the cluster.
3 | Tables are suspect | Logical connections have been established, but it is not certain that the tables match. In 6.x and 7.x, all servers could show state 3 if one server in the cluster is down. This can happen because the other servers cannot tell whether an update to a user-facing feature has not yet been passed from that subscriber to the other servers in the cluster.
4 | Setup Failed / Dropped | The server no longer has an active logical connection over which to receive database tables. No replication is occurring in this state.
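The state values above lend themselves to a simple lookup. The Python sketch below maps each state number to its meaning as listed in this section and marks every state other than 2 as needing attention; the function name and the input shape (a dict of node name to state number) are assumptions made for illustration.

```python
# Replication setup states as listed in this section.
REPLICATION_STATES = {
    0: "Initialization State",
    1: "Number of Replicates is not correct",
    2: "Replication is good",
    3: "Tables are suspect",
    4: "Setup Failed / Dropped",
}


def summarize_states(state_by_node):
    """Map each node to (meaning, is_good); only state 2 counts as good.

    state_by_node: dict of node name -> state number (hypothetical input shape).
    """
    report = {}
    for node, state in state_by_node.items():
        meaning = REPLICATION_STATES.get(state, "Unknown state")
        report[node] = (meaning, state == 2)
    return report
```

Keep in mind that states 0 and 1 are not failures by themselves; per the descriptions above, they become a concern only when a node remains in them for more than an hour.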
Database Replication Stages
If you run the database replication rebuild command, or if the system triggers an automatic reset, replication goes through the following stages before settling at Setup Completed.
1> Not Requested
2> Waiting
3> Defining
4> Defined
5> Realizing
6> Syncing
7> Setup Completed
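Because the stages occur in a fixed order, progress can be expressed as a position in that sequence. The Python sketch below assumes the stage name has been read manually from the CLI or log output; the function name is hypothetical and the comparison is case-insensitive.

```python
# Stages in the order listed above.
STAGES = [
    "Not Requested",
    "Waiting",
    "Defining",
    "Defined",
    "Realizing",
    "Syncing",
    "Setup Completed",
]


def stage_progress(stage):
    """Return (position, total, done) for a stage name.

    Raises ValueError if the name is not one of the listed stages.
    """
    names = [s.lower() for s in STAGES]
    position = names.index(stage.strip().lower()) + 1
    return position, len(STAGES), position == len(STAGES)
```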
Verifying DB replication using RTMT
Open RTMT and connect to the CUCM Publisher IP address
Go to Voice/Video > Service > Database Summary
The replication status for all the nodes is displayed in the bottom right corner
Verifying DB replication from Cisco Unified Reporting
Access the CUCM administration page and select Cisco Unified Reporting from the drop-down menu found in the top right corner.
Select System Reports and then click on Unified CM Database Summary from the left navigation pane
Make sure the highlighted data points are in good shape.
Log Analysis
file list activelog cm/trace/dbl date detail
Analyze the latest 3 files
- CDR define log
- CDR broadcast log
- Replication output broadcast log
Logs To Collect For Deep Dive Analysis
- utils create report database
- file get activelog cm/trace/dbl/*
- file get activelog cm/trace/dbl/sdi/dbmon*
- file get activelog cm/log/informix
- file get activelog cm/trace/dbl/sdi/startrpc.log
- file get activelog cm/trace/dbl/sdi/replication_scripts_output.log
- file get activelog cm/trace/dbl/sdi/start.log
- file get activelog cm/trace/dbl/sdi/runtimestate_*
- file get activelog syslog/*
- file get install system-history.log
- file get activelog cm/trace/dbl/sdi/ReplicationStatus*
- file get activelog cm/trace/dbl/sdi/DropAdminDB*
- file get activelog cm/trace/dbl/sdi/forcedatasyncsub*
- file get activelog cm/trace/dbl/sdi/rebuild
- file get activelog cm/trace/dbl/sdi/dblrpc.err
- file get activelog cm/trace/dbl/sdi/dbl_repl_output_util.log