01-26-2015 09:37 AM - edited 03-19-2019 09:05 AM
I saw a few posts and documents relating to this issue, but they don't match my particular scenario. Basically, the customer could no longer log into the Unity Pub, so we had to do a rebuild as nothing was working. The Sub took over as it should and a co-worker rebuilt the Pub, but the split-brain condition never went away. In fact, the two servers still aren't communicating at all almost a week later. Here are the things I've checked so far:
DB and Replication Services: ALL RUNNING
Cluster Replication State: Only available on the PUB
DB Version: ccm9_1_1_10000_11
Repltimeout set to: 300s
PROCESS option set to: 1
Cluster Detailed View from XXXXX-UCXN02 (2 Servers):
                            PING          CDR Server      REPL.  DBver&   REPL.  REPLICATION SETUP
SERVER-NAME   IP ADDRESS    (msec)  RPC?  (ID) & STATUS   QUEUE  TABLES   LOOP?  (RTMT)
-----------   ------------  ------  ----  --------------  -----  -------  -----  -----------------
XXXXX-UCXN01  10.200.9.21   0.575   Yes   (2) Connected   0      match    Yes    (2)
XXXXX-UCXN02  10.103.9.22   0.067   Yes   (3) Connected   0      match    Yes    (2)
(Pub's perspective)
DB and Replication Services: ALL RUNNING
DB CLI Status: No other dbreplication CLI is running...
Cluster Replication State: BROADCAST SYNC Completed on 1 servers at: 2015-01-23-17-19
Last Sync Result: SYNC COMPLETED 603 tables sync'ed out of 603
Sync Errors: NO ERRORS
DB Version: ccm9_1_1_10000_11
Repltimeout set to: 300s
PROCESS option set to: 1
Cluster Detailed View from XXXXX-UCXN01 (2 Servers):
                            PING          CDR Server      REPL.  DBver&   REPL.  REPLICATION SETUP
SERVER-NAME   IP ADDRESS    (msec)  RPC?  (ID) & STATUS   QUEUE  TABLES   LOOP?  (RTMT) & details
-----------   ------------  ------  ----  --------------  -----  -------  -----  -----------------
XXXXX-UCXN01  10.200.9.21   0.084   Yes   (2) Connected   0      match    Yes    (2) PUB Setup Completed
XXXXX-UCXN02  10.103.9.22   0.663   Yes   (3) Connected   0      match    Yes    (2) Setup Completed
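For anyone comparing this output across several nodes, the per-server rows can be checked mechanically. The following is only a rough sketch in Python: the row layout is assumed from the output pasted in this thread (it may differ on other versions), and the names `parse_rows` and `replication_healthy` are made up for illustration. It treats a node as healthy when RPC is "Yes", the CDR status is "Connected", the queue is 0 and the replication setup state is (2).

```python
import re

# Rows copied from the Pub's output in this thread.
ROWS = """\
XXXXX-UCXN01 10.200.9.21 0.084 Yes (2) Connected 0 match Yes (2) PUB Setup Completed
XXXXX-UCXN02 10.103.9.22 0.663 Yes (3) Connected 0 match Yes (2) Setup Completed
"""

# Assumed column order: name, IP, ping, RPC?, (CDR id) status, queue,
# tables, loop?, (setup state), optional details.
ROW_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<ip>\d+\.\d+\.\d+\.\d+)\s+(?P<ping>[\d.]+)\s+"
    r"(?P<rpc>\S+)\s+\((?P<cdr_id>\d+)\)\s+(?P<cdr_status>\S+)\s+"
    r"(?P<queue>\d+)\s+(?P<tables>\S+)\s+(?P<loop>\S+)\s+"
    r"\((?P<setup>\d+)\)\s*(?P<details>.*)$"
)


def parse_rows(text):
    """Return one dict per server row; header and separator lines are skipped."""
    nodes = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            nodes.append(match.groupdict())
    return nodes


def replication_healthy(nodes):
    """True only if at least one node was parsed and every node looks good."""
    return bool(nodes) and all(
        n["rpc"] == "Yes"
        and n["cdr_status"] == "Connected"
        and n["queue"] == "0"
        and n["setup"] == "2"
        for n in nodes
    )


nodes = parse_rows(ROWS)
print(replication_healthy(nodes))
```

A pass here only says that database replication looks fine. As this thread shows, the Cluster Management page can still report a communication problem while both nodes sit at state (2), so this check does not rule out a split-brain condition.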
Clusters look good:
admin:show network cluster
10.200.9.21 xxxxx-ucxn01.xxxxx.local xxxxx-ucxn01 Publisher authenticated
10.103.9.22 xxxxx-ucxn02.xxxxx.local xxxxx-ucxn02 Subscriber authenticated using TCP since Fri Jan 23 16:42:15 2015
Server Table (processnode) Entries
----------------------------------
10.200.9.21
10.103.9.22
Successful
Overall, they are in split-brain mode and working with CUCM, but I'm not sure why it hasn't corrected itself. Both the Pub and Sub have been restarted to no effect. Any ideas on why this is still happening? I am in the process of pulling logs.
Error shown by CUC at the Admin page after login:
Communication is not functioning correctly between the servers in the Cisco Unity Connection cluster. To review server status for the cluster, go to the Tools > Cluster Management page of Cisco Unity Connection Serviceability.
01-26-2015 12:28 PM
Hi,
Try making your Sub primary and then switching back to your Pub as primary using Cisco Unity Connection Serviceability. By doing so, all the required services will restart and sync.
Thanks,
MK
01-26-2015 07:38 PM
I'll give it a shot tomorrow. Since they aren't even communicating with each other I figured that wasn't going to solve anything but at this point anything is fair game. A complete reboot should have also solved this though if that was the case.
01-27-2015 08:48 PM
This ended up not working: the button was greyed out because the servers can't see each other, yet in the CLI DB replication is at (2). My co-worker ended up building out a new VM for the Pub and Sub using DRS; I will see how far he got. This was my problem, but I ended up getting slammed with about 3 other things that were more important.
01-28-2015 10:54 PM
Check NTP. Ensure Unity is synced to a stable, good-stratum source, preferably stratum 1, 2 or 3.
On your version, time slips can cause memory leaks on servm. This in turn affects cluster communication.
You said you couldn't access Pub. Pub would have been on high CPU. Another symptom of this issue.
You can confirm by looking at the core dumps - 'utils core active list'
See if there are any servm core dumps. Most likely the server is affected by CSCug53756 / CSCud58000
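The NTP suggestion above can be checked from saved CLI output. This is a hedged sketch in Python: it assumes the output of `utils ntp status` includes a line of the form "synchronised to NTP server (...) at stratum N", and the sample line, the IP address in it, the function names and the default threshold of 3 are all illustrative assumptions rather than anything taken from this cluster.

```python
import re

# Assumed shape of the sync line in the NTP status output.
STRATUM_RE = re.compile(r"at stratum (\d+)")


def ntp_stratum(status_text):
    """Return the stratum printed on the sync line, or None if none is found."""
    match = STRATUM_RE.search(status_text)
    return int(match.group(1)) if match else None


def stratum_ok(status_text, max_stratum=3):
    """True if a stratum was reported and it does not exceed max_stratum."""
    stratum = ntp_stratum(status_text)
    return stratum is not None and stratum <= max_stratum


# Hypothetical sample line; the address is a documentation-range placeholder.
SAMPLE = "synchronised to NTP server (192.0.2.10) at stratum 3"
print(ntp_stratum(SAMPLE), stratum_ok(SAMPLE))
```

Output with no stratum line at all (for example an unsynchronised server) is treated as a failure rather than silently passing.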
HTH
Anirudh
01-29-2015 01:33 PM
Both systems could be accessed without issue. Apparently miscommunication was part of the problem, but they just didn't see each other. NTP was fine. The engineer has already rebuilt them, so the issue is gone. More investigation would have been nice, but the customer wanted it solved immediately, so the quickest resolution was DRS.
01-13-2016 03:38 PM