
Brandon Pierce · ‎01-26-2015

Saw a few posts and documents relating to this issue, but they don't match up perfectly with my particular scenario. Basically, the customer could no longer log into the Unity Pub, so we had to do a rebuild as nothing was working. The Sub took over as it should and a co-worker rebuilt the Pub, but the split-brain effect never went away. In fact, they aren't communicating at all almost a week later. Here are the things I've checked so far:

DB Replication (from the Sub's perspective):
1. DB and Replication Services: ALL RUNNING
  Cluster Replication State: Only available on the PUB
  DB Version: ccm9_1_1_10000_11
  Repltimeout set to: 300s
  PROCESS option set to: 1
  Cluster Detailed View from XXXXX-UCXN02 (2 Servers):
                                  PING            CDR Server      REPL.   DBver& REPL.   REPLICATION SETUP
  SERVER-NAME     IP ADDRESS      (msec) RPC?    (ID) & STATUS   QUEUE   TABLES LOOP?   (RTMT)
  -----------     ------------    ------ ----    -------------- -----   ------- -----   -----------------
  XXXXX-UCXN01    10.200.9.21     0.575   Yes     (2) Connected   0      match   Yes     (2)
  XXXXX-UCXN02    10.103.9.22     0.067   Yes     (3) Connected   0      match   Yes     (2)
2. (Pub's perspective)
  
  DB and Replication Services: ALL RUNNING
  DB CLI Status: No other dbreplication CLI is running...
  Cluster Replication State: BROADCAST SYNC Completed on 1 servers at: 2015-01-23-17-19
       Last Sync Result: SYNC COMPLETED 603 tables sync'ed out of 603
       Sync Errors: NO ERRORS
  DB Version: ccm9_1_1_10000_11
  Repltimeout set to: 300s
  PROCESS option set to: 1
  Cluster Detailed View from XXXXX-UCXN01 (2 Servers):
                                  PING            CDR Server      REPL.   DBver& REPL.   REPLICATION SETUP
  SERVER-NAME     IP ADDRESS      (msec) RPC?    (ID) & STATUS   QUEUE   TABLES LOOP?   (RTMT) & details
  -----------     ------------    ------ ----    -------------- -----   ------- -----   -----------------
  XXXXX-UCXN01    10.200.9.21     0.084   Yes     (2) Connected   0      match   Yes     (2) PUB Setup Completed
  XXXXX-UCXN02    10.103.9.22     0.663   Yes     (3) Connected   0      match   Yes     (2) Setup Completed
Clusters look good:

admin:show network cluster
10.200.9.21 xxxxx-ucxn01.xxxxx.local xxxxx-ucxn01 Publisher authenticated
10.103.9.22 xxxxx-ucxn02.xxxxx.local xxxxx-ucxn02 Subscriber authenticated using TCP since Fri Jan 23 16:42:15 2015
Server Table (processnode) Entries
----------------------------------
10.200.9.21
10.103.9.22
Successful

Overall, they are in split-brain mode and still working with CUCM, but I'm not sure why it hasn't corrected itself. Both the Pub and Sub have been restarted to no effect. Any ideas on why this is still happening? I am in the process of pulling logs.

Error shown by CUC at the Admin page after login:

Communication is not functioning correctly between the servers in the Cisco Unity Connection cluster. To review server status for the cluster, go to the Tools > Cluster Management page of Cisco Unity Connection Serviceability.
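
For anyone who wants to check the "Cluster Detailed View" rows above mechanically rather than by eye, here is a minimal Python sketch. It assumes the rows keep the same column order as the output pasted in this post, and it treats a node as healthy only when RPC is "Yes", the CDR status is "Connected", the tables "match", and the replication setup state is (2). The names `parse_rows` and `unhealthy` are made up for illustration and are not part of any Cisco tooling.

```python
import re

# One data row of the "Cluster Detailed View" table, in the column order
# shown in the pasted output (an assumption; other versions may differ).
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<ip>\d+\.\d+\.\d+\.\d+)\s+(?P<ping>[\d.]+)\s+"
    r"(?P<rpc>\S+)\s+\((?P<cdr_id>\d+)\)\s+(?P<cdr_status>\S+)\s+"
    r"(?P<queue>\d+)\s+(?P<tables>\S+)\s+(?P<loop>\S+)\s+"
    r"\((?P<setup_state>\d+)\)\s*(?P<details>.*)$"
)

def parse_rows(text):
    """Return one dict per server row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            row = m.groupdict()
            row["setup_state"] = int(row["setup_state"])
            rows.append(row)
    return rows

def unhealthy(rows):
    """Names of servers whose row does not look like a healthy state (2) node."""
    return [
        r["name"] for r in rows
        if r["rpc"] != "Yes"
        or r["cdr_status"] != "Connected"
        or r["tables"] != "match"
        or r["setup_state"] != 2
    ]

# The two data rows from the Pub's view in this post.
SAMPLE = """
  XXXXX-UCXN01    10.200.9.21     0.084   Yes     (2) Connected   0      match   Yes     (2) PUB Setup Completed
  XXXXX-UCXN02    10.103.9.22     0.663   Yes     (3) Connected   0      match   Yes     (2) Setup Completed
"""

if __name__ == "__main__":
    print(unhealthy(parse_rows(SAMPLE)))
```

Run against the rows in this post it flags nothing, which is the puzzle here: database replication looks clean from both sides, so a check like this only rules out replication as the cause and says nothing about the Unity Connection cluster status itself.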

mightyking · ‎01-26-2015

Hi,

Try making your Sub primary and then switching back to your Pub as primary using Cisco Unity Connection Serviceability. By doing so, all the required services will restart and sync.

Thanks,

MK

Brandon Pierce · ‎01-26-2015

I'll give it a shot tomorrow. Since they aren't even communicating with each other, I figured that wasn't going to solve anything, but at this point anything is fair game. A complete reboot should also have solved this if that were the case, though.

Brandon Pierce · ‎01-27-2015

This ended up not working: the button was greyed out because the servers can't see each other, yet in the CLI DB replication is at (2). A co-worker ended up building out new VMs for the Pub and Sub using DRS; I'll see how far he got. This was my problem, but I got slammed with about three other things that were more important.

Anirudh Mavilakandy · ‎01-28-2015

Check NTP. Ensure Unity is synced to a stable, good stratum source - preferably stratum 1, 2, or 3.

On your version, time slips can cause memory leaks on servm. This in turn affects cluster communication.

You said you couldn't access Pub. Pub would have been on high CPU. Another symptom of this issue.

You can confirm by looking at the core dumps - 'utils core active list'

See if there are any servm core dumps. Most likely the server is affected by CSCug53756 / CSCud58000

HTH

Anirudh

Brandon Pierce · ‎01-29-2015

Both systems could be accessed without issue; apparently miscommunication was part of the problem, but they just didn't see each other. NTP was fine. The engineer has already rebuilt them, so the issue is gone. More investigation would have been nice, but the customer wanted it solved immediately, so the quickest resolution was DRS.

Ian Morris · ‎01-13-2016

I have this exact same issue. Has anyone else encountered this and found a fix?

CUC Cluster not functioning correctly