
CUC Cluster not functioning correctly

Brandon Pierce
Level 4

I've seen a few posts and documents relating to this issue, but they don't match my particular scenario. Basically, the customer could no longer log into the Unity Pub and nothing was working, so we had to do a rebuild. The Sub took over as it should and a co-worker rebuilt the Pub, but the split-brain condition never went away. In fact, the two servers still aren't communicating at all almost a week later. Here are the things I've checked so far:

 

  1. DB Replication (from the Sub's perspective):

      DB and Replication Services: ALL RUNNING

      Cluster Replication State: Only available on the PUB

      DB Version: ccm9_1_1_10000_11
      Repltimeout set to: 300s
      PROCESS option set to: 1

      Cluster Detailed View from XXXXX-UCXN02 (2 Servers):

                                      PING            CDR Server      REPL.   DBver&  REPL.   REPLICATION SETUP
      SERVER-NAME     IP ADDRESS      (msec)  RPC?    (ID) & STATUS   QUEUE   TABLES  LOOP?   (RTMT)
      -----------     ------------    ------  ----    --------------  -----   ------- -----   -----------------
      XXXXX-UCXN01    10.200.9.21     0.575   Yes     (2)  Connected   0      match   Yes     (2)
      XXXXX-UCXN02    10.103.9.22     0.067   Yes     (3)  Connected   0      match   Yes     (2)

      (From the Pub's perspective)

      DB and Replication Services: ALL RUNNING

      DB CLI Status: No other dbreplication CLI is running...

      Cluster Replication State: BROADCAST SYNC Completed on 1 servers at: 2015-01-23-17-19
           Last Sync Result: SYNC COMPLETED  603 tables sync'ed out of 603
           Sync Errors: NO ERRORS

      DB Version: ccm9_1_1_10000_11
      Repltimeout set to: 300s
      PROCESS option set to: 1

      Cluster Detailed View from XXXXX-UCXN01 (2 Servers):

                                      PING            CDR Server      REPL.   DBver&  REPL.   REPLICATION SETUP
      SERVER-NAME     IP ADDRESS      (msec)  RPC?    (ID) & STATUS   QUEUE   TABLES  LOOP?   (RTMT) & details
      -----------     ------------    ------  ----    --------------  -----   ------- -----   -----------------
      XXXXX-UCXN01    10.200.9.21     0.084   Yes     (2)  Connected   0      match   Yes     (2) PUB Setup Completed
      XXXXX-UCXN02    10.103.9.22     0.663   Yes     (3)  Connected   0      match   Yes     (2) Setup Completed

  2. Clusters look good:

    admin:show network cluster
    10.200.9.21 xxxxx-ucxn01.xxxxx.local xxxxx-ucxn01 Publisher authenticated
    10.103.9.22 xxxxx-ucxn02.xxxxx.local xxxxx-ucxn02 Subscriber authenticated using TCP since Fri Jan 23 16:42:15 2015

    Server Table (processnode) Entries
    ----------------------------------
    10.200.9.21
    10.103.9.22

    Successful

 

Overall, both servers are in split-brain mode and still working with CUCM, but I'm not sure why the condition hasn't corrected itself. Both the Pub and the Sub have been restarted to no effect. Any ideas on why this is still happening? I am in the process of pulling logs.

 

Error shown by CUC at the Admin page after login:

 

  Communication is not functioning correctly between the servers in the Cisco Unity Connection cluster. To review server status for the cluster, go to the Tools > Cluster Management page of Cisco Unity Connection Serviceability.
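The replication tables pasted above can be checked mechanically rather than by eye. The following is a minimal Python sketch, assuming the "Cluster Detailed View" rows keep the column layout shown in this post; the names `parse_rows` and `replication_healthy` are hypothetical helpers, not part of any Cisco tool.

```python
import re

# Sample rows copied from the Pub's "Cluster Detailed View" in this post.
SAMPLE = """\
XXXXX-UCXN01    10.200.9.21     0.084   Yes     (2)  Connected   0      match   Yes     (2) PUB Setup Completed
XXXXX-UCXN02    10.103.9.22     0.663   Yes     (3)  Connected   0      match   Yes     (2) Setup Completed
"""

# One data row: name, IP, ping, RPC, (CDR id), status, queue, tables, loop, (RTMT state).
# The layout is an assumption based on the output shown above.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<ip>\d+\.\d+\.\d+\.\d+)\s+(?P<ping>[\d.]+)\s+"
    r"(?P<rpc>\S+)\s+\((?P<cdr_id>\d+)\)\s+(?P<status>\S+)\s+(?P<queue>\d+)\s+"
    r"(?P<tables>\S+)\s+(?P<loop>\S+)\s+\((?P<state>\d+)\)"
)


def parse_rows(text):
    """Return one dict per server row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def replication_healthy(rows):
    """True only if every node is connected, reachable by RPC, and in state 2."""
    return bool(rows) and all(
        r["state"] == "2" and r["status"] == "Connected" and r["rpc"] == "Yes"
        for r in rows
    )


if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        print(row["name"], row["ip"], "state", row["state"])
    print("replication healthy:", replication_healthy(parse_rows(SAMPLE)))
```

Note that this check passes for the output in this post, which is exactly the puzzle: database replication reports state (2) on both nodes while the Connection cluster status still reports a communication failure, so a clean replication table alone does not rule out split-brain.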

 

6 Replies

mightyking
Level 6

Hi,

Try making your Sub primary and then switching back to your Pub as primary using Cisco Unity Connection Serviceability. Doing so restarts all the required services and resyncs them.

 

Thanks,

 

MK

I'll give it a shot tomorrow. Since they aren't even communicating with each other, I figured that wasn't going to solve anything, but at this point anything is fair game. If that were the fix, though, a complete reboot should also have solved it.

This ended up not working: the button was greyed out because the servers can't see each other, even though the CLI shows DB replication at (2). A co-worker ended up building out new VMs for the Pub and Sub using DRS; I'll see how far he got. This was my problem, but I got slammed with about three other things that were more important.

Anirudh Mavilakandy
Cisco Employee

Check NTP. Ensure Unity is synced to a stable, low-stratum source, preferably stratum 1, 2, or 3.

On your version, time slips can cause memory leaks on servm. This in turn affects cluster communication.

You said you couldn't access the Pub. The Pub would have been running at high CPU, which is another symptom of this issue.

 

You can confirm by looking at the core dumps with 'utils core active list'.

See if there are any servm core dumps. Most likely the server is affected by CSCug53756 / CSCud58000
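The stratum recommendation above can be turned into a quick check on saved CLI output. This is only a sketch: it assumes the NTP status text contains an ntpstat-style line of the form "synchronised to NTP server (...) at stratum N", and the sample string, the address 192.0.2.10, and the helper names are hypothetical.

```python
import re

# Assumed ntpstat-style wording; adjust the pattern if the real output differs.
STRATUM_LINE = re.compile(r"at stratum (\d+)")


def stratum_from_status(text):
    """Return the reported stratum as an int, or None if no such line is found."""
    match = STRATUM_LINE.search(text)
    return int(match.group(1)) if match else None


def stratum_acceptable(stratum, max_stratum=3):
    """Apply the 'stratum 1, 2, or 3' guidance from the reply above."""
    return stratum is not None and 1 <= stratum <= max_stratum


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the affected servers.
    sample = "synchronised to NTP server (192.0.2.10) at stratum 3"
    stratum = stratum_from_status(sample)
    print("stratum:", stratum, "acceptable:", stratum_acceptable(stratum))
```

A missing stratum line returns `None` and is treated as not acceptable, so an unsynchronised server is flagged rather than silently passed.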

 

HTH

Anirudh

Both systems could be accessed without issue; apparently miscommunication was part of the problem, but they just didn't see each other. NTP was fine. The engineer has already rebuilt them, so the issue is gone. More investigation would have been nice, but the customer wanted it solved immediately, so the quickest resolution was DRS.

Ian Morris
Level 1
I have this exact same issue. Has anyone else encountered this and found a fix?