12-30-2015 07:23 AM - edited 03-19-2019 10:32 AM
Hello,
I have two Unity servers, one at each site, that recently started displaying the message "Communication is not functioning correctly between the servers in the Cisco Unity Connection cluster."
Users recently began reporting MWI issues at the remote site as well. That site has a Unity Connection Subscriber and two Call Manager Subscribers.
Our main site has a CM Publisher and Subscriber as well as a Unity Connection Publisher.
I have tried restarting the Unity Subscriber and then the Publisher via Settings>Version in Unified OS Administration, but that didn't resolve anything.
On the Subscriber, under Unity Connection Serviceability, it says that the server status of the Publisher is "Not Reachable". I have verified the devices have IP connectivity between them.
Any help is appreciated. I've recently been given the reins of our VoIP system and I only know enough to be dangerous.
01-01-2016 04:49 AM
Hello there!
When you logged into the Unified Serviceability page of the HA node (subscriber) and you saw it say Not Reachable for the publisher, did you also notice if it said, "Split Brain Recovery" as the server status for the HA (subscriber) node? If so, please see http://ryanthomashuff.com/2015/12/cisco-unity-connections-split-brain/. Assuming that IS NOT the case, I have included the steps I would go through to troubleshoot.
If you can log into the CLI (via SSH with the platform administration credentials) on both server nodes, here are some of the tasks I would go through.
Check to see if the reachable issue is resolved and cluster operation is restored.
[Passed and failed sample outputs were shown here; not preserved.]
Check to see if anything you resolved from the failed network validation corrected the reachable issue and if cluster operation is restored.
Check to see if anything you resolved from the network delay and NTP resolved the reachable issue and if cluster operation is restored.
OK! I realize this may seem like a lot; however, since you stated that you checked the actual network communication between the node segments (which you said is working) and that you have already rebooted both servers (some of the first things I would recommend), that only leaves the business end of troubleshooting!
So, run through all this and hopefully something here will help shed some light on what is going on with the cluster.
Thanks,
Ryan
(: ... Please rate helpful posts ...:)
12-31-2015 06:54 AM
I'm going to assume that when you validated IP connectivity, you did it either from the OS Administration page of each Unity Connection server or from the CLI of each server.
From either the OS Administration page or the CLI, were you able to ping both the host name and the IP address?
Have you validated that DNS and reverse DNS are working correctly?
From the CLI of each server, run: utils diagnose test
The utils diagnose test may give you more information to help resolve your issue.
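If it helps, here is a rough sketch of the forward and reverse DNS check done from an admin workstation rather than from the servers themselves. It is not a Cisco tool, the function name `check_dns` is made up for illustration, and you would substitute your own host names and IP addresses. It only confirms that the name resolves to the address you expect and that the address resolves back to the same short host name.

```python
import socket


def check_dns(hostname, expected_ip,
              forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of DNS problems for one node; an empty list means
    forward and reverse lookups agree with what you expect.

    `forward` and `reverse` default to the standard library resolvers and
    can be swapped out for testing.
    """
    problems = []

    # Forward lookup: host name -> IP address.
    try:
        resolved_ip = forward(hostname)
        if resolved_ip != expected_ip:
            problems.append(
                f"{hostname} resolves to {resolved_ip}, expected {expected_ip}")
    except OSError as exc:  # socket.gaierror and socket.herror are OSErrors
        problems.append(f"forward lookup of {hostname} failed: {exc}")

    # Reverse lookup: IP address -> host name (PTR record).
    try:
        ptr_name = reverse(expected_ip)[0]
        # Compare only the short host name, case-insensitively, so that
        # "cuc-pub" and "CUC-PUB.example.com" count as a match.
        if ptr_name.split(".")[0].lower() != hostname.split(".")[0].lower():
            problems.append(
                f"{expected_ip} reverse-resolves to {ptr_name}, expected {hostname}")
    except OSError as exc:
        problems.append(f"reverse lookup of {expected_ip} failed: {exc}")

    return problems
```

Calling something like `check_dns("cuc-pub.example.com", "192.0.2.10")` for each node (placeholder name and address) returns an empty list when both lookups line up, and otherwise a list of plain-text messages describing what did not match.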
***Don't forget to rate helpful feedback***
08-28-2019 05:54 AM
It seems to be unfortunately pervasive that people choose to provide links with no context. It's fine to link to an article, but considering 90% are broken, this forum loses quite a bit of usefulness if what you're looking for is presented that way. Even more so when the URL isn't a Cisco URL; it's hard enough to get a working Cisco one if it predates the makeover.
01-11-2016 12:48 PM
I'm having the same issue on a customer's 11.0.1.21900-11 system. There is a bug, CSCuu15688, which is close to what I am seeing but not exact. I've seen the issue happen twice on this cluster/environment, where something caused the Primary CUC server to think the Secondary CUC server went down. The Secondary came back and SBR wasn't able to recover. The first time I saw this, it took a reboot of both servers before everything came back to normal. I'm waiting to decide about rebooting the servers again to see if it fixes the issue a second time.
The only issue the customer noticed is MWIs not in sync. They never mentioned messages not being delivered.
Ryan - any input on this? Honestly, it seems odd that you just wrote a document about this. :) I was hoping you worked at Cisco and had some insight into a potential known issue with SBR and version 11.
Thanks,
Dan