Re: Unity Primary fails over to Unity Secondary automatically

martin.schoonbroodt · ‎12-21-2010

Hi Team,

I have a strange situation over here. Our Unity cluster had been working properly for more than 2 months when, one week ago, the first server failed over to the second Unity. When we tried to fail back from the second server, we saw the activation process of the first node in the Failover Monitor. We also saw all ports coming up again in CUCM. After the failback, everything seemed to be fine again, but it isn't. As soon as a user calls into Unity 1, even before leaving a message, we can see very clearly in the Failover Monitor that Unity 1 is again switching to Unity 2.

Any ideas? TAC hasn't yet :-D

We are running Unity 7.0(2) Unified Messaging with CUCM 7.1(5)... Not so bad, right?

Best Regards from Luxembourg!

M.

David Hailey · ‎12-21-2010

I haven't seen that specifically but here are a few things I would look for and/or test in the short term:

1) Look at the Windows event logs on the Unity primary. Specifically, look for TSP errors or anything that seems random or recurs way too often (and that started around the time this first began). You may find some clues there as to what is going on. The application log is your best bet, but it doesn't hurt to check the system log as well.

2) I would schedule some after-hours testing and test the Primary by itself. By that I mean that I would disable failover and let the Primary register its ports to CUCM (if not already). Make sure the IP address for the ports in CUCM is correct (if you see an APIPA address then your quickest resolution is to reboot the box). If the ports are OK, then test calling into the Primary when it is the only system online. If it takes calls and behaves properly, then that will tell you the box itself is healthy.

3) Similar to #2, if all is OK, reconfigure Failover. Then manually fail over to the Secondary and disable auto-failback. Then test the Secondary on its own. If all is well, then that box is OK too.

4) Reboot the primary server, let it come back and ensure it is healthy. Then manually fail back to the primary. Reboot the secondary server as you did the first. Make sure your failover/failback configurations are as you want them and then test the behavior by calling into VM.
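
If you have a lot of ports to eyeball for step 2, a few lines of Python can flag the APIPA ones for you. This is only a rough sketch: it assumes you have copied the port names and the IP addresses shown in CUCM into a dictionary by hand, and the port names below are made up for illustration. The only fact it relies on is that APIPA addresses live in 169.254.0.0/16.

```python
import ipaddress
from typing import Dict, List

# APIPA (IPv4 link-local) range that Windows self-assigns when it gets no address.
APIPA_NET = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(addr: str) -> bool:
    """Return True if addr is an IPv4 address inside 169.254.0.0/16."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and ip in APIPA_NET


def flag_apipa_ports(port_ips: Dict[str, str]) -> List[str]:
    """Return the sorted names of ports whose registered IP is an APIPA address."""
    return sorted(name for name, ip in port_ips.items() if is_apipa(ip))


# Hypothetical port names and addresses, typed in from the CUCM port list.
ports = {
    "Unity1-VI1": "10.1.1.20",
    "Unity1-VI2": "169.254.33.7",
    "Unity1-VI3": "10.1.1.20",
}
print(flag_apipa_ports(ports))
```

Any port it prints is one registered with a self-assigned address, which is the case where I would reboot the box before testing further.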

My point here is to independently test each server as well as the failover configuration. If each box operates OK on its own but misbehaves when failover is configured, then you may need to home in on the failover configuration. For example, if you have failover set up so that Unity will fail over if a call comes into the Secondary server's ports while the Primary is online, and your line group(s) or hunt configuration is misconfigured, then it may just be a matter of reviewing the integration. However, you may discover a software issue (ex: TSP) or possibly a hardware issue on the Primary that is now manifesting itself in this way. Last but not least, you may also find that with Windows sometimes the best fix is a good ol' reboot. Not exactly the best approach for actually identifying root cause, but sometimes it's an effective measure of last resort.
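
For the event log hunt in step 1, something like the following can help spot sources that recur too often. Treat it as a sketch under assumptions: it expects the application log exported to CSV, and the column names "Source" and "Type" as well as the file name are my guesses, so adjust them to whatever your export actually contains.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def recurring_sources(
    rows: Iterable[Dict[str, str]],
    min_count: int = 3,
    levels: Tuple[str, ...] = ("Error", "Warning"),
) -> List[Tuple[str, int]]:
    """Count Error/Warning entries per event source and keep the frequent ones.

    Assumes each row has "Source" and "Type" keys (hypothetical column names).
    Returns (source, count) pairs, most frequent first.
    """
    counts = Counter(row["Source"] for row in rows if row["Type"] in levels)
    return [(src, n) for src, n in counts.most_common() if n >= min_count]


def recurring_sources_from_file(path: str, min_count: int = 3) -> List[Tuple[str, int]]:
    """Same as recurring_sources, reading rows from an exported CSV file."""
    with open(path, newline="") as handle:
        return recurring_sources(csv.DictReader(handle), min_count=min_count)
```

Usage would be along the lines of `recurring_sources_from_file("application_log.csv")`, after trimming the export down to the period since the failovers started. Any TSP-related source near the top of the list is where I would start reading.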

Hailey

Please rate helpful posts!