Solved: CUPS 8.6.3.10000-20 - Presence Issue

Jordan Rudess · ‎02-18-2013

Hello all, this weekend I gracefully shut down (putty > utils-system-shutdown) our CUPS publisher to grab a backup of it (it's a VM). When I brought it back up I had to use the failback option to get it back into the cluster (we only have one subscriber at this time). That appeared to work and it evenly distributed the users across both servers; there are no errors shown and everything appears to be OK in the CUPS admin. But I soon discovered that users on the subscriber weren't seeing presence of users on the publisher and vice versa. So I manually moved everyone to the subscriber and everyone was able to see each other's presence again (in Jabber, for example). Next, I tried to rebalance the users again and the same issue happened: users on one server can't see presence of users on the other. As a test to make sure it wasn't just the publisher, I manually moved everyone to the publisher (the one I initially shut down to back up) and everyone can see each other's presence there too, just not when split between the two servers. Any ideas what to check/try next?

Thanks,

Jordan

Aaron Harrison · ‎02-18-2013

Hi Jordan

By default the inter-server comms use multicast DNS (mDNS); if your servers are on different subnets it doesn't work, and there may be other scenarios where it fails.

I usually set the cluster up for 'router-to-router' communication instead.

See here:

http://www.cisco.com/en/US/docs/voice_ip_comm/cups/8_6/english/install_upgrade/deployment/guide/dgcupc.html#wp1137417

Regards

Aaron

Aaron Please remember to rate helpful posts to identify useful responses, and mark 'Answered' if appropriate!
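
For anyone wanting to rule out the subnet condition quickly: multicast DNS is link-local and normally does not cross routers, so the first thing to confirm is whether both nodes sit in the same IP network. The following is a minimal Python sketch of that check, not anything CUPS provides. The addresses and prefix length are hypothetical placeholders; substitute the real values for your publisher and subscriber.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, prefix_len: int) -> bool:
    """Return True if both IPv4 addresses fall in the same network
    for the given prefix length."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix_len}").network
    return net_a == net_b


# Hypothetical addresses for the publisher and subscriber (assumptions,
# not taken from the thread).
print(same_subnet("10.1.1.10", "10.1.1.11", 24))  # True
print(same_subnet("10.1.1.10", "10.1.2.11", 24))  # False
```

A True result only rules out the different-subnet case; it does not prove that multicast traffic is actually passing between the nodes.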

Jordan Rudess · ‎02-18-2013

Hey Aaron, thanks for your reply. Both servers are on the same subnet, and this was working before I shut down the server this weekend to grab a backup. Do you think maybe I just need to restart the services used for inter-server communication? If so, do you know which services they are? The link mentions the Cisco UP XCP Router service but I wasn't sure if there were any others related. Or is it still worth changing to router-to-router like you mentioned as a test?

Thanks for your time,

Jordan

Aaron Harrison · ‎02-18-2013

Hi

You'll still need to reboot the servers if you do change to router-to-router (or restart XCP, but that's the same thing really, users will lose service for a while).

Are there no errors in the topology, troubleshooter screens or the notifications list?

Aaron

Jordan Rudess · ‎02-18-2013

Under System > Cluster Topology everything shows up fine. Green checks on both servers and all services. High availability shows state and reason normal for both servers.

Under Notifications I have two:

Cisco UP Server Recovery Manager (cups1) : A manual fallback has been initiated

Cisco UP Server Recovery Manager (cups2) : An automatic failover has been initiated due to the peer node being down

Makes sense I suppose, since I had to bring it back into the cluster.

The troubleshooter shows a couple warnings that have been there even before this:

Under Presence Engine Troubleshooter:

Verify valid Presence Gateways (check reachability): Invalid Presence Gateway. Unreachable gateways include: cucm.xyz.com

Verify SIP Publish Model: Failed to verify host connectivity. Please check if using DNS SRV for host cups.xyz.com

We don't use CallManager as a presence gateway, but I think you just have to have something entered there, which is why I get these warnings. We had a third-party company set it up this way and it has worked fine, at least until now.

Under Microsoft RCC Troubleshooter:

Verify Microsoft RCC application is active: Microsoft RCC application is currently toggled to inactive.

I don't think we use any of this which is why there is a warning.

Under Meeting Notification:

Verify that MeetingPlace is properly configured (to support Join Meeting callback feature): The MeetingPlace configuration is invalid (check MeetingPlace Address). MeetingPlace must be properly configured in order for the Join Meeting callback feature to work properly.

Again something else I don't think we use.

Under Topology (in the troubleshooter) of course I see this:

Verify all assigned nodes have users assigned to them: Following nodes have no users assigned: cups2.

That's because I've moved everyone to the pub for now!

Thanks,

Jordan

Aaron Harrison · ‎02-18-2013

All seems reasonable.

Re: the CUCM presence gateway: that provides 'on the phone' status on a good day. The web GUI can't validate SRV records, so that warning can be ignored.

I'd try the router-to-router setting.

Aaron

Jordan Rudess · ‎02-18-2013

Ok that's worth a shot. I'll have to try this during non-business hours this evening since we rely heavily on presence here. I'll report back what I find. I really appreciate your input.

Thanks,

Jordan

VLT06 · ‎02-18-2013

I will be interested to see how you go!

We have a similar setup, and every time a user crosses nodes by an administrative push (including failover and rebalance), the users on the SUB seem to lose presence as well. That said, it does seem to come good after quite a few hours (approx. 3000 users). Although the system reports all show OK, I think it might be coming down to data replication (although that too seems OK via the report).

One thing I have noticed is that this is less of an issue for CUPC clients as opposed to Jabber?

Jordan Rudess · ‎02-18-2013

Hey Aaron, I tried your suggestion and that worked. I simply changed it to router-to-router and restarted the XCP Router service on both servers. It took about 5 minutes for each server. I then rebalanced the users, which took another few minutes, but they came up and users on both nodes could see each other's presence again. So thank you very much for your input!

Thanks,

Jordan