03-15-2013 02:34 AM - edited 03-19-2019 06:26 AM
Hi Folks!
We have three Unified Presence nodes running system version 8.6.4.12900-2. Since I enabled RTMT email alerts, I get the following alert from time to time, from different nodes (not always the same one). The strange thing is that if I check the service via the Serviceability interface, it is up and running with no new downtime recorded:
[RTMT-ALERT-StandAloneCluster12487] PEPeerNodeFailure
PEPeerNodeFailureAlarmMessage : Node pe54005002: OUT-OF-SERVICE
AppID : Cisco UP Presence Engine
ClusterID : StandAloneCluster12487
NodeID : cups-01
TimeStamp : Fri Mar 15 09:59:00 CET 2013.
The alarm is generated on Fri Mar 15 09:59:00 CET 2013.
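As a cross-check, the Presence Engine state can also be verified from the platform admin CLI on the CUP node instead of the Serviceability web page. A minimal sketch using the standard VOS service command (exact output formatting varies by version):

```
admin: utils service list
```

This lists every service with its current state, so you can confirm whether Cisco UP Presence Engine actually went down or the alert fired without a real outage.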
I hope someone can help me with this.
Thanks.
Regards
01-24-2014 12:51 PM
I'm also seeing a similar problem on system version 8.6.5.10000-12, followed by a "split-brain" effect where users on one node cannot see the status of, or IM with, contacts registered to the other node. Any thoughts?
PEPeerNodeFailureAlarmMessage : Node pe456272906: OUT-OF-SERVICE
AppID : Cisco UP Presence Engine ClusterID : StandAloneCluster8a2db
NodeID : sdep-cup02
TimeStamp : Thu Jan 23 22:00:54 CST 2014.
The alarm is generated on Thu Jan 23 22:00:54 CST 2014.
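Since the split-brain symptom points at the nodes losing sight of each other, it can be worth checking database replication between publisher and subscriber from the admin CLI. A sketch using the standard VOS replication status command (run on the publisher; output format varies by version):

```
admin: utils dbreplication runtimestate
```

This shows the replication setup state for each node, which helps distinguish a presence-layer problem from an underlying replication problem.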
01-24-2014 06:46 PM
Please check the following bug
https://tools.cisco.com/bugsearch/bug/CSCuf74738/?reffering_site=dumpcr
PEPeerNodeFailureAlarmMessage alerts seen regularly in RTMT
Symptom:
PEPeerNodeFailureAlarmMessage alerts seen regularly in RTMT
Conditions:
No particular conditions are met; it tends to happen overnight.
Workaround:
None
HTH
Manish
01-25-2014 04:35 AM
Hi,
Since I set the trace levels on the Presence server all back to their default settings (some of them had a debug level configured), the error has not come up again. We had the same issue with inconsistent availability status.
01-28-2014 06:08 PM
Thanks Rene,
We also experienced a network failover that seemed to kick off all our problems. After the CUP servers replicated and started migrating users, the nodes would reach high virtual memory and CPU usage, and the processes would either crash or the servers would hang.
After hours we were able to at least stabilize the servers so they weren't crashing (they still had high virtual memory and swap usage, with some sporadic CPU utilization), and that's when we noticed the "split-brain" effect.
Based on your suggestion Rene, we looked at the trace settings and noticed all were set to debug. We turned them all off and restarted the Cisco UP XCP Config Manager, and then the Cisco UP XCP Router, on the subscriber node. The subscriber came back up and we were able to see status and IM with users on the publisher. After a few days of stable operation we restarted the Cisco UP XCP Config Manager and Cisco UP XCP Router on the publisher, which restored virtual memory and swap back to normal operating conditions.
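The restart sequence described above can also be done from the admin CLI instead of the Serviceability page. A sketch, assuming the service names as they appear on CUP 8.6 (run on the subscriber first, then later on the publisher, in this order):

```
admin: utils service restart Cisco UP XCP Config Manager
admin: utils service restart Cisco UP XCP Router
```

Restarting the Config Manager before the Router matters, since the Router reads its configuration from the Config Manager on startup.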