
CSCvr21972 - UCCXMCVD get hazelcast memory leaks after continuous missed heartbeats.

cae_technology
Level 1

I am getting the following email alerts from RTMT entitled '[RTMT-ALERT-StandAloneCluster] PublishingNodeChanged':

Source : dataconn
AppID : Cisco Tomcat
ClusterID :
NodeID : is-uccx01-b18.lancs.ac.uk
TimeStamp : Fri Jun 04 18:04:55 BST 2021
The alarm is generated on Fri Jun 04 18:04:55 BST 2021.

Source : dataconn
AppID : Cisco Tomcat
ClusterID :
NodeID : fa-uccx01-lg02.lancs.ac.uk
TimeStamp : Sat Jun 05 21:32:08 BST 2021
The alarm is generated on Sat Jun 05 21:32:08 BST 2021.

Looking at the MCVD logs for the latest incident (Saturday 5th at 21:32), I can see there is an issue with Hazelcast, which appears to lose connectivity:

3056205: Jun 05 21:32:01.946 BST %MCVD-DB_MGR-7-UNK: [Thread-32] com.cisco.database.impl.EntityDataSource EntityDataSource.checkConnectivity for fa-uccx01-lg02 is true
3056206: Jun 05 21:32:02.948 BST %MCVD-DB_MGR-7-UNK: [Thread-32] com.cisco.database.impl.EntityDataSource EntityDataSource.checkConnectivity for fa-uccx01-lg02 is true
3056207: Jun 05 21:32:03.950 BST %MCVD-DB_MGR-7-UNK: [Thread-32] com.cisco.database.impl.EntityDataSource EntityDataSource.checkConnectivity for fa-uccx01-lg02 is true
3056208: Jun 05 21:32:04.393 BST %MCVD-CVD-7-UNK: [hz._hzInstance_1_UccxCvdCluster-1351000220000.event-10] com.hazelcast.cluster.impl.ClusterServiceImpl Hazelcast.memberRemoved: member=Member [10.42.18.75]:5900
3056209: Jun 05 21:32:04.393 BST %MCVD-CVD-4-HEARTBEAT_SUSPECT_NODE_CRASH: [hz._hzInstance_1_UccxCvdCluster-1351000220000.event-10] com.cisco.cluster.impl.cvd.ClusterViewManager CVD suspects node crash: state=HEARTBEAT_HAZELCAST,nodeInfo=Node[nodeId=1, ip=10.42.18.75],dt=null

It then rejoins a short while later:

3056422: Jun 05 21:32:19.708 BST %MCVD-CVD-4-MASTER_DETECTS_NODE_JOIN: [MCVD_CVD_DISPATCHER-5-0-com.cisco.cluster.impl.cvd.Dispatcher1] com.cisco.cluster.impl.cvd.CVDMasterImpl More than one master detected, when processing node join: name=Cisco Unified CCX Database,nodeId=1,masterCnt=1
3056423: Jun 05 21:32:19.708 BST %MCVD-CVD-7-UNK: [MCVD_CVD_DISPATCHER-5-0-com.cisco.cluster.impl.cvd.Dispatcher1] com.cisco.cluster.impl.cvd.CVDMasterImpl Split after network partition is detected, new nodeId=1
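
For anyone wanting to measure how long the node was out of the cluster in each incident, here is a rough Python sketch that pairs each Hazelcast.memberRemoved entry with the next MASTER_DETECTS_NODE_JOIN entry and reports the gap. It assumes the MCVD log lines keep the prefix format shown in the excerpts above; the function names (find_outages, parse_ts) and the year argument are my own, since the log prefix carries no year. It is not a Cisco tool, only an illustration of the log reading described in this post.

```python
import re
from datetime import datetime

# Prefix format assumed from the excerpts in this post, e.g.
# "3056208: Jun 05 21:32:04.393 BST %MCVD-CVD-7-UNK: ..."
LINE_RE = re.compile(
    r"^\d+: (?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \w+ %(?P<tag>[A-Z0-9_-]+):"
)


def parse_ts(text, year=2021):
    """Parse 'Jun 05 21:32:04.393'; the year is supplied by the caller."""
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S.%f")


def find_outages(lines, year=2021):
    """Return (removed_at, rejoined_at, seconds) for each remove/join pair."""
    outages = []
    removed_at = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not a recognisable log line
        ts = parse_ts(match.group("ts"), year)
        if "Hazelcast.memberRemoved" in line:
            # Keep the first removal if several are logged before a rejoin.
            removed_at = removed_at or ts
        elif "MASTER_DETECTS_NODE_JOIN" in line and removed_at is not None:
            outages.append((removed_at, ts, (ts - removed_at).total_seconds()))
            removed_at = None
    return outages


# Two lines copied from the excerpts above.
SAMPLE = [
    "3056208: Jun 05 21:32:04.393 BST %MCVD-CVD-7-UNK: "
    "[hz._hzInstance_1_UccxCvdCluster-1351000220000.event-10] "
    "com.hazelcast.cluster.impl.ClusterServiceImpl "
    "Hazelcast.memberRemoved: member=Member [10.42.18.75]:5900",
    "3056422: Jun 05 21:32:19.708 BST %MCVD-CVD-4-MASTER_DETECTS_NODE_JOIN: "
    "[MCVD_CVD_DISPATCHER-5-0-com.cisco.cluster.impl.cvd.Dispatcher1] "
    "com.cisco.cluster.impl.cvd.CVDMasterImpl More than one master detected, "
    "when processing node join: name=Cisco Unified CCX Database,nodeId=1,masterCnt=1",
]

if __name__ == "__main__":
    for removed, rejoined, seconds in find_outages(SAMPLE):
        print(f"removed {removed}  rejoined {rejoined}  gap {seconds:.3f}s")
```

On the two sample lines the gap works out to roughly 15.3 seconds (21:32:04.393 to 21:32:19.708). Running the same function over a full MCVD log would show whether the gaps are getting longer or more frequent over time.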

1 Reply

Jonny5
Level 1

I'm seeing a similar issue. Would you mind sharing the resolution you found? I suspect mine is an OS compatibility issue on the VM.