04-07-2017 04:07 AM - edited 03-14-2019 05:12 PM
Gents,
I had a brief outage two days ago between a duplexed pair of PGs. I am trying to understand what caused the outage.
While reviewing the MDS logs, I saw an error about the private link, but the private link in question never went down. I checked the link, the interfaces, etc., and never received any alarm about the private link. The only alarm I got from our monitoring system was about PG2B.
Based on these logs, can we determine whether this was a network-related issue or an ICM component issue? What else can I do to get to the bottom of this?
Here is the alarm I got:
"Cisco Systems, Inc. ICM (32791 - Message Delivery): Client pim1 stopping due to error.
Cisco Systems, Inc. ICM (63 - Peripheral Controller): SideB pim2 process down."
Many thanks
PG2A
01:40:50:048 pg2a-mds Connectivity with duplexed partner has been lost due to a failure of the private network, or duplexed partner is out of service.
Last EMT Error [-519897073]: Connection broken due to loss of heartbeats.
01:40:50:048 pg2a-mds Synchronizer switching to non-duplex operation.
01:40:50:080 pg2a-mds Trace: Received TOS request: sequence = 1 pg_weight = 0.
01:40:50:080 pg2a-mds Trace: Processing TOS request: sideA_weight=130, sideB_weight=0, TOS pg_weight=0
01:40:50:080 pg2a-mds Trace: Sending TOS response: sequence=1 status=ENABLED.
01:40:55:602 pg2a-mds Communication with peer Synchronizer established.
01:40:55:805 pg2a-mds Synchronizer switching to active duplex operation.
02:09:38:256 pg2a-mds MDS Process is reporting periodic overall metering statistics.
PG2B
01:40:49:962 pg2b-mds Connectivity with duplexed partner has been lost due to a failure of the private network, or duplexed partner is out of service.
Last EMT Error [-519897073]: Connection broken due to loss of heartbeats.
01:40:49:962 pg2b-mds Initiating test of peer Synchronizer.
01:40:49:962 pg2b-mds Trace: Sending TOS request: sequence = 1, pg_weight=0.
01:40:49:977 pg2b-mds Trace: Received TOS response: sequence=1 status=ENABLED.
01:40:49:977 pg2b-mds Peer Synchronizer was found to be active.
01:40:49:977 pg2b-mds Synchronizer suspending operation.
01:40:49:977 pg2b-mds MDS going out of service.
01:40:49:977 pg2b-mds Client ctisvr stopping due to error.
01:40:49:977 pg2b-mds Client opc stopping due to error.
01:40:49:977 pg2b-mds Client pim1 stopping due to error.
01:40:49:977 pg2b-mds Client pim3 stopping due to error.
01:40:49:977 pg2b-mds Client pim2 stopping due to error.
01:40:51:101 pg2b-mds Client ctisvr registered with handle 17.
01:40:51:101 pg2b-mds Client ctisvr started.
01:40:51:147 pg2b-mds Client opc registered with handle 1.
01:40:51:147 pg2b-mds Client opc started.
01:40:55:687 pg2b-mds Communication with peer Synchronizer established.
01:40:55:703 pg2b-mds Synchronizer switching to passive duplex operation.
01:40:55:968 pg2b-mds MDS now in service.
01:41:00:149 pg2b-mds Client pim1 registered with handle 33.
01:41:00:149 pg2b-mds Client pim1 started.
01:41:00:259 pg2b-mds Client pim3 registered with handle 35.
01:41:00:259 pg2b-mds Client pim3 started.
01:41:00:259 pg2b-mds Client pim2 registered with handle 34.
01:41:00:259 pg2b-mds Client pim2 started.
01:47:58:449 pg2b-mds MDS Process is reporting periodic overall metering statistics.
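To compare the two sides, the traces above can be parsed and put on one timeline. The following is only a minimal sketch: it assumes the `HH:MM:SS:mmm process message` layout seen in these excerpts, the function names are made up for illustration, and it assumes the clocks on both PGs are reasonably in sync, which the logs themselves do not prove.

```python
import re
from datetime import timedelta

# Assumed line layout, taken from the excerpts above: "HH:MM:SS:mmm <process> <message>"
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}):(\d{3})\s+(\S+)\s+(.*)$")

LOST = "Connectivity with duplexed partner has been lost"
RESTORED = "Communication with peer Synchronizer established"

# A few lines copied from the PG2A and PG2B traces in this post.
PG2A_SAMPLE = """\
01:40:50:048 pg2a-mds Connectivity with duplexed partner has been lost due to a failure of the private network, or duplexed partner is out of service.
Last EMT Error [-519897073]: Connection broken due to loss of heartbeats.
01:40:50:048 pg2a-mds Synchronizer switching to non-duplex operation.
01:40:55:602 pg2a-mds Communication with peer Synchronizer established.
""".splitlines()

PG2B_SAMPLE = """\
01:40:49:962 pg2b-mds Connectivity with duplexed partner has been lost due to a failure of the private network, or duplexed partner is out of service.
Last EMT Error [-519897073]: Connection broken due to loss of heartbeats.
01:40:49:977 pg2b-mds MDS going out of service.
01:40:55:687 pg2b-mds Communication with peer Synchronizer established.
""".splitlines()


def parse_mds_lines(lines):
    """Return (time_of_day, process, message) tuples.

    Lines without a timestamp (e.g. "Last EMT Error ...") are treated as
    continuations and appended to the previous message.
    """
    events = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = LINE_RE.match(line)
        if m:
            h, mi, s, ms = (int(g) for g in m.groups()[:4])
            ts = timedelta(hours=h, minutes=mi, seconds=s, milliseconds=ms)
            events.append([ts, m.group(5), m.group(6)])
        elif events:
            events[-1][2] += " " + line
    return [tuple(e) for e in events]


def merged_timeline(*event_lists):
    """Interleave events from several logs in time order."""
    return sorted((e for lst in event_lists for e in lst), key=lambda e: e[0])


def reconnect_time(events):
    """Time from the heartbeat-loss message to the next peer re-establishment, or None."""
    lost = next((ts for ts, _, msg in events if msg.startswith(LOST)), None)
    if lost is None:
        return None
    back = next((ts for ts, _, msg in events
                 if ts > lost and msg.startswith(RESTORED)), None)
    return None if back is None else back - lost


if __name__ == "__main__":
    a = parse_mds_lines(PG2A_SAMPLE)
    b = parse_mds_lines(PG2B_SAMPLE)
    for ts, proc, msg in merged_timeline(a, b):
        print(ts, proc, msg[:70])
    print("PG2A reconnect:", reconnect_time(a))
    print("PG2B reconnect:", reconnect_time(b))
```

On the excerpts above, both sides report the heartbeat loss within about 90 ms of each other and re-establish the peer connection roughly five and a half seconds later. That rules out neither a short private-network blip nor a stall on one of the hosts, but it gives a precise window to check against switch and OS logs.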
04-07-2017 01:39 PM
You might want to check the Event Viewer application logs on PG2A to see whether there was a spike in CPU, memory, etc. I have seen OS-level issues break MDS sync, apart from the usual network issues. If memory serves, we had to upgrade the PG.
Maybe run a ping plotter between the two sides to detect network failures; sometimes the network team cannot detect a small blip that shows up in a ping plotter.
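If a ping plotter is not at hand, something along these lines gives a rough equivalent. This is only a sketch under assumptions: the function names are invented, it shells out to the operating system's `ping` command (`-n`/`-w` on Windows, `-c`/`-W` on Linux), and a probe once per second can easily miss blips shorter than that.

```python
import platform
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Send one echo request via the OS ping command; True if it exits with 0.

    Caveat: on Windows, ping can exit with 0 for some "unreachable" replies,
    so treat the result as an approximation.
    """
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]  # -w is in milliseconds
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]  # Linux: -W is in seconds
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def find_loss_bursts(samples):
    """Group consecutive failed probes.

    samples: iterable of (timestamp, ok) pairs in time order.
    Returns a list of (first_failure_ts, last_failure_ts, failed_count).
    """
    bursts = []
    start = last = None
    count = 0
    for ts, ok in samples:
        if not ok:
            if start is None:
                start, count = ts, 0
            count += 1
            last = ts
        elif start is not None:
            bursts.append((start, last, count))
            start = None
    if start is not None:  # log ended while probes were still failing
        bursts.append((start, last, count))
    return bursts


def monitor(host, duration_s, interval_s=1.0):
    """Probe host for duration_s seconds and return the loss bursts seen."""
    samples = []
    end = time.time() + duration_s
    while time.time() < end:
        samples.append((time.time(), ping_once(host)))
        time.sleep(interval_s)
    return find_loss_bursts(samples)
```

Run `monitor()` from one PG against the other side's private-network address (not the public one) and compare any burst timestamps with the times in the MDS log.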
05-16-2017 01:59 AM
Hello,
I am also facing a similar issue, but in my case the PG2B services go down every 10-20 minutes: ctisvr, opc and eagtpim all go down. I can see the error "agent pg connectivity with duplexed partner lost due to a failure of the private network or duplexed partner is out of service", but I have monitored the network and found no issue. I would appreciate any help. I am still reviewing the server for more information.
03-07-2019 12:12 AM
Did you find a solution to this issue?