MGCP endpoints momentarily unregister

mw9714 · ‎02-24-2005

I have seen several instances of MGCP endpoints unregistering, according to the server event logs. This is happening on different hardware configurations and software versions. A quick check confirms that they re-register afterwards. I believe this is due to missed keepalives. Does anyone know of a way to keep this from happening, perhaps a change to the default MGCP profile?

milay · ‎02-24-2005

What do the CCM logs tell you?

Mike

mw9714 · ‎02-24-2005

I see a message showing "MGCP communication to gateway lost". Then all the endpoints unregister and switch over to the next available CallManager. I had been looking for RSIP messages in the traces, but I now believe I should be looking for the last NTFY message sent to the CCM before the endpoints unregister. I will have to wait for this to happen again. I was also looking to see whether I could extend the time the gateway waits for a response from the CCM, but so far I have not found a configurable setting for that.

milay · ‎02-24-2005

Yes, you should be looking for the NTFY and also the 200 response with the sequence number that matches the notify. If the GW fails to respond to 3 requests from the CCM, the GW will be reset.

Mike

mw9714 · ‎02-25-2005

3 requests? Based on what I found at: http://www.cisco.com/en/US/products/sw/iosswrel/ps5012/products_feature_guide09186a0080087fd9.html , "The gateway maintains this connection by sending empty MGCP Notify (NTFY) keepalive messages to Cisco CallManager at 15-second intervals. If the active Cisco CallManager fails to acknowledge receipt of the keepalive message within 30 seconds, the gateway attempts to switch over to the next highest order Cisco CallManager server that is available." This suggests that one missed attempt causes the gateway to register with another CallManager if available. I think if there were three attempts, I wouldn't have this issue. As stated before though, I will wait for this to happen again, look for the NTFY and follow it through to the RSIP. Thanks.
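To make the two readings of the quoted text concrete, here is a minimal Python sketch. It only models timing; it is not how IOS implements keepalives. The 15-second interval and 30-second timeout come from the quoted guide; the function names and the two "readings" are assumptions made for illustration.

```python
# Values quoted from the Cisco feature guide in the post above.
KEEPALIVE_INTERVAL_MS = 15_000
ACK_TIMEOUT_MS = 30_000


def failover_per_message(ack_delays_ms):
    """Reading A (assumption): any single keepalive left unacknowledged for
    the full timeout triggers a switchover.

    ack_delays_ms[i] is the delay of the 200 for keepalive i, or None if lost.
    Returns the switchover time in ms after the first keepalive, or None.
    """
    for i, delay in enumerate(ack_delays_ms):
        if delay is None or delay > ACK_TIMEOUT_MS:
            return i * KEEPALIVE_INTERVAL_MS + ACK_TIMEOUT_MS
    return None


def failover_on_silence(ack_delays_ms):
    """Reading B (assumption): a switchover happens only when no 200 at all
    arrives for longer than the timeout."""
    arrivals = sorted(
        i * KEEPALIVE_INTERVAL_MS + delay
        for i, delay in enumerate(ack_delays_ms)
        if delay is not None
    )
    window_end = len(ack_delays_ms) * KEEPALIVE_INTERVAL_MS
    last_heard = 0
    for t in arrivals + [window_end]:
        if t - last_heard > ACK_TIMEOUT_MS:
            return last_heard + ACK_TIMEOUT_MS
        last_heard = t
    return None


# Four keepalives; the second 200 is lost, the others arrive after 10 ms.
one_lost = [10, None, 10, 10]
print(failover_per_message(one_lost), failover_on_silence(one_lost))
```

Under reading A a single lost 200 is enough to cause a switchover; under reading B it takes two consecutive losses. Which one matches the gateway's real behavior is exactly what the traces need to show.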

mw9714 · ‎02-25-2005

It happened again. Here are the pertinent traces:

02/25/2005 16:43:15.582 CCM|MGCPBhHandler 192.168.100.251 - Gateway TCP Link terminated. TCPHandle=0x48d1db0|<:CCM-1-CLUSTER><:192.168.100.10><:1><:192.168.100.251><:>

02/25/2005 16:43:15.582 CCM|MGCPBhHandler 192.168.100.251 - Error Message TCPHandle=0x48d1db0, GetLastError=0x0|<:CCM-1-CLUSTER><:192.168.100.10><:1><:192.168.100.251><:>

02/25/2005 16:43:15.582 CCM|MGCPHandler received msg from: 192.168.100.251

RSIP 601895 *@VG224-GW MGCP 0.1

RM: graceful

|<:CCM-1-CLUSTER><:192.168.100.10><:1><:192.168.100.251><:>

02/25/2005 16:43:15.582 CCM|<:CCM-1-CLUSTER><:192.168.100.10><:1><:MGCPENDPOINT><:><:>

02/25/2005 16:43:15.582 CCM|MGCPInit - //// RSIP from *@VG224-GW|<:CCM-1-CLUSTER><:192.168.100.10><:1><:192.168.100.251><:>

02/25/2005 16:43:15.582 CCM|MGCPGatewayLostComm - MGCP communication to gateway lost. Device Name:vg224-gw App ID:Cisco CallManager Cluster ID:CCM-1-Cluster Node ID:192.168.100.10|<:CCM-1-CLUSTER><:192.168.100.10><:ALARMVG224-GW><:VG224-GW>

02/25/2005 16:43:15.582 CCM|MGCPBhHandler 192.168.100.251 - TCP opened and BH registered for the device|<:CCM-1-CLUSTER><:192.168.100.10><:1><:192.168.100.251><:>

02/25/2005 16:43:15.582 CCM|MGCPBhHandler 192.168.100.251 - TCP Established with Device, TCPHandle=0x48d1db0|<:CCM-1-CLUSTER><:192.168.100.10><:1><:192.168.100.251><:>

02/25/2005 16:43:15.582 CCM|MGCPHandler send msg SUCCESSFULLY to: 192.168.100.251

200 601895

The gateway is now registered with the Subscriber but from what I have seen in the past, it will eventually have the same issue there and switch back to the Publisher.
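For anyone who wants to automate the check Mike suggested (matching each NTFY or RSIP to a 200 with the same sequence number), here is a rough Python sketch. It assumes the trace layout shown above, where the MGCP command or response appears on its own line after the "MGCPHandler received msg" or "send msg" line. The function name is made up, and the NTFY line in the sample is invented for illustration; only the RSIP and 200 lines are taken from the trace.

```python
import re

# MGCP command line: VERB transaction-id endpoint "MGCP" version
CMD = re.compile(r"^(NTFY|RSIP)\s+(\d+)\s+\S+\s+MGCP\s+\S+")
# MGCP response line: 3-digit code followed by the transaction id
RSP = re.compile(r"^(\d{3})\s+(\d+)\b")


def unmatched_commands(lines):
    """Return {transaction_id: verb} for NTFY/RSIP lines with no 200 seen."""
    pending = {}
    for raw in lines:
        line = raw.strip()
        cmd = CMD.match(line)
        if cmd:
            pending[cmd.group(2)] = cmd.group(1)
            continue
        rsp = RSP.match(line)
        if rsp and rsp.group(1) == "200":
            # A 200 clears the command with the same transaction id.
            pending.pop(rsp.group(2), None)
    return pending


sample = [
    "02/25/2005 16:43:15.582 CCM|MGCPHandler received msg from: 192.168.100.251",
    "RSIP 601895 *@VG224-GW MGCP 0.1",
    "RM: graceful",
    "02/25/2005 16:43:15.582 CCM|MGCPHandler send msg SUCCESSFULLY to: 192.168.100.251",
    "200 601895",
    "NTFY 601896 *@VG224-GW MGCP 0.1",  # hypothetical line, not from the trace
]
print(unmatched_commands(sample))
```

Note that this only shows what CCM logged as sent. A 200 that CCM sent but the gateway never received would still look matched here, so a capture on the gateway side is needed to rule that out.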

Erick Bergquist · ‎02-25-2005

I'm working with mw9714 on this also.

It happened again tonight with another of the customer's gateways, and in the traces I see this:

On Subscriber:

At 20:04:07.969 CCM receives a NTFY from gateway w/sequence 37675.

At 20:04:07.969 CCM sends 200 message back

At 20:04:22.969 CCM receives a NTFY from gateway w/sequence 37676

At 20:04:22.969 CCM Sends a 200 message back to gateway

At 20:04:33.126 CCM receives a RSIP from gateway, sequence 37677. RM: Graceful

At 20:04:33.126 CCM Sends a 200 message back to gateway

At 20:04:33.141 the old TCP connection is closed and a new one opened

At 20:05:18.143 CCM receives a restart message from gateway, sequence 37686

Then we start getting NTFY messages again on a normal basis...

On Publisher:

At 20:04:33.113 CCM receives a RSIP from gateway, sequence 37679, RM: restart

So there are only about 10 seconds between CCM sending the last 200 message (20:04:22.969) and the RSIP from the gateway (20:04:33.126). This isn't 30 seconds or even 15 seconds, unless the gateway did not receive any of the previous 200 messages sent from CCM in response to NTFY messages.

Would the gateway send a new NTFY w/incremented Sequence number if it didn't get the previous response from call manager?
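The intervals between the Subscriber events listed above can be checked with a few lines of Python. The timestamps are copied from the list in this post; the variable and function names are just for illustration.

```python
from datetime import datetime

# Subscriber-side events, timestamps as listed in the post above.
events = [
    ("NTFY 37675", "20:04:07.969"),
    ("NTFY 37676", "20:04:22.969"),
    ("RSIP 37677", "20:04:33.126"),
]


def gaps_in_seconds(events):
    """Return the time between consecutive events, in seconds."""
    times = [datetime.strptime(stamp, "%H:%M:%S.%f") for _, stamp in events]
    return [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]


for (name, _), gap in zip(events[1:], gaps_in_seconds(events)):
    print(f"{gap:7.3f} s before {name}")
```

The two NTFY messages are exactly one keepalive interval apart, and the RSIP arrives before the next keepalive would have been due.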

Erick Bergquist · ‎02-25-2005

Mike Lay,

Also looking at 'show mgcp stat' on the gateways shows there are no failures, and notify rx and tx match.

The only issue in that output is with UDP rx and tx. The rx count is just slightly less than the tx count.

One of these gateways loses its MGCP registration several times a day. This gateway is local to CCM, on the same LAN segment, attached to the same switch, and in the same VLAN as the call managers. The other one, which doesn't lose its registration as often, is across a T1. There was a small number of errors on that link, so I'm watching it.

I am thinking there is something going on locally on the LAN. The switch that the gateways and call managers are plugged into is a 3524 XL.

TODD BERGMAN · ‎02-26-2005

I have the same issue. However, mine can be reproduced on cue. Whenever I make a long distance call, the gateway unregisters immediately. If I make a local 7-digit call, there are no problems. Here is a link to my conversation. It has all my traces and configs.

http://forum.cisco.com/eforum/servlet/NetProf?page=netprof&forum=IP%20Communications%20and%20Video&topic=IP%20Telephony&CommCmd=MB%3Fcmd%3Ddisplay_location%26location%3D.1dd781bd

CCM 4.1.2 1760 with FXO to PSTN

If you find anything for me to try please forward it along thanks.

epekou · ‎04-03-2006

I'm having the same problem. Did you solve this issue?

Regards,

Evi

don.mcneil · ‎04-04-2006

I am having EXACTLY the same issue at multiple sites, both across WAN links and across the LAN. The information in the logs regarding the Notify messages shows the same condition as well. I have a couple of GWs that lose their connection more often than others. I have a Cisco TAC case open, but Cisco has not been able to resolve it or even suggest a course of action. This happens across various GW platforms and IOS versions. In some cases the GW will lose its keepalive connection for an hour or two, then reestablish it and be fine for a long period of time (6-8 hours). In the extreme case the GW loses keepalive connectivity and the only way to recover it is a GW reload. ANY information on what is causing this issue and how to resolve it would be GREATLY appreciated.

fowlerds · ‎06-07-2006

I am also seeing the same issue. The only way I have found to resolve it is to reset the gateway.

litrenta · ‎06-07-2006

Check your version of the RAID driver. See the field notice at

http://www.cisco.com/en/US/customer/products/hw/voiceapp/ps378/products_field_notice09186a008055528f.shtml

The symptom is that you will see gateways unregister every 5 hours.

fowlerds · ‎06-14-2006

I found the fix for at least the issue I was having. I had to upgrade the IOS on my 2801 to c2801-spservicesk9-mz.123-14.T7.bin. That seems to have fixed my issues for over a week now.
