02-24-2005 11:09 AM
I have seen several instances of MGCP endpoints unregistering, according to the server event logs. This is happening on different hardware configurations and software versions. A quick check confirms that they are re-registered. I believe this is occurring due to missed keepalives. Does anyone know of a way to keep this from happening, perhaps a change to the default MGCP profile?
02-24-2005 05:05 PM
What do the CCM logs tell you?
Mike
02-24-2005 05:26 PM
I see a message saying "MGCP communication to gateway lost". Then all the endpoints unregister and switch over to the next available CallManager. I have been looking for RSIP messages in the traces, but now I believe I should be looking for the last NTFY message sent to the CCM before the endpoints unregister. I will have to wait for this to happen again. I was hoping to extend the time the gateway waits for a response from the CCM, but so far I have not found that this is a configurable setting.
02-24-2005 08:41 PM
Yes, you should be looking for the NTFY and also the 200 response whose sequence number matches the NTFY. If the GW fails to respond to 3 requests from the CCM, the GW will be reset.
Mike
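A minimal sketch of the check Mike describes, pairing each NTFY with a 200 that carries the same number. In MGCP that number is the transaction ID that follows the verb (e.g. `NTFY 37675`) and is echoed in the response (`200 37675`). It assumes the trace has already been reduced to one MGCP message per string; the function name and sample messages are hypothetical.

```python
import re

# Commands start with a 4-letter verb and a transaction ID ("NTFY 37675 ...");
# responses start with a 3-digit code and the same ID ("200 37675").
COMMAND = re.compile(r"^(?P<verb>[A-Z]{4})\s+(?P<tid>\d+)\b")
RESPONSE = re.compile(r"^(?P<code>\d{3})\s+(?P<tid>\d+)\b")


def unanswered_ntfys(messages):
    """Return transaction IDs of NTFY commands that have no matching 200."""
    pending = []
    acked = set()
    for msg in messages:
        first_line = msg.strip().splitlines()[0]
        if (m := COMMAND.match(first_line)) and m["verb"] == "NTFY":
            pending.append(m["tid"])
        elif (m := RESPONSE.match(first_line)) and m["code"] == "200":
            acked.add(m["tid"])
    return [tid for tid in pending if tid not in acked]


# Hypothetical trace excerpt: the second keepalive never got its 200.
trace = ["NTFY 37675 *@gw MGCP 0.1", "200 37675", "NTFY 37676 *@gw MGCP 0.1"]
print(unanswered_ntfys(trace))  # ['37676']
```

The last NTFY ID that comes back from this before the "communication to gateway lost" alarm is the one worth following through to the RSIP.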
02-25-2005 09:17 AM
3 requests? Based on what I found at: http://www.cisco.com/en/US/products/sw/iosswrel/ps5012/products_feature_guide09186a0080087fd9.html , "The gateway maintains this connection by sending empty MGCP Notify (NTFY) keepalive messages to Cisco CallManager at 15-second intervals. If the active Cisco CallManager fails to acknowledge receipt of the keepalive message within 30 seconds, the gateway attempts to switch over to the next highest order Cisco CallManager server that is available." This suggests that one missed attempt causes the gateway to register with another CallManager if available. I think if there were three attempts, I wouldn't have this issue. As stated before though, I will wait for this to happen again, look for the NTFY and follow it through to the RSIP. Thanks.
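To make the quoted timing concrete, here is a simplified model of the behavior the feature guide describes: an empty NTFY every 15 seconds, and a switchover if a keepalive goes unacknowledged for 30 seconds. It deliberately ignores any retransmission logic IOS may apply, so treat it as an illustration of the quoted text rather than of actual gateway internals. The constant and function names are hypothetical.

```python
KEEPALIVE_INTERVAL = 15.0  # seconds between empty NTFY keepalives (per the quoted guide)
ACK_TIMEOUT = 30.0         # seconds the gateway waits for an acknowledgement


def switchover_time(ack_delays):
    """Given the ack delay (seconds) for each keepalive in order, or None if it
    was never acknowledged, return when the gateway would abandon the active
    CallManager, or None if every keepalive is acknowledged in time.

    Keepalive i is sent at i * KEEPALIVE_INTERVAL, so deadlines increase with i
    and the first late keepalive always has the earliest deadline."""
    for i, delay in enumerate(ack_delays):
        sent_at = i * KEEPALIVE_INTERVAL
        if delay is None or delay > ACK_TIMEOUT:
            return sent_at + ACK_TIMEOUT
    return None


print(switchover_time([0.1, 0.2, None]))  # third keepalive (sent at 30 s) lost -> 60.0
```

Under this model a single lost acknowledgement is enough to trigger the switchover, which matches the reading of the guide given above.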
02-25-2005 03:47 PM
It happened again; here are the pertinent traces:
02/25/2005 16:43:15.582 CCM|MGCPBhHandler 192.168.100.251 - Gateway TCP Link terminated. TCPHandle=0x48d1db0|<:CCM-1-CLUSTER><:192.168.100.10><:1><:192.168.100.251><:>
02/25/2005 16:43:15.582 CCM|MGCPBhHandler 192.168.100.251 - Error Message TCPHandle=0x48d1db0, GetLastError=0x0|<:CCM-1-CLUSTER><:192.168.100.10><:1><:192.168.100.251><:>
02/25/2005 16:43:15.582 CCM|MGCPHandler received msg from: 192.168.100.251
RSIP 601895 *@VG224-GW MGCP 0.1
RM: graceful
|<:CCM-1-CLUSTER><:192.168.100.10><:1><:192.168.100.251><:>
02/25/2005 16:43:15.582 CCM|<:CCM-1-CLUSTER><:192.168.100.10><:1><:MGCPENDPOINT><:><:>
02/25/2005 16:43:15.582 CCM|MGCPInit - //// RSIP
02/25/2005 16:43:15.582 CCM|MGCPGatewayLostComm - MGCP communication to gateway lost. Device Name:vg224-gw App ID:Cisco CallManager Cluster ID:CCM-1-Cluster Node ID:192.168.100.10|<:CCM-1-CLUSTER><:192.168.100.10><:ALARMVG224-GW><:VG224-GW>
02/25/2005 16:43:15.582 CCM|MGCPBhHandler 192.168.100.251 - TCP opened and BH registered for the device|<:CCM-1-CLUSTER><:192.168.100.10><:1><:192.168.100.251><:>
02/25/2005 16:43:15.582 CCM|MGCPBhHandler 192.168.100.251 - TCP Established with Device, TCPHandle=0x48d1db0|<:CCM-1-CLUSTER><:192.168.100.10><:1><:192.168.100.251><:>
02/25/2005 16:43:15.582 CCM|MGCPHandler send msg SUCCESSFULLY to: 192.168.100.251
200 601895
The gateway is now registered with the Subscriber but from what I have seen in the past, it will eventually have the same issue there and switch back to the Publisher.
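The trace above shows the gateway sending `RSIP 601895` with `RM: graceful` and the CCM answering `200 601895`. A small parser like the following, a sketch assuming each MGCP message has already been pulled out of the CCM trace as plain text, makes it easier to pick out the restart method and confirm which response acknowledges which command. The helper names are hypothetical.

```python
def parse_mgcp_command(text):
    """Split an MGCP command into header fields and a dict of parameter lines.

    Assumes a header of the form: VERB TID ENDPOINT MGCP VERSION."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    verb, tid, endpoint, protocol, version = lines[0].split()
    params = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        params[name.strip().upper()] = value.strip()
    return {"verb": verb, "tid": tid, "endpoint": endpoint,
            "version": f"{protocol} {version}", "params": params}


def acknowledges(response_line, command):
    """True if a response line such as '200 601895' is a 200 for this command."""
    code, tid = response_line.split()[:2]
    return code == "200" and tid == command["tid"]


rsip = parse_mgcp_command("RSIP 601895 *@VG224-GW MGCP 0.1\nRM: graceful")
print(rsip["verb"], rsip["params"]["RM"], acknowledges("200 601895", rsip))
```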
02-25-2005 09:19 PM
I'm working with mw9714 on this also.
It happened again tonight with another of the customer's gateways, and in the traces I see this:
On Subscriber:
At 20:04:07.969 CCM receives a NTFY from gateway w/sequence 37675.
At 20:04:07.969 CCM sends 200 message back
At 20:04:22.969 CCM receives a NTFY from gateway w/sequence 37676
At 20:04:22.969 CCM Sends a 200 message back to gateway
At 20:04:33.126 CCM receives a RSIP from gateway, sequence 37677. RM: Graceful
At 20:04:33.126 CCM Sends a 200 message back to gateway
At 20:04:33.141 the old TCP connection is closed and a new one opened
At 20:05:18.143 CCM receives a restart message from gateway, seq 37686
Then we start getting NTFY messages again on normal basis...
On Publisher:
At 20:04:33.113 CCM receives a RSIP from gateway, sequence 37679, RM: restart
So there are only about 10 seconds (20:04:22.969 to 20:04:33.126) between the CCM sending the last 200 message and the reset (RSIP) from the gateway. That isn't 30 seconds or even 15 seconds, unless the gateway did not receive any of the previous 200 messages the CCM sent in response to NTFY messages.
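The interval arithmetic on the subscriber trace can be checked directly from the timestamps. This sketch assumes both timestamps fall on the same day and use the `HH:MM:SS.mmm` form shown in the timeline; the function name is hypothetical.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"  # %f accepts the 3-digit millisecond field when parsing


def gap_seconds(earlier, later):
    """Seconds between two same-day trace timestamps in HH:MM:SS.mmm form."""
    return (datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)).total_seconds()


print(gap_seconds("20:04:07.969", "20:04:22.969"))  # keepalive spacing: 15.0
print(gap_seconds("20:04:22.969", "20:04:33.126"))  # last 200 to RSIP: 10.157
```

The first result confirms the NTFYs are arriving on the expected 15-second cadence; the second is well short of the 30-second acknowledgement window quoted earlier in the thread.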
Would the gateway send a new NTFY w/incremented Sequence number if it didn't get the previous response from call manager?
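One way to look for an answer in the traces themselves: RFC 3435 specifies that a retransmitted MGCP command reuses its original transaction ID, so a repeated ID points to a retransmission, while a fresh, higher ID is a new command. Whether the gateway's keepalive handling follows that rule exactly is an assumption worth verifying. The sketch below, with a hypothetical function name, flags repeated IDs and IDs missing from the range one CallManager saw. Note that 37679, missing from the subscriber's view above, shows up in the publisher trace as the RSIP sent there.

```python
from collections import Counter


def analyze_transaction_ids(tids):
    """Return (repeated, missing) for the transaction IDs seen by one CallManager.

    repeated: IDs that appear more than once (likely retransmissions).
    missing:  IDs inside the observed range that this CallManager never saw
              (sent elsewhere, e.g. to another CallManager, or lost)."""
    ids = [int(t) for t in tids]
    counts = Counter(ids)
    repeated = sorted(i for i, n in counts.items() if n > 1)
    missing = sorted(set(range(min(ids), max(ids) + 1)) - set(counts))
    return repeated, missing


# Command IDs from the subscriber timeline above.
repeated, missing = analyze_transaction_ids(["37675", "37676", "37677", "37686"])
print(repeated, missing)
```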
02-25-2005 11:07 PM
Mike Lay,
Also, 'show mgcp stat' on the gateways shows no failures, and the notify rx and tx counts match.
The only issue in that output is with UDP rx and tx: the rx count is slightly lower than the tx packet count.
One of these gateways loses its MGCP registration several times a day. This gateway is local to the CCM: on the same LAN segment, attached to the same switch, and in the same VLAN as the CallManagers. The other one, which doesn't lose its registration as often, is across a T1. There were a small number of errors on that link, so I'm watching it.
I suspect something is going on locally on the LAN. The switch that the gateways and CallManagers are plugged into is a 3524 XL.
02-26-2005 04:22 PM
I have the same issue. However, mine is reproducible on cue: whenever I make a long-distance call, the gateway unregisters immediately. If I make a local 7-digit call, there are no problems. Here is a link to my conversation. I have all my traces and configs.
CCM 4.1.2, 1760 with FXO to PSTN
If you find anything for me to try, please forward it along. Thanks.
04-03-2006 10:59 PM
I'm having the same problem. Did you solve this issue?
Regards,
Evi
04-04-2006 05:11 AM
I am having EXACTLY the same issue at multiple sites, across WAN links and across the LAN. The log entries for the Notify messages show the same condition as well. I have a couple of GWs that lose their connection more often than others. I have a Cisco TAC case open, but Cisco has not been able to resolve it or even suggest a course of action. This happens across various GW platforms and IOS versions. In some cases the GW will lose its keepalive connection for one to two hours, then re-establish it and be fine for a long period (6-8 hours). In the extreme case the GW loses keepalive connectivity and the only way to recover is a GW reload. ANY information on what is causing this issue and how to resolve it would be GREATLY appreciated.
06-07-2006 08:06 AM
I am also seeing the same issue; the way I have to resolve it is to reset the gateway.
06-07-2006 12:31 PM
Check your RAID driver version; see the field notice at
The symptom is that gateways unregister every 5 hours.
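To test whether your own unregistrations fit a regular cycle like the 5-hour one mentioned above, compute the gaps between event timestamps taken from the CCM traces (format as in `02/25/2005 16:43:15.582`). The 0.25-hour tolerance and the function names are assumptions chosen for illustration, and the sample timestamps are hypothetical.

```python
from datetime import datetime

TRACE_FMT = "%m/%d/%Y %H:%M:%S.%f"  # matches CCM trace timestamps


def intervals_hours(timestamps):
    """Hours between consecutive unregister events, after sorting by time."""
    times = sorted(datetime.strptime(t, TRACE_FMT) for t in timestamps)
    return [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]


def looks_periodic(timestamps, period_h=5.0, tolerance_h=0.25):
    """True if there is at least one gap and every gap is near the suspected period."""
    gaps = intervals_hours(timestamps)
    return bool(gaps) and all(abs(g - period_h) <= tolerance_h for g in gaps)


# Hypothetical unregister events roughly 5 hours apart.
events = ["02/25/2005 01:00:00.000", "02/25/2005 06:00:10.000", "02/25/2005 11:00:05.000"]
print(looks_periodic(events))
```

A regular cycle would point toward a server-side cause such as the driver issue; irregular gaps fit the LAN or WAN explanations discussed earlier in the thread better.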
06-14-2006 07:10 AM
I found the fix for at least the issue I was having. I had to upgrade the IOS on my 2801 to
c2801-spservicesk9-mz.123-14.T7.bin. That seems to have fixed my issues for over a week now.