Devices Unregister on CCM!!!!

Ayodeji Okanlawon · ‎01-17-2008

Hi,

Recently there have been a few complaints from users that calls sometimes just drop out.

I opened the CCM traces and the event viewer and was alarmed to find that lots of devices are unregistering from CCM, even my MGCP endpoint. The event viewer is filled with these messages.

These are a few messages from the event viewer...

Error: DeviceUnregistered - Device unregistered.

Device name.: SEP0013C35A960B

Device IP address.: 192.168.105.36

Device type. [Optional]: 8

Device description [Optional].: SEP0013C35A960B

Reason Code [Optional].: 8

App ID: Cisco CallManager

Cluster ID: StandAloneCluster

Node ID: 192.168.2.12

Error: DeviceUnregistered - Device unregistered.

Device name.: SEP000750833786

Device IP address.: 192.168.105.159

Device type. [Optional]: 8

Device description [Optional].: SEP000750833786

Reason Code [Optional].: 8

App ID: Cisco CallManager

Cluster ID: StandAloneCluster

Node ID: 192.168.2.12

Error: DChannelOOS - D channel out of service.

Device Name.: S0/SU2/DS1-0@NSCFelthamVG

Device IP address: 192.168.105.240

Channel Id.: 16

Unique channel Id: S0/SU2/DS1-0@NSCFelthamVG:16

Reason [Optional].: 0

App ID: Cisco CallManager

Cluster ID: StandAloneCluster

Node ID: 192.168.2.12

Explanation: Indicated D channel has gone out of service.

Recommended Action: Contact TAC for help if this alarm continue to generate..

Error: BChannelOOS - B channel out of service.

Device Name.: S0/SU2/DS1-0@NSCFelthamVG

Channel Id.: 8

Unique channel Id: S0/SU2/DS1-0@NSCFelthamVG:8

Reason [Optional].: 0

App ID: Cisco CallManager

Cluster ID: StandAloneCluster

Node ID: 192.168.2.12

Explanation: Indicated B channel has gone out of service.

Recommended Action: Contact TAC for help if this alarm continue to generate..

Error: BChannelOOS - B channel out of service.

Device Name.: S0/SU2/DS1-0@NSCFelthamVG

Channel Id.: 8

Unique channel Id: S0/SU2/DS1-0@NSCFelthamVG:8

Reason [Optional].: 0

App ID: Cisco CallManager

Cluster ID: StandAloneCluster

Node ID: 192.168.2.12

Explanation: Indicated B channel has gone out of service.

Recommended Action: Contact TAC for help if this alarm continue to generate..

Error: BChannelOOS - B channel out of service.

Device Name.: S0/SU2/DS1-0@NSCFelthamVG

Channel Id.: 6

Unique channel Id: S0/SU2/DS1-0@NSCFelthamVG:6

Reason [Optional].: 0

App ID: Cisco CallManager

Cluster ID: StandAloneCluster

Node ID: 192.168.2.12

Explanation: Indicated B channel has gone out of service.

Recommended Action: Contact TAC for help if this alarm continue to generate..

Error: DeviceUnregistered - Device unregistered.

Device name.: MTP-IPCS3825

Device IP address.: 192.168.105.240

Device type. [Optional]: 112

Device description [Optional].: MTP-IPCS3825

Reason Code [Optional].: 9

App ID: Cisco CallManager

Cluster ID: StandAloneCluster

Node ID: 192.168.2.12

Error: DeviceUnregistered - Device unregistered.

Device name.: S0/SU2/DS1-0@NSCFelthamVG

Device IP address.: 192.168.105.240

Device type. [Optional]: 121

Device description [Optional].: S0/SU2/DS1-0@192.168.105.240

Reason Code [Optional].: 8

App ID: Cisco CallManager

Cluster ID: StandAloneCluster

Node ID: 192.168.2.12

What do I need to do to resolve this?

What does reason code 8 refer to for the IP phones?

Please rate all useful posts

Rob Huffman · ‎01-17-2008

Hi Deji,

Here is a link to the Reason Codes which should help to narrow down the issue;

http://www.cisco.com/en/US/docs/voice_ip_comm/cucm/err_msgs/4_x/alarms41.htm

The phones look like they were reset at the phone itself. The Gateway looks like a CCM reset.

Hope this helps!

Rob

Ayodeji Okanlawon · ‎01-17-2008

Rob,

Happy new year to you!

Thanks for this invaluable information.

I will dig deeper to see why this is happening. It is really affecting users.

Should you have any further ideas, please do not hesitate to share them with me.


William Bell · ‎01-17-2008

Rob provided the info you need to understand reason codes. What you need to determine is:

1. Initial event. Look at the earliest event in your event logs/traces and see if you can determine when this started to be an issue. Note that you may find the buffer has been flushed due to the volume of messages.

2. Isolate. Depending on the size of your network, you may want to identify the affected node groups/subnets to see if you can isolate the issue to a specific subnet or intermediate network connection.

3. System Log. Look in the system event log for any abnormal failures of subsystems.

4. History log. Check Cisco install history log (\program files\common files\cisco\logs\history.log) to check recent upgrades. See if event horizon correlates to upgrade action.

5. Who is not affected. Ties into isolation steps. Identify if you have nodes that are not affected by the issue.

One interesting piece of information you provided is that the "call just drops out". Does the user see a Temp Fail message on the LCD at that time? Do you see Skinny alarm messages (Station Event alerts in App event log) in your trace or event log? Based on the limited info provided, there is a possibility your issue is with the gateway or on the network between the gateway and other nodes. The reason I suspect this is because if the CUCM host was the problem your call should stay up with the MGCP gateway during the event. But if the gateway was the issue, your call would drop and your phone would re-register.
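The isolation step above can be sketched in a few lines of Python: take the device IP addresses from the DeviceUnregistered events and count them per subnet. This is only a sketch. The function name `count_by_subnet` is made up for illustration, and the /24 prefix length is an assumption, since the thread does not state the subnet masks in use.

```python
import ipaddress
from collections import Counter

def count_by_subnet(device_ips, prefix_len=24):
    """Count device IP addresses per subnet.

    prefix_len=24 is an assumption; use the real mask of your voice VLANs.
    """
    counts = Counter()
    for ip in device_ips:
        # strict=False lets us pass a host address and get its network back.
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(net)] += 1
    return counts

# Device IP addresses taken from the event-viewer entries quoted in this thread.
unregistered = ["192.168.105.36", "192.168.105.159", "192.168.105.240"]
for subnet, n in count_by_subnet(unregistered).most_common():
    print(subnet, n)
```

If nearly all affected devices land in one subnet while other subnets stay clean, that points at the path to that subnet rather than at the CCM host.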

Regards,

Bill

HTH -Bill (b) http://ucguerrilla.com (t) @ucguerrilla


Ayodeji Okanlawon · ‎01-17-2008

Billy,

Thanks for your response.

However, from my troubleshooting, I came to the conclusion that the problem lies with CallManager or with the connection between CCM and the IP phones (this is over a LAN extension).

These are my findings:

1. Using the Q.931 ISDN translator, I observed that the gateway terminated the call with a cause code of 90 (hex; cause value 16, normal call clearing).

2. I then went into the CCM trace details and found that during the call the IP phone unregistered from CCM. CallManager then initiated a close channel request.

3. After this, CCM told the MGCP gateway to tear down the connection for the call.

This is why the cause code from the gateway was normal call clearing.

The event log is filled with IP phones unregistering with reason codes 8 and 9, which mean 8: DeviceInitiatedReset and 9: CallManagerReset.
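To see how the two reason codes are distributed across devices, the event-viewer text can be tallied with a short script. This is a rough sketch that assumes the log has been exported as plain text in the same layout as the entries quoted earlier in this thread; the function names are hypothetical, and only the two reason codes named above are mapped.

```python
from collections import Counter

# Reason-code meanings as stated in this thread (only 8 and 9 are covered).
REASONS = {8: "DeviceInitiatedReset", 9: "CallManagerReset"}

def parse_unregistered(log_text):
    """Return (device_name, reason_code) pairs for DeviceUnregistered events."""
    events = []
    current = None  # fields of the DeviceUnregistered event being read, if any
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("Error:"):
            # A new event starts; only DeviceUnregistered events are tracked.
            current = {} if "DeviceUnregistered" in line else None
        elif current is not None:
            if line.lower().startswith("device name"):
                current["name"] = line.split(":", 1)[1].strip()
            elif line.startswith("Reason Code"):
                code = int(line.split(":", 1)[1].strip())
                events.append((current.get("name", "unknown"), code))
                current = None
    return events

def tally(events):
    """Count events per reason, falling back to the raw code if unmapped."""
    return Counter(REASONS.get(code, f"code {code}") for _, code in events)
```

Calling `tally(parse_unregistered(text))` on the exported log gives a count per reason, and the list returned by `parse_unregistered` shows which device names are behind each code.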

Hence it is obvious that something is happening between CallManager and the IP phones.

One possible suspect is the loss of keepalives. I say this because at some point today the message "CCM down" showed on one of the IP phones, but it lasted only a few seconds.

Is there anything I can use to troubleshoot keepalives between CallManager and the IP phones?

This issue is not happening at the sites where the IP phones are local to CallManager.


jbarcena · ‎01-17-2008

Well, the best way to troubleshoot keepalives is with a sniffer trace. You could run a program like Ethereal on a PC connected behind an IP phone that is having the problem, but you will capture a lot of data if the problem does not occur that often.

You could also try increasing the keepalive time in the CCM service parameters.

HTH

//Jorge

Ayodeji Okanlawon · ‎01-17-2008

I have narrowed the problem down to this:

CCM-Aborted-TCP Connection...

CallManager is aborting the TCP connection with the IP phones, hence the IP phones re-initialize.

What can I check?


jbarcena · ‎01-17-2008

That usually happens when CCM does not receive three keepalives from the IP phones. I recommend that you check the connectivity between the phones and the CCM server.

Also, upload the CCM trace from the time of the problem so we can see what else is there, along with the MAC address of the phone that got unregistered.

Ayodeji Okanlawon · ‎01-17-2008

Attached is the trace, and here is the MAC address of the phone: SEP000F9069EA86

Thanks


Rob Huffman · ‎01-17-2008

Hi Deji,

This is always hard to pinpoint my friend, but from my viewpoint this looks like a Network problem like you said. The loss of keepalives almost surely points to this.

Maybe you can see what was happening on the Network at this time.

Rob

Zin.Karzazi · ‎01-18-2008

As already suggested, your best bet is to use a sniffer (Ethereal).

jbarcena · ‎01-18-2008

I will need the trace in .txt instead of .xml. Also, have you tried increasing the keepalive time for test purposes? What was the result?

Ayodeji Okanlawon · ‎01-18-2008

Thanks. I have done that and I did not see any noticeable difference.

I then checked the interface connecting the IP phones to the CCM and found lots of interface resets and collisions.

The interface is a 100 Mb LAN extension link to the main office. I observed that the duplex was set to half on the routers at both ends.

I changed this to full on both ends, and I have noticed a remarkable difference.

I am still looking at it.

Can I ask a question?

How do I know that keepalives are missed?

I used Wireshark to trace keepalives and Skinny, but I couldn't see anything indicating that keepalives were missing.

What do I need to look for in the trace?

Unfortunately, my traces are only set up to log in XML format, so I can't provide the text format.

Thanks


jbarcena · ‎01-18-2008

Well, with a sniffer trace taken from the back of the phone you can literally see each keepalive message being sent from the IP address of the phone to the IP address of the CCM server.

Ayodeji Okanlawon · ‎01-18-2008

Thank you.

I see the keepalives, but I do not know when they are lost.

That's why I am asking whether I can use the time, the sequence number, or anything else to know when the keepalives are lost.
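One way to use the time, sketched here under assumptions: export the capture timestamps (in seconds) of the keepalive messages for one phone and flag any spacing that is clearly longer than the configured keepalive interval. The function name is hypothetical, and the 30-second interval and 5-second tolerance are assumed values for illustration; substitute the interval actually configured in your CCM service parameters.

```python
def find_keepalive_gaps(timestamps, interval_s=30.0, tolerance_s=5.0):
    """Return (previous_time, next_time, estimated_missed) for each long gap.

    timestamps: capture times in seconds of keepalives seen for one phone.
    interval_s and tolerance_s are assumed values, not taken from the thread.
    """
    gaps = []
    ordered = sorted(timestamps)
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur - prev
        if delta > interval_s + tolerance_s:
            # Estimate how many keepalives should have appeared in between.
            missed = max(round(delta / interval_s) - 1, 1)
            gaps.append((prev, cur, missed))
    return gaps

# Hypothetical timestamps, for illustration only.
for prev, cur, missed in find_keepalive_gaps([0.0, 30.1, 60.0, 150.2, 180.1]):
    print(f"gap from {prev}s to {cur}s, roughly {missed} keepalive(s) missing")
```

Note that a capture taken behind the phone shows what the phone sent, not what CCM received, so keepalives lost on the link may still appear in it on time. In that case, repeated TCP retransmissions of the same keepalive segment, or a comparison against a capture taken on the CCM side, are more telling than the spacing alone.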
