01-17-2008 10:05 AM - edited 03-15-2019 08:17 AM
Hi,
Recently there have been a few complaints from users that calls sometimes just drop out.
I opened the CCM traces and found that devices are unregistering from the CCM.
I then looked at the event viewer and was alarmed to find it filled with messages showing lots of devices unregistering from CCM, even my MGCP endpoint.
These are a few messages from the event viewer...
Error: DeviceUnregistered - Device unregistered.
Device name.: SEP0013C35A960B
Device IP address.: 192.168.105.36
Device type. [Optional]: 8
Device description [Optional].: SEP0013C35A960B
Reason Code [Optional].: 8
App ID: Cisco CallManager
Cluster ID: StandAloneCluster
Node ID: 192.168.2.12
Error: DeviceUnregistered - Device unregistered.
Device name.: SEP000750833786
Device IP address.: 192.168.105.159
Device type. [Optional]: 8
Device description [Optional].: SEP000750833786
Reason Code [Optional].: 8
App ID: Cisco CallManager
Cluster ID: StandAloneCluster
Node ID: 192.168.2.12
Error: DChannelOOS - D channel out of service.
Device Name.: S0/SU2/DS1-0@NSCFelthamVG
Device IP address: 192.168.105.240
Channel Id.: 16
Unique channel Id: S0/SU2/DS1-0@NSCFelthamVG:16
Reason [Optional].: 0
App ID: Cisco CallManager
Cluster ID: StandAloneCluster
Node ID: 192.168.2.12
Explanation: Indicated D channel has gone out of service.
Recommended Action: Contact TAC for help if this alarm continue to generate..
Error: BChannelOOS - B channel out of service.
Device Name.: S0/SU2/DS1-0@NSCFelthamVG
Channel Id.: 8
Unique channel Id: S0/SU2/DS1-0@NSCFelthamVG:8
Reason [Optional].: 0
App ID: Cisco CallManager
Cluster ID: StandAloneCluster
Node ID: 192.168.2.12
Explanation: Indicated B channel has gone out of service.
Recommended Action: Contact TAC for help if this alarm continue to generate..
Error: BChannelOOS - B channel out of service.
Device Name.: S0/SU2/DS1-0@NSCFelthamVG
Channel Id.: 8
Unique channel Id: S0/SU2/DS1-0@NSCFelthamVG:8
Reason [Optional].: 0
App ID: Cisco CallManager
Cluster ID: StandAloneCluster
Node ID: 192.168.2.12
Explanation: Indicated B channel has gone out of service.
Recommended Action: Contact TAC for help if this alarm continue to generate..
Error: BChannelOOS - B channel out of service.
Device Name.: S0/SU2/DS1-0@NSCFelthamVG
Channel Id.: 6
Unique channel Id: S0/SU2/DS1-0@NSCFelthamVG:6
Reason [Optional].: 0
App ID: Cisco CallManager
Cluster ID: StandAloneCluster
Node ID: 192.168.2.12
Explanation: Indicated B channel has gone out of service.
Recommended Action: Contact TAC for help if this alarm continue to generate..
Error: DeviceUnregistered - Device unregistered.
Device name.: MTP-IPCS3825
Device IP address.: 192.168.105.240
Device type. [Optional]: 112
Device description [Optional].: MTP-IPCS3825
Reason Code [Optional].: 9
App ID: Cisco CallManager
Cluster ID: StandAloneCluster
Node ID: 192.168.2.12
Error: DeviceUnregistered - Device unregistered.
Device name.: S0/SU2/DS1-0@NSCFelthamVG
Device IP address.: 192.168.105.240
Device type. [Optional]: 121
Device description [Optional].: S0/SU2/DS1-0@192.168.105.240
Reason Code [Optional].: 8
App ID: Cisco CallManager
Cluster ID: StandAloneCluster
Node ID: 192.168.2.12
What do I need to do to resolve this?
What does reason code 8 refer to for the IP phones?
01-17-2008 11:13 AM
Hi Deji,
Here is a link to the Reason Codes which should help to narrow down the issue;
http://www.cisco.com/en/US/docs/voice_ip_comm/cucm/err_msgs/4_x/alarms41.htm
The phones look like they were reset at the phone itself. The Gateway looks like a CCM reset.
Hope this helps!
Rob
01-17-2008 12:12 PM
Rob,
Happy new year to you!
Thanks for this invaluable information.
I will dig deeper to see why this is happening. It is really affecting users.
Should you have any further ideas, please do not hesitate to share them with me.
01-17-2008 11:59 AM
Rob provided the info you need to understand reason codes. What you need to determine is:
1. Initial event. Look at the earliest event in your event logs/traces and see if you can determine when this started to be an issue. Note that you may find the buffer is flushed due to volume of messages.
2. Isolate. Depending on the size of your network, you may want to identify the affected node groups/subnets to see if you can isolate the issue to a specific subnet or intermediate network connection.
3. System Log. Look in the system event log for any abnormal failures of subsystems.
4. History log. Check the Cisco install history log (\program files\common files\cisco\logs\history.log) for recent upgrades. See if the start of the problem correlates with an upgrade.
5. Who is not affected. Ties into isolation steps. Identify if you have nodes that are not affected by the issue.
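Steps 2 and 5 can be sketched roughly as below: group the devices that logged DeviceUnregistered by subnet and see whether they cluster. The device names and addresses are taken from the event log in the first post; the /24 prefix length and the helper name `group_by_subnet` are assumptions made for illustration, so adjust them to the real addressing plan.

```python
import ipaddress
from collections import defaultdict

# Devices that logged DeviceUnregistered in the first post.
AFFECTED = [
    ("SEP0013C35A960B", "192.168.105.36"),
    ("SEP000750833786", "192.168.105.159"),
    ("MTP-IPCS3825", "192.168.105.240"),
]


def group_by_subnet(devices, prefix_len=24):
    """Group (name, ip) pairs by the subnet each address falls in.

    prefix_len=24 is an assumption; use the real mask of each site.
    """
    groups = defaultdict(list)
    for name, ip in devices:
        # strict=False lets us pass a host address and get its network back.
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[str(net)].append(name)
    return dict(groups)


if __name__ == "__main__":
    for subnet, names in group_by_subnet(AFFECTED).items():
        print(subnet, names)
```

If every affected device lands in one subnet while devices in other subnets stay registered, the path to that subnet is the first place to look.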
One interesting piece of information you provided is that the "call just drops out". Does the user see a Temp Fail message on the LCD at that time? Do you see Skinny alarm messages (Station Event alerts in App event log) in your trace or event log? Based on the limited info provided, there is a possibility your issue is with the gateway or on the network between the gateway and other nodes. The reason I suspect this is because if the CUCM host was the problem your call should stay up with the MGCP gateway during the event. But if the gateway was the issue, your call would drop and your phone would re-register.
Regards,
Bill
01-17-2008 12:56 PM
Billy,
Thanks for your response.
However, from my troubleshooting, I came to the conclusion that the problem lies with CallManager or the connection between CCM and the IP phones (this is over a LAN extension).
These are my findings:
1. Using the Q.931 ISDN translator, I observed that the gateway terminated the call with a cause code of 90 (presumably the hex octet 0x90, i.e. cause 16, normal call clearing).
2. I then went into the CCM trace details and found that during the call, the IP phone unregistered from CCM. CallManager then initiated a Closed channel request.
3. After this, CCM told the MGCP gateway to tear down the connection for the call.
This is why the cause code from the gateway was normal call clearing.
The event log is filled with IP phones unregistering with reason codes 8 and 9, which mean 8: DeviceInitiated reset and 9: CallManager reset.
Hence it is obvious that something is happening between CallManager and the IP phones.
One possible suspect is the loss of keepalives. I say this because at some point today the message "CCM down" showed on one of the IP phones but lasted only a few seconds.
Is there anything I can use to troubleshoot keepalives between CallManager and the IP phones?
This issue is not happening at the sites where the IP phones are local to the CallManager.
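To see how the unregister events split across reason codes, the event viewer text can be tallied with a small script. This is only a sketch: it assumes the entries are pasted as plain text in the same layout as in the first post, and the function names `parse_unregistered` and `tally_by_reason` are made up for illustration. The two reason-code meanings are the ones quoted above; any other code is reported as unknown.

```python
import re
from collections import Counter

# Reason-code meanings as quoted in this thread; other codes are left unnamed.
REASONS = {8: "DeviceInitiated reset", 9: "CallManager reset"}

# Entries copied from the first post, trimmed to the fields used here.
SAMPLE = """\
Error: DeviceUnregistered - Device unregistered.
Device name.: SEP0013C35A960B
Device IP address.: 192.168.105.36
Reason Code [Optional].: 8
Error: BChannelOOS - B channel out of service.
Device Name.: S0/SU2/DS1-0@NSCFelthamVG
Channel Id.: 8
Error: DeviceUnregistered - Device unregistered.
Device name.: MTP-IPCS3825
Device IP address.: 192.168.105.240
Reason Code [Optional].: 9
"""


def split_entries(text):
    """Split pasted log text into entries, each starting at an 'Error:' line."""
    entries, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Error:") and current:
            entries.append("\n".join(current))
            current = []
        if line:
            current.append(line)
    if current:
        entries.append("\n".join(current))
    return entries


def parse_unregistered(text):
    """Return (device_name, ip, reason_code) for each DeviceUnregistered entry."""
    events = []
    for entry in split_entries(text):
        if not entry.startswith("Error: DeviceUnregistered"):
            continue  # skip BChannelOOS, DChannelOOS and anything else
        name = re.search(r"Device name\.: (\S+)", entry)
        ip = re.search(r"Device IP address\.: (\S+)", entry)
        reason = re.search(r"Reason Code \[Optional\]\.: (\d+)", entry)
        if name and ip and reason:
            events.append((name.group(1), ip.group(1), int(reason.group(1))))
    return events


def tally_by_reason(events):
    """Count unregister events per reason code."""
    return Counter(reason for _name, _ip, reason in events)


if __name__ == "__main__":
    tally = tally_by_reason(parse_unregistered(SAMPLE))
    for reason, count in sorted(tally.items()):
        print(reason, REASONS.get(reason, "unknown"), count)
```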
01-17-2008 03:26 PM
Well, the best way to troubleshoot keepalives is with a sniffer trace. You could use a program like Ethereal on a PC connected behind an IP phone that is having the problem, but if the problem does not occur often you will have a lot of data to go through.
You could also try increasing the keepalive time on the CCM service parameters.
HTH
//Jorge
01-17-2008 03:54 PM
I have narrowed the problem down to this:
CCM-Aborted-TCP Connection...
CallManager is aborting the TCP connection with the IP phones, hence the IP phones re-initialize.
What can I check?
01-17-2008 04:00 PM
That usually happens when CCM does not receive three keepalives from the IP phones. I recommend you check the connectivity between the phones and the CCM server.
Also upload the CCM trace from the time of the problem, along with the MAC address of the phone that got unregistered, so we can see what else is in there.
01-17-2008 04:09 PM
01-17-2008 10:12 PM
Hi Deji,
This is always hard to pinpoint my friend, but from my viewpoint this looks like a Network problem like you said. The loss of keepalives almost surely points to this.
Maybe you can see what was happening on the Network at this time.
Rob
01-18-2008 12:35 AM
As already suggested, your best bet is to use a sniffer (Ethereal).
01-18-2008 08:12 AM
I will need the trace in .txt instead of .xml. Also, have you tried increasing the keepalive time for test purposes? What was the result?
01-18-2008 08:26 AM
Thanks. I have done that and I did not see any noticeable difference.
I then checked the interface connecting the IP phones to the CCM and found lots of interface resets and collisions.
The interface is a 100 Mbps LAN extension link to the main office. I observed that the duplex was set to half on both ends of the link.
I changed this to full on both ends, and I have noticed a remarkable difference.
I am still looking at it.
Can I ask a question?
How do I know that keepalives are missed?
I used Wireshark to trace keepalives and Skinny, but I couldn't see anything indicating that keepalives were missing.
What do I need to look for in the trace?
Unfortunately my traces are only set up to log in XML format, so I can't provide the text format.
Thanks
01-18-2008 09:36 AM
Well, with a sniffer trace taken on the back of the phone you can literally see each keepalive message being sent from the IP address of the phone to the IP address of the CCM server.
01-18-2008 10:14 AM
Thank you.
I see the keepalives, but I do not know when they are lost.
That's why I am asking whether I can use the time, the sequence number, or anything else to know when the keepalives are lost.
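One way to use the time, sketched here under some assumptions: export the capture timestamps of the keepalive messages (in seconds) and look for intervals noticeably longer than the configured keepalive interval. The 30-second interval and the 50% tolerance below are assumptions, not values confirmed in this thread, so check the keepalive setting in the CCM service parameters. The function name `find_keepalive_gaps` is made up for illustration.

```python
def find_keepalive_gaps(timestamps, interval=30.0, tolerance=0.5):
    """Return (previous, next, missed) for each gap longer than expected.

    timestamps: capture times of keepalive messages, in seconds.
    interval:   expected spacing between keepalives (assumed 30 s here).
    tolerance:  fraction of the interval allowed as jitter before flagging.
    """
    gaps = []
    ordered = sorted(timestamps)
    for prev, nxt in zip(ordered, ordered[1:]):
        delta = nxt - prev
        if delta > interval * (1 + tolerance):
            # Estimate how many keepalives should have appeared in the gap.
            missed = round(delta / interval) - 1
            gaps.append((prev, nxt, missed))
    return gaps


if __name__ == "__main__":
    # Hypothetical timestamps: two keepalives are absent between 60.0 and 150.2.
    seen = [0.0, 30.1, 60.0, 150.2, 180.1]
    for prev, nxt, missed in find_keepalive_gaps(seen):
        print(f"gap from {prev} to {nxt}: about {missed} keepalive(s) missing")
```

Note that a capture taken behind the phone shows what the phone sent, so keepalives lost in the network will still appear there. Running the same check on timestamps captured at the CCM side, or on the acknowledgements arriving back at the phone, is more likely to reveal the loss.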