Call manager question

Jay Cambell · ‎06-04-2017

I'm running Call Manager 9.1.1 and Unity 9.1. Phones keep rebooting randomly at different locations. I have checked the routers and switch ports and can't find anything wrong. There's no network drop-off that would cause the phones to start re-registering, and I can't figure out why the phones reboot. Can someone please point me in the right direction on how to resolve the issue? I will go to version 11 of Call Manager in a few months.

Jaime Valencia · ‎06-04-2017

Have you reviewed the phone logs for hints of what might be happening?

Are they local? or over WAN?

HTH

java

if this helps, please rate

Jay Cambell · ‎06-05-2017

Yes, it's both local and WAN. What do you mean by phone logs? I've checked the router logs but not each phone's log.

Jaime Valencia · ‎06-05-2017

Enable web access on the devices, then browse to each phone's IP address; you can get the logs from there.

HTH


kumardilip · ‎05-25-2018

We have this problem as well; a few of our sites with 79xx phones started rebooting randomly.

Maren Mahoney · ‎05-28-2018

As Jaime suggested to the OP, can you post the logs from a phone that has rebooted randomly?

kumardilip · ‎05-30-2018

Logs attached from one of the phones; the timezone is JST. Judging by the logs from the switch, the phone most likely rebooted between 18:39:30 and 18:39:45 JST.

May 28 09:39:43.250 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet6/0/18, changed state to down

May 28 09:39:44.257 UTC: %LINK-3-UPDOWN: Interface GigabitEthernet6/0/18, changed state to down

May 28 09:39:46.681 UTC: %SWITCH_QOS_TB-5-TRUST_DEVICE_LOST: cisco-phone no longer detected on port Gi6/0/18, operational port trust state is now untrusted.

May 28 09:39:46.731 UTC: %LINK-3-UPDOWN: Interface GigabitEthernet6/0/18, changed state to up

May 28 09:39:47.738 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet6/0/18, changed state to up

May 28 09:40:00.337 UTC: %SWITCH_QOS_TB-5-TRUST_DEVICE_DETECTED: cisco-phone detected on port Gi6/0/18, port's configured trust state is now operational.

May 28 09:40:01.352 UTC: %SWITCH_QOS_TB-5-TRUST_DEVICE_DETECTED: cisco-phone detected on port Gi6/0/18, port's configured trust state is now operational.
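
Since the switch logs in UTC and the phone logs in JST, it can help to put the link events on the phone's clock before comparing them. A minimal sketch, assuming the year is 2018 (the syslog lines carry no year) and using a regular expression written only against the lines quoted above; other IOS message formats may need a different pattern:

```python
import re
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # JST is UTC+9 with no daylight saving

# Three of the switch log lines quoted in this post.
SWITCH_LINES = [
    "May 28 09:39:43.250 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet6/0/18, changed state to down",
    "May 28 09:39:46.681 UTC: %SWITCH_QOS_TB-5-TRUST_DEVICE_LOST: cisco-phone no longer detected on port Gi6/0/18, operational port trust state is now untrusted.",
    "May 28 09:39:47.738 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet6/0/18, changed state to up",
]

# Matches only the "changed state to up/down" messages.
LINK_EVENT = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC: "
    r"%(?P<tag>[A-Z_]+-\d-[A-Z_]+): "
    r".*Interface (?P<intf>[^,\s]+), changed state to (?P<state>up|down)$"
)

def parse_link_event(line, year=2018):
    """Return a dict for an up/down message, or None for any other line.

    The year is an assumption passed in by the caller.
    """
    m = LINK_EVENT.match(line)
    if m is None:
        return None
    utc = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S.%f")
    utc = utc.replace(tzinfo=timezone.utc)
    return {
        "tag": m["tag"],
        "interface": m["intf"],
        "state": m["state"],
        "jst": utc.astimezone(JST),
    }

events = [e for e in map(parse_link_event, SWITCH_LINES) if e is not None]
for e in events:
    print(e["jst"].strftime("%H:%M:%S.%f")[:-3], "JST", e["interface"], e["state"])
```

The trust-state message is skipped on purpose, because it does not describe a link transition.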

Maren Mahoney · ‎05-30-2018

The important information is in these lines:

29: ERR 18:38:55.942927 =====================

30: ERR 18:38:55.943360 Core of CNU 4.1 (0.1)

31: ERR 18:38:55.943820 Kernel reboot cause -> Bugtrap

32: ERR 18:38:55.944280 Trap code -> 0x20

33: ERR 18:38:55.944726 Kernel reboot time: Mon May 28 18:38:51 2018

34: ERR 18:38:55.945153 =====================

I do not see the 0x20 bug trap code in Cisco's Bug Track. But you've hit a bug of some kind. If you can contact TAC, they may have a more specific answer. Otherwise I'd say you need different firmware. What firmware version are you running?
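
If you need to check many phones for the same marker, the lines quoted above can be pulled out of a saved console log with a short script. This is only a sketch: the pattern is written against the four lines shown here, and the function name `summarize_reboot` is made up for illustration. Other firmware may format these lines differently.

```python
import re
from datetime import datetime

# The relevant lines from the phone log quoted above.
PHONE_LOG = """\
30: ERR 18:38:55.943360 Core of CNU 4.1 (0.1)
31: ERR 18:38:55.943820 Kernel reboot cause -> Bugtrap
32: ERR 18:38:55.944280 Trap code -> 0x20
33: ERR 18:38:55.944726 Kernel reboot time: Mon May 28 18:38:51 2018
"""

# Field name, then either "->" or ":" as the separator, then the value.
FIELD = re.compile(
    r"^\d+: ERR \S+ (Kernel reboot cause|Trap code|Kernel reboot time)"
    r"\s*(?:->|:)\s*(.+)$"
)

def summarize_reboot(log_text):
    """Collect the reboot cause, trap code, and reboot time if present."""
    summary = {}
    for line in log_text.splitlines():
        m = FIELD.match(line.strip())
        if m:
            summary[m.group(1)] = m.group(2).strip()
    if "Kernel reboot time" in summary:
        # The phone prints local time with no zone, so this stays naive.
        summary["Kernel reboot time"] = datetime.strptime(
            summary["Kernel reboot time"], "%a %b %d %H:%M:%S %Y"
        )
    return summary

summary = summarize_reboot(PHONE_LOG)
print(summary)
```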

kumardilip · ‎06-06-2018

This is from the phone's webpage.

App Load ID		jar42sccp.9-4-2ES9.sbn
Boot Load ID		tnp42.8-3-1-21a.bin
Version		*SCCP42.9-4-2SR1-1S*
Hardware Revision		12.0
Model Number		CP-7942G

Call Manager Info-

System version: 10.5.2.14901-1

Maren Mahoney · ‎06-07-2018

Roger that. It might be worthwhile to try upgrading one of the affected phones to a newer/different firmware to see if that resolves the problem. I looked again for a reference to a 0x20 bug for the phones or for CUCM, with no luck. I think it is time to call TAC....

Maren

kumardilip · ‎06-07-2018

We already have a case open with TAC. Things are moving slower than usual because this issue is as weird as it can get. As of now, they are trying to find out whether the port goes down first and causes the phone to reboot, or whether the phone reboots and causes the port to go down/up on the switch. Because this is happening at multiple sites, and only with the SCCP phones, I am sure this has something to do with the most common factor here, the CUCM. But let's see, I could be wrong. I will update the forum again when we make any kind of progress with this issue.
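
A rough way to look at the ordering question is to put the two timestamps posted earlier in this thread on one clock. The sketch below assumes the phone's "Kernel reboot time" (18:38:51) is JST and that the switch's first link-down message (09:39:43.250) is UTC as printed. The `max_skew` parameter is a hypothetical allowance for clock drift between the phone and the switch; unless both are synced to the same NTP source, the result should be treated as a hint only.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))

# Timestamps taken from the phone and switch logs posted in this thread.
phone_reboot = datetime(2018, 5, 28, 18, 38, 51, tzinfo=JST)
link_down = datetime(2018, 5, 28, 9, 39, 43, 250000, tzinfo=timezone.utc)

def order_events(phone_ts, switch_ts, max_skew):
    """Say which event came first, or 'inconclusive' if the gap is
    smaller than the clock skew we are willing to assume."""
    gap = switch_ts - phone_ts  # positive: phone event is earlier
    if abs(gap) <= max_skew:
        return "inconclusive", gap
    return ("phone first" if gap > timedelta(0) else "port first"), gap

# Trusting the clocks to within 5 seconds vs. only to within 60 seconds.
verdict_tight, gap = order_events(phone_reboot, link_down, timedelta(seconds=5))
verdict_loose, _ = order_events(phone_reboot, link_down, timedelta(seconds=60))
print(f"gap = {gap.total_seconds():.2f} s")
print("5 s skew allowed: ", verdict_tight)
print("60 s skew allowed:", verdict_loose)
```

With these two timestamps the gap is about 52 seconds, so the answer depends entirely on how well the clocks agree, and it is worth confirming NTP on both sides before reading anything into it.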

Maren Mahoney · ‎06-07-2018

The other common factor would be firmware, so I do encourage you to update one affected phone to see if that fixes the problem.

Good luck to you! And, yes please, when you do finally figure out what is going on I would LOVE to know what the underlying problem was.

Maren

Ratheesh Kumar · ‎05-28-2018

Hi there

I would also be curious to see what the phone logs say. There could be several reasons. If it's a TCP timeout, it is most likely caused by network issues. Sometimes a cluster reboot helps if your server has some sort of bug or memory leak. I also assume that the number of phones in the phone subnet is within the recommended limit, as ARP tables sometimes fill up and connections get dropped.

There is also a feature called Geometric TCP

Geometric TCP

The Cisco Unified IP Phone firmware 7.2(1) introduced a Geometric TCP mechanism to permit IP Phones to measure the round-trip delay between the IP Phone and Unified CM, then adapt the keepalive timeout value. This provided a very accurate failover mechanism when the network delay is consistent.

However, if the network delay is inconsistent, this mechanism may cause the IP Phones to inaccurately attempt failover. The Cisco Unified IP Phone firmware 8.4(2) introduces the ability for the Network Administrator to disable this behavior, if necessary, through the Detect Unified CM Connection Failure parameter defined on the IP Phone device configuration. The default value is Normal; this Geometric TCP mechanism can be disabled if the parameter is set to Delayed.
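
To illustrate why an adaptive timeout can misfire when delay is inconsistent, here is a toy model. It is not Cisco's algorithm, which the text above does not describe. The rule (timeout = a factor times the mean of recent round-trip times, with a floor), the function names, and all numbers are assumptions made up for this example.

```python
def adaptive_timeout(recent_rtts_ms, factor=3.0, floor_ms=100.0):
    """Hypothetical rule: scale the mean of recent RTTs, never below a floor."""
    mean = sum(recent_rtts_ms) / len(recent_rtts_ms)
    return max(floor_ms, factor * mean)

def false_failovers(rtts_ms, window=4, factor=3.0, floor_ms=100.0):
    """Count samples whose RTT exceeds the timeout learned from the
    previous `window` samples. The server is reachable every time, so
    each hit is a false alarm in this model."""
    count = 0
    for i in range(window, len(rtts_ms)):
        timeout = adaptive_timeout(rtts_ms[i - window:i], factor, floor_ms)
        if rtts_ms[i] > timeout:
            count += 1
    return count

# Made-up RTT samples in milliseconds.
steady = [40, 42, 41, 39, 40, 43, 41, 40, 42, 41]
jittery = [40, 42, 41, 39, 400, 45, 41, 40, 42, 390]

print("steady link: ", false_failovers(steady))
print("jittery link:", false_failovers(jittery))
# A fixed, generous timeout (floor only, no adaptation) does not trip on jitter.
print("jittery, fixed 1000 ms:", false_failovers(jittery, factor=0.0, floor_ms=1000.0))
```

In this model the steady link never trips, while occasional delay spikes on the jittery link exceed a timeout that was learned during quiet periods. That is the kind of behavior the quoted text warns about.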

Hope this Helps

Cheers
Rath!

***Please rate helpful posts***