I'm running Call Manager 9.1.1 and Unity 9.1. I keep running into an issue where phones reboot randomly at different locations. I have checked the routers and switch ports and can't find anything wrong. There's no network drop that would cause the phones to start registering again. I can't figure out why the phones reboot. Can someone please point me in the right direction on how to resolve the issue? I will go to version 11 of Call Manager in a few months.
Have you reviewed the phone logs for hints of what might be happening?
Are they local, or over the WAN?
Enable web access on the devices and browse to their IP address; you can get the logs from there.
Logs attached from one of the phones, timezone is JST. Looking at the logs from the switch, the time the phone rebooted would likely be between 18:39:30 and 18:39:45 JST.
May 28 09:39:43.250 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet6/0/18, changed state to down
May 28 09:39:44.257 UTC: %LINK-3-UPDOWN: Interface GigabitEthernet6/0/18, changed state to down
May 28 09:39:46.681 UTC: %SWITCH_QOS_TB-5-TRUST_DEVICE_LOST: cisco-phone no longer detected on port Gi6/0/18, operational port trust state is now untrusted.
May 28 09:39:46.731 UTC: %LINK-3-UPDOWN: Interface GigabitEthernet6/0/18, changed state to up
May 28 09:39:47.738 UTC: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet6/0/18, changed state to up
May 28 09:40:00.337 UTC: %SWITCH_QOS_TB-5-TRUST_DEVICE_DETECTED: cisco-phone detected on port Gi6/0/18, port's configured trust state is now operational.
May 28 09:40:01.352 UTC: %SWITCH_QOS_TB-5-TRUST_DEVICE_DETECTED: cisco-phone detected on port Gi6/0/18, port's configured trust state is now operational.
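Since the switch logs are in UTC and the phone logs are in JST, it can help to convert the switch lines before comparing them. Below is a rough Python sketch of one way to do that. The function name `parse_switch_line`, the regular expression, and the default year of 2018 are my own assumptions for illustration (the switch lines carry no year), not anything produced by the switch or by Cisco tooling.

```python
import re
from datetime import datetime, timedelta, timezone

# Japan Standard Time is a fixed UTC+9 offset (no daylight saving).
JST = timezone(timedelta(hours=9))

# Hypothetical pattern for lines shaped like the ones posted above, e.g.
# "May 28 09:39:43.250 UTC: %LINEPROTO-5-UPDOWN: Line protocol on ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC: "
    r"%(?P<tag>[A-Z_]+-\d-[A-Z_]+): (?P<msg>.*)$"
)


def parse_switch_line(line, year=2018):
    """Return (timestamp_in_jst, tag, message), or None if the line does not match.

    The year is not present in the switch output, so it is passed in
    (assumed to be 2018 here, based on the phone log).
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    utc = datetime.strptime(
        f"{year} {m['ts']}", "%Y %b %d %H:%M:%S.%f"
    ).replace(tzinfo=timezone.utc)
    return utc.astimezone(JST), m["tag"], m["msg"]


sample = (
    "May 28 09:39:43.250 UTC: %LINEPROTO-5-UPDOWN: Line protocol on "
    "Interface GigabitEthernet6/0/18, changed state to down"
)
when, tag, msg = parse_switch_line(sample)
print(when.strftime("%H:%M:%S.%f")[:-3], "JST", tag)
```

Running each switch line through this gives JST times that can be read directly against the phone log timestamps.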
The important information is in these lines:
29: ERR 18:38:55.942927 =====================
30: ERR 18:38:55.943360 Core of CNU 4.1 (0.1)
31: ERR 18:38:55.943820 Kernel reboot cause -> Bugtrap
32: ERR 18:38:55.944280 Trap code -> 0x20
33: ERR 18:38:55.944726 Kernel reboot time: Mon May 28 18:38:51 2018
34: ERR 18:38:55.945153 =====================
I do not see the 0x20 bug trap code in Cisco's Bug Track. But you've hit a bug of some kind. If you can contact TAC, they may have a more specific answer. Otherwise I'd say you need different firmware. What firmware version are you running?
This is from the phone's webpage.
App Load ID: jar42sccp.9-4-2ES9.sbn
Boot Load ID: tnp42.8-3-1-21a.bin
Call Manager Info-
Roger that. It might be worthwhile to try upgrading one of the affected phones to a newer/different firmware to see if that resolves the problem. I looked again to see if I could find a reference to a 0x20 bug for phones or to CUCM with no luck. I think it is time to call TAC....
We already have a case open with TAC. Things are moving slower than usual because this issue is as weird as it can get. As of now, they are trying to determine whether the port goes down first and causes the phone to reboot, or whether the phone reboots and causes the port to go down/up on the switch. Because this is happening at multiple sites, and only with the SCCP phones, I am sure this has something to do with the most common factor here, the CUCM. But let's see, I could be wrong. I will update the forum again when we make any kind of progress with this issue.
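On the "which came first" question, the two timestamps already posted in this thread can be compared directly. This is only a sketch: it assumes the phone and switch clocks are synchronized (not confirmed anywhere in this thread) and that the "Kernel reboot time" field in the phone log is in JST like the rest of that log. The helper name `seconds_between` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Japan Standard Time is a fixed UTC+9 offset.
JST = timezone(timedelta(hours=9))


def seconds_between(phone_reboot, link_down):
    """Positive result: the phone's recorded reboot precedes the link-down event."""
    return (link_down - phone_reboot).total_seconds()


# "Kernel reboot time: Mon May 28 18:38:51 2018" from the phone log (assumed JST).
phone_reboot = datetime.strptime(
    "Mon May 28 18:38:51 2018", "%a %b %d %H:%M:%S %Y"
).replace(tzinfo=JST)

# "May 28 09:39:43.250 UTC" from the switch log (first line-protocol down).
link_down = datetime(2018, 5, 28, 9, 39, 43, 250000, tzinfo=timezone.utc)

gap = seconds_between(phone_reboot, link_down)
print(f"link down is {gap:+.2f} s relative to the phone's kernel reboot time")
```

If the clocks really are in sync, a positive gap would suggest the phone recorded its reboot before the switch saw the port drop, but a clock offset of under a minute on either device would be enough to reverse that reading, so it is worth checking NTP status on both before drawing conclusions.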
The other common factor would be firmware, so I do encourage you to update one affected phone to see if that fixes the problem.
Good luck to you! And, yes please, when you do finally figure out what is going on I would LOVE to know what the underlying problem was.
I would also be curious to see what the phone logs are saying. There could be several reasons. If it's a TCP timeout, it is most likely caused by network issues. Sometimes a cluster reboot helps if your server has some sort of bug or memory leak. I also assume the number of phones in the phone subnet is within the recommended limit, as sometimes ARP tables fill up and connections get dropped.
There is also a feature called Geometric TCP:
The Cisco Unified IP Phone firmware 7.2(1) introduced a Geometric TCP mechanism to permit IP Phones to measure the round-trip delay between the IP Phone and Unified CM, then adapt the keepalive timeout value. This provided a very accurate failover mechanism when the network delay is consistent.
However, if the network delay is inconsistent, this mechanism may cause the IP Phones to inaccurately attempt failover. The Cisco Unified IP Phone firmware 8.4(2) introduces the ability for the Network Administrator to disable this behavior, if necessary, through the Detect Unified CM Connection Failure parameter defined on the IP Phone device configuration. The default value is Normal; this Geometric TCP mechanism can be disabled if the parameter is set to Delayed.
Hope this Helps