Hello, I have a deployment with CUCM 8.5 (virtualized on UCS) and a bunch of 6921 and 7962 SCCP phones.
The phones are unregistering at seemingly "random" times. In the SDL logs we see:
SDL001_100_000083.txt:039614265 |2013/05/18 07:05:56.192|100|SdlSig|StationClose|restart0
|[R:V-H:0,N:0,L:0,V:0,Z:0,D:0] CloseStationReason = 8 StationId =
SDL001_100_000083.txt:039614267 |2013/05/18 07:05:56.192|100|AlarmErr||||||
AlarmClass:CallManager, AlarmName: EndPointUnregistered, AlarmSeverity: Error, AlarmMessage: , AlarmDescription: An endpoint has unregistered, AlarmParameters:
DeviceName:SEPC40ACB4C3A84,IPAddress:10.200.102.233, Protocol:SCCP, DeviceType:404, Description:XXXX - PO, Reason:6, IPAddrAttributes:3,
According to this doc, I read this trace as:
- Device initiated reset
- Device Type: 404 - 7962 Phone
- Reason 6: connectivity error
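For anyone reading these alarms by hand, here is a minimal Python sketch that splits the AlarmParameters text from the trace into fields and labels the codes. It assumes the parameters are comma-separated Key:Value pairs with no commas inside the values. The lookup tables contain only the codes mentioned in this thread (device type 404, reasons 6 and 13), and the function names are hypothetical, not part of any Cisco tool.

```python
# Codes taken from this thread only; extend from the Managed Services Guide as needed.
REASONS = {6: "ConnectivityError", 13: "KeepAliveTimeout"}
DEVICE_TYPES = {404: "Cisco 7962"}


def parse_alarm_parameters(text):
    """Split 'Key:Value, Key:Value, ...' into a dict (assumes no commas in values)."""
    fields = {}
    for part in text.split(","):
        key, sep, value = part.partition(":")
        if sep:  # skip empty fragments such as the one after a trailing comma
            fields[key.strip()] = value.strip()
    return fields


def describe(fields):
    """Return a one-line summary with the numeric codes replaced by names."""
    reason = int(fields["Reason"])
    device_type = int(fields["DeviceType"])
    return "{} ({}): {}".format(
        fields["DeviceName"],
        DEVICE_TYPES.get(device_type, "device type {}".format(device_type)),
        REASONS.get(reason, "reason {}".format(reason)),
    )


# The AlarmParameters text copied from the trace above.
sample = (
    "DeviceName:SEPC40ACB4C3A84,IPAddress:10.200.102.233, Protocol:SCCP, "
    "DeviceType:404, Description:XXXX - PO, Reason:6, IPAddrAttributes:3,"
)
print(describe(parse_alarm_parameters(sample)))
```

Unknown codes fall back to the raw number, so the sketch does not guess at values that are not documented in this thread.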
For reason 6, the documentation says:
ConnectivityError - The network connection between the device and Unified CM dropped before
the device was fully registered. Possible causes include device power outage, network power outage,
network configuration error, network delay, packet drops and packet corruption. It is also possible
to get this error if the Unified CM node is experiencing high CPU usage.
Verify that the device is powered up and operating,
verify that there is network connectivity between the device and Unified CM,
and verify the CPU utilization is in the safe range
(you can monitor this via the CPU Pegging Alert in RTMT).
I've verified:
- device powered up and operating: OK
- network connectivity: seems good whenever I've checked, but I can't rule out temporary failures
- CPU use: OK on both CUCM and the UCS node
Any help would be greatly appreciated.
Thanks in advance.
To add some detail: in the SDL logs I also found similar events with a different reason code, 13.
According to the official docs (Cisco Unified Communications Manager Managed Services Guide, Release 8.5(1)),
code 13 corresponds to KeepAliveTimeout:
KeepAliveTimeout—A KeepAlive message was not received. Possible causes include device
power outage, network power outage, network configuration error, network delay, packet
drops and packet corruption. It is also possible to get this error if the Unified CM node is
experiencing high CPU usage. Verify the device is powered up and operating, verify network
connectivity between the device and Unified CM, and verify the CPU utilization is in the safe
range (this can be monitored using RTMT via CPU Pegging Alert). No action necessary, the
device will reregister automatically.
Everything seems to point to a network issue, but the phones are directly connected to a 4507 that is operating normally, without errors on its interfaces (average CPU 20%, with peaks of 90%).
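One way to check for temporary failures is to see whether the unregistrations cluster in time: many phones dropping within the same minute would suggest a shared cause (network or host) rather than individual phones. Below is a rough Python sketch, assuming the timestamps have already been pulled out of the EndPointUnregistered lines in the format shown in the SDL trace. bucket_by_minute is a hypothetical helper, and only the first sample timestamp comes from the trace; the other two are made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Timestamp layout as it appears in the SDL trace, e.g. 2013/05/18 07:05:56.192
SDL_TIME_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def bucket_by_minute(timestamps):
    """Count events per calendar minute."""
    counts = Counter()
    for ts in timestamps:
        when = datetime.strptime(ts, SDL_TIME_FORMAT)
        counts[when.replace(second=0, microsecond=0)] += 1
    return counts


sample = [
    "2013/05/18 07:05:56.192",  # from the trace in this thread
    "2013/05/18 07:05:58.010",  # made-up value for illustration
    "2013/05/18 09:30:01.500",  # made-up value for illustration
]

# Busiest minutes first; a minute with many events is worth comparing
# against switch and ESXi host logs for the same time.
for minute, count in bucket_by_minute(sample).most_common():
    print(minute.strftime("%Y-%m-%d %H:%M"), count)
```

One-minute buckets are an arbitrary choice; a wider window may be more useful if the phones' keepalive timers spread the drops out.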
As Paolo says, these are normally network problems.
However, make sure that Large Receive Offload (LRO) is disabled in ESXi, as it can cause network problems.
Thanks Paolo and Graham for your help. I've checked the settings on the ESXi host and found that TCP LRO is enabled. On Monday we'll change these settings and see what happens. I'll update this thread when done (and, if applicable, rate the proposed solution).
Thanks a lot for your help.