Re: SCCP phones randomly unregistering

Gianstefano Monni · ‎05-18-2013

Hello, I have a deployment with CUCM 8.5 (virtualized on UCS) and a bunch of 6921 and 7962 SCCP phones.

We're experiencing "random" unregistering of the phones. In SDL logs we see:

SDL001_100_000083.txt:039614265 |2013/05/18 07:05:56.192|100|SdlSig|StationClose|restart0
|StationD(1,100,51,55248)|StationInit(1,100,50,1)|1,100,50,1.45256839^{^10.200.102.233}^SEPC40ACB4C3A84
|[R:V-H:0,N:0,L:0,V:0,Z:0,D:0] CloseStationReason = 8 StationId =


SDL001_100_000083.txt:039614267 |2013/05/18 07:05:56.192|100|AlarmErr||||||
AlarmClass:CallManager, AlarmName: EndPointUnregistered, AlarmSeverity: Error, AlarmMessage: ,
AlarmDescription: An endpoint has unregistered, AlarmParameters: 
DeviceName:SEPC40ACB4C3A84,IPAddress:10.200.102.233, Protocol:SCCP,
DeviceType:404, Description:XXXX - PO, Reason:6, IPAddrAttributes:3, 
AppID:Cisco CallManager,ClusterID:StandAloneCluster,NodeID:cucm-pub,


According to this doc, I "read" this trace as:

- Device initiated reset 
- Device Type: 404 - 7962 Phone
- Reason 6: connectivity error
- IPAddrAttributes

Anyway, about Reason 6 I read:

ConnectivityError - The network connection between the device and Unified CM dropped before 
the device was fully registered. Possible causes include device power outage, network power outage, 
network configuration error, network delay, packet drops and packet corruption. It is also possible 
to get this error if the Unified CM node is experiencing high CPU usage. 
Verify that the device is powered up and operating, 
verify that there is network connectivity between the device and Unified CM, 
and verify the CPU utilization is in the safe range 
(you can monitor this via the CPU Pegging Alert in RTMT).

I've verified:

- device powered up and operating: OK
- network connectivity: seems good (at the times I checked; I don't know whether there are temporary failures)
- CPU use: OK both on CUCM and UCS node
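
One way to look for temporary failures is to check whether several phones close their stations within the same minute, which would point at the network rather than at a single phone. Below is a minimal Python sketch, assuming the SDL StationClose entries have the same field layout as the excerpt above. The function name `closes_per_minute` and the regular expression are illustrative only and are not part of any Cisco tool.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout, taken from the excerpt in this post:
# |<date time.ms>|<node>|SdlSig|StationClose|...^SEP<12 hex digits>
CLOSE_RE = re.compile(
    r"\|(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\.\d+\|\d+\|SdlSig\|StationClose\|"
    r".*?\^(?P<device>SEP[0-9A-F]{12})",
    re.DOTALL,  # the device name may sit on a wrapped continuation line
)

def closes_per_minute(text):
    """Group the devices that sent StationClose by minute."""
    buckets = defaultdict(set)
    for m in CLOSE_RE.finditer(text):
        ts = datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S")
        buckets[ts.strftime("%Y-%m-%d %H:%M")].add(m.group("device"))
    return dict(buckets)

# Sample input copied from the trace above.
SAMPLE = (
    "SDL001_100_000083.txt:039614265 |2013/05/18 07:05:56.192|100|SdlSig|StationClose|restart0\n"
    "|StationD(1,100,51,55248)|StationInit(1,100,50,1)|1,100,50,1.45256839^{^10.200.102.233}^SEPC40ACB4C3A84\n"
    "|[R:V-H:0,N:0,L:0,V:0,Z:0,D:0] CloseStationReason = 8 StationId =\n"
)

for minute, devices in sorted(closes_per_minute(SAMPLE).items()):
    print(minute, len(devices), sorted(devices))
```

Minutes in which many different devices appear together would be the first places to compare against switch logs.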

Any help would be greatly appreciated

thanks in advance

Gianstefano Monni · ‎05-18-2013

To add further details: in the SDL logs I also found similar events with a different reason code, 13.

According to the official docs (Cisco Unified Communications Manager Managed Services Guide, Release 8.5(1)), code 13 is related to KeepAliveTimeout:

KeepAliveTimeout - A KeepAlive message was not received. Possible causes include device power outage, network power outage, network configuration error, network delay, packet drops and packet corruption. It is also possible to get this error if the Unified CM node is experiencing high CPU usage. Verify the device is powered up and operating, verify network connectivity between the device and Unified CM, and verify the CPU utilization is in the safe range (this can be monitored using RTMT via CPU Pegging Alert). No action necessary, the device will reregister automatically.

Everything seems to point to a network issue, but the phones are directly connected to a 4507 that is operating normally, with no errors on its interfaces (average CPU 20%, with peaks of 90%).
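
To see how the two reason codes are distributed across phones, the EndPointUnregistered alarms can be tallied per device and reason. The following is a minimal Python sketch, assuming the alarm text looks like the excerpt in my first post. The name `tally_unregistrations` and the regular expression are my own illustration, and the reason table holds only the two codes quoted in this thread.

```python
import re
from collections import Counter

# Only the two reason codes quoted in this thread; other codes are left as numbers.
REASONS = {6: "ConnectivityError", 13: "KeepAliveTimeout"}

# Assumed field order, as in the excerpt: AlarmName ... DeviceName ... Reason.
# A sketch only: an alarm missing one of these fields could be mis-paired with the next one.
ALARM_RE = re.compile(
    r"AlarmName:\s*EndPointUnregistered.*?"
    r"DeviceName:\s*(?P<device>[^,\s]+).*?"
    r"Reason:\s*(?P<reason>\d+)",
    re.DOTALL,  # alarm parameters may wrap across several lines
)

def tally_unregistrations(text):
    """Count EndPointUnregistered alarms per (device, reason)."""
    counts = Counter()
    for m in ALARM_RE.finditer(text):
        code = int(m.group("reason"))
        counts[(m.group("device"), REASONS.get(code, str(code)))] += 1
    return counts

# Sample input copied from the alarm in the first post.
SAMPLE = (
    "SDL001_100_000083.txt:039614267 |2013/05/18 07:05:56.192|100|AlarmErr||||||\n"
    "AlarmClass:CallManager, AlarmName: EndPointUnregistered, AlarmSeverity: Error, AlarmMessage: ,\n"
    "AlarmDescription: An endpoint has unregistered, AlarmParameters: \n"
    "DeviceName:SEPC40ACB4C3A84,IPAddress:10.200.102.233, Protocol:SCCP,\n"
    "DeviceType:404, Description:XXXX - PO, Reason:6, IPAddrAttributes:3, \n"
    "AppID:Cisco CallManager,ClusterID:StandAloneCluster,NodeID:cucm-pub,\n"
)

for (device, reason), n in tally_unregistrations(SAMPLE).most_common():
    print(device, reason, n)
```

If a handful of phones account for most of the events, the access ports or cabling for those phones would be the first thing to look at; an even spread would suggest something shared, such as an uplink or the CUCM host itself.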

paolo bevilacqua · ‎05-18-2013

Check your network, as these are normally caused by micro-interruptions or other issues at the layer 2 level.

Graham Old · ‎05-18-2013

As Paolo says, these are normally network problems.

However, make sure that Large Receive Offload (LRO) is disabled in ESXi, as it causes network problems.

http://docwiki.cisco.com/wiki/Disable_LRO

Graham

Gianstefano Monni · ‎05-18-2013

Thanks Paolo and Graham for your help: I've checked the settings on the ESXi host, and I've seen that TCP LRO is enabled. On Monday we'll change these settings and see what happens. I'll update this thread when done (and, if applicable, rate the proposed solution).

Thanks a lot for your help

Gianstefano