Unable to change CAD status - SDL Router Services declared dead. CSCud76724

richb1971 · ‎06-15-2016

Has anyone else experienced this on CUCM 9.1.2.11900-12?

The bug says only 8.6(2) is affected, but we've got the exact same symptoms:

0x08498b81 in IntentionalAbort (reason=0x87009e8 "SDL Router Services declared dead. This may be due to high CPU usage or blocked function. Attempting to restart CTIManager.") at ProcessCTIProcMon.cpp:65

Jitender Bhandari · ‎06-15-2016

Hi,

Can you explain your query in detail?

JB

richb1971 · ‎06-15-2016

Hi JB,

All our CCX agents were unable to change their CAD status yesterday for a few minutes. Some digging found that the CTI Manager service restarted on its own at the affected time. The CUCM core dump suggests this is the bug ID above. The bug is listed as affecting CUCM 8.6.2, but we have 9.1. Is this bug supposed to affect 9.1? What's the fix (other than restarting CTIManager)? Can we update the bug details?

Thanks

Richard

Jitender Bhandari · ‎06-15-2016

Hi Rich,

You are running CUCM version 9.1.2.11900-12 (9.1(2)SU1), and I can see the bug is fixed in the previous mainline release, 9.1.2.10000-28, so it should be fixed in your version. If you can share the complete backtrace, it might provide more information.

HTH

JB

richb1971 · ‎06-15-2016

Cheers JB

Here's the backtrace:

====================================
backtrace
===================================
#0 0x00c58246 in raise () from /lib/libc.so.6
#1 0x00c59c11 in abort () from /lib/libc.so.6
#2 0x08498b81 in IntentionalAbort (reason=0x87009e8 "SDL Router Services declared dead. This may be due to high CPU usage or blocked function. Attempting to restart CTIManager.") at ProcessCTIProcMon.cpp:65
#3 0x08498c9c in CMProcMon::verifySdlTimerServices () at ProcessCTIProcMon.cpp:573
#4 0x08499948 in CMProcMon::callManagerMonitorThread (cmProcMon=0x9a68bf8) at ProcessCTIProcMon.cpp:330
#5 0x00b83ca7 in ACE_OS_Thread_Adapter::invoke (this=0x9fc9490) at OS_Thread_Adapter.cpp:94
#6 0x00b39541 in ace_thread_adapter (args=0x9fc9490) at Base_Thread_Adapter.cpp:137
#7 0x00240791 in start_thread () from /lib/libpthread.so.0
#8 0x00d069ae in clone () from /lib/libc.so.6
====================================

Rich

Jitender Bhandari · ‎06-15-2016

Hi Rich,

I can tell the core is an IntentionalAbort. Core dumps that include the "IntentionalAbort" statement indicate that a system resource issue was responsible for the service fault. Check the link below, which covers troubleshooting this type of issue in detail.

https://supportforums.cisco.com/document/56631/troubleshooting-core-dumps#Performing_Core_Analysis
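
If you want to check a batch of core backtraces for this signature before going through them by hand, here is a minimal Python sketch. It is not a Cisco tool: the function names `unwrap_frames` and `find_intentional_abort` are made up for illustration, and it assumes the input is only the gdb frame listing, hard-wrapped by the terminal at a fixed column with no characters lost at the wrap point.

```python
import re

# Start of a gdb frame line, e.g. "#2 0x08498b81 in IntentionalAbort (...)".
FRAME_START = re.compile(r"^#\d+\s")

# Sample taken from the backtrace in this thread, wrapped as a terminal would.
SAMPLE = '''\
#0 0x00c58246 in raise () from /lib/libc.so.6
#1 0x00c59c11 in abort () from /lib/libc.so.6
#2 0x08498b81 in IntentionalAbort (reason=0x87009e8 "SDL Router Services declar
ed dead. This may be due to high CPU usage or blocked function. Attempting to re
start CTIManager.") at ProcessCTIProcMon.cpp:65
#3 0x08498c9c in CMProcMon::verifySdlTimerServices () at ProcessCTIProcMon.cpp:
573
'''


def unwrap_frames(backtrace_text):
    """Join hard-wrapped lines so that each gdb frame is a single string."""
    frames = []
    for line in backtrace_text.splitlines():
        if FRAME_START.match(line):
            frames.append(line)
        elif frames and line and not line.startswith("="):
            # Continuation of the previous frame. The wrap is assumed to be a
            # plain column cut, so the pieces are joined with no separator.
            frames[-1] += line
    return frames


def find_intentional_abort(frames):
    """Return the IntentionalAbort reason string, or None if no such frame."""
    for frame in frames:
        if " in IntentionalAbort " in frame:
            match = re.search(r'reason=0x[0-9a-fA-F]+ "(.*?)"\)', frame)
            return match.group(1) if match else ""
    return None


if __name__ == "__main__":
    print(find_intentional_abort(unwrap_frames(SAMPLE)))
```

A non-None result only tells you the service aborted itself on purpose and gives the stated reason. It does not identify the underlying resource problem, so the traces and performance logs are still needed.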

On the surface, you can check the following:

The server is built to the recommended OVA specification.
There are no VM snapshots.

If both of those are fine, I would recommend collecting the logs below and opening a TAC case.

1. Collect all CallManager SDI and SDL traces in detail from all the nodes. Make sure they include the time window in which the core dump was generated. Even though the core dump belongs to a different service, like CTI Manager, we need the CallManager traces.
2. If the core dump belongs to another service, like CTI Manager, collect the traces for the offending service as well.
3. Collect the Application and System logs from the syslog viewer with RTMT.
4. Collect the RIS Data Collector performance logs.
5. Collect ProgLogs, if any files are available. ProgLogs may not have been written, but we need to confirm whether any exist from the time of the failure.
6. Perform a utils core analyze and gather all the output.
7. Download the core dump with RTMT to the customer computer. Zip the core dump and attach it to the case notes via the web page, since it will likely be a large file.
8. Collect server info:
   show version active
   show version inactive
   show status
   show hardware

From an internal bug scrub I can see some related bugs, but looking at the logs would be the better way to go.

HTH

JB

richb1971 · ‎06-15-2016

Thanks JB. The OVA looks good and there are no snapshots. I've reported it to TAC and will reply with the results.

Rich

richb1971 · ‎06-21-2016

TAC have said it may be that bug, but there is not enough info to confirm (despite the logs provided).

:(