
Cannot Find Cause of CUCM CPU Pegging

Hello,

CUCM 11.5.1.23900-30, with 1 Pub, 1 Sub, and 2 IM&P nodes (Pub-Sub) on this cluster.

Here is the pattern of what happens every day now:

Each morning I check which node (Pub or Sub) is hung from CPU pegging and reboot it from vSphere. It takes about 25-30 minutes to settle down. Both Pub and Sub are then quiet throughout the work day, in the normal range of 280-450 MHz of CPU usage. I have a Cisco TAC SR open on this, but so far their suggestions (changing the LDAP interval to weekly) have not helped.

HWM/LWM alerts like the one below become more frequent as the day progresses:

At Fri Jul 05 14:44:21 EDT 2024 on node PSICMSUBPA01.MESSAGING.DOM, the following SyslogSeverityMatchFound events generated: 

SeverityMatch : Critical

MatchedEvent : Jul  5 14:44:03 PSICMSUBPA01 local7 2 LpmTool: 2: PSICMSUBPA01.MESSAGING.DOM: Jul 05 2024 18:44:03.400 UTC :  %UC_LPMTCT-2-LogPartitionHighWaterMarkExceeded: %[UsedDiskSpace=23][MessageString=Common Disk utilization hits HWM!! Purging files...][AppID=Cisco Log Partition Monitoring Tool][ClusterID=][NodeID=PSICMSUBPA01]: The percentage of used disk space in the log partition has exceeded the configured high water mark.

AppID : Cisco Syslog Agent

ClusterID : 

NodeID : PSICMSUBPA01

 TimeStamp : Fri Jul 05 14:44:03 EDT 2024

I now have the HWM/LWM set to the lowest parameters. CPU pegging starts at approximately 12:00 AM, which is also when the DRS backup is scheduled to run. There is then a lull until about 3:00 AM, when constant HWM and LWM alerts start up again. At about 5:00 AM the CPU pegging resumes and doesn't stop until the server hangs and I have to restart it from vSphere.
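One way to check whether the alerts really cluster around the midnight, 3:00 AM, and 5:00 AM windows is to count them by hour. The following is only a minimal sketch, assuming the RTMT alert emails have been saved as lines of text; `alerts_per_hour` and the sample lines are hypothetical helpers for illustration, not an RTMT or CUCM feature.

```python
import re
from collections import Counter

# Matches the timestamp style seen in the alert emails,
# e.g. "Fri Jul 05 14:44:21 EDT 2024". Only the hour is captured;
# the zone token is treated as plain text and not converted.
STAMP = re.compile(
    r"\b[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} (\d{2}):\d{2}:\d{2} [A-Z]{2,5} \d{4}\b"
)

def alerts_per_hour(lines):
    """Count alert lines by the hour of day (0-23) in their timestamp."""
    counts = Counter()
    for line in lines:
        match = STAMP.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts

# Sample input: abbreviated lines taken from the alerts in this post.
sample = [
    "At Fri Jul 05 14:44:21 EDT 2024 on node PSICMSUBPA01.MESSAGING.DOM",
    "The alert is generated on Fri Jul 05 05:01:21 EDT 2024 on node PSICMPUBPA01.MESSAGING.DOM.",
]

for hour, count in sorted(alerts_per_hour(sample).items()):
    print(f"{hour:02d}:00  {count}")
```

If the counts spike in the same hours as the scheduled DRS backup, that would support looking at the backup job first; if they do not, the correlation may be coincidental.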

I have the HWM/LWM settings in RTMT 

Example CPU pegging alert:

Processor load over configured threshold for configured duration of time . Configured high threshold is 91 % ccm (69 percent) uses most of the CPU. 

 

Processor_Info: 

 

 For processor instance _Total: %CPU= 99, %User= 67, %System= 32, %Nice= 0, %Idle= 0, %IOWait= 0, %softirq= 1, %irq= 0. 

 

 For processor instance 0: %CPU= 99, %User= 67, %System= 32, %Nice= 0, %Idle= 0, %IOWait= 0, %softirq= 1, %irq= 0. 

 

The alert is generated on Fri Jul 05 05:01:21 EDT 2024 on node PSICMPUBPA01.MESSAGING.DOM. 

   Memory_Info: %Mem Used= 83, %VM Used= 57. 

 Partition_Info: 

Common: %Disk Used=67. 

Swap: %Disk Used=18. 

Active: %Disk Used=95. 

 Process_Info: processes with D-State:
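To compare these alerts over several nights, the Processor_Info lines can be turned into numbers. This is a rough sketch, assuming the `%Name= value` layout shown in the alert above stays the same; `parse_processor_line` is a hypothetical helper name, not part of any Cisco tool.

```python
import re

# Each field in the alert looks like "%CPU= 99" or "%IOWait= 0".
FIELD = re.compile(r"%(\w+)=\s*(\d+)")

def parse_processor_line(line):
    """Return a dict such as {"CPU": 99, "User": 67, ...} from one alert line."""
    return {name: int(value) for name, value in FIELD.findall(line)}

line = (
    "For processor instance _Total: %CPU= 99, %User= 67, %System= 32, "
    "%Nice= 0, %Idle= 0, %IOWait= 0, %softirq= 1, %irq= 0."
)
stats = parse_processor_line(line)

# Split of the load between user space, kernel, and waiting on disk.
print(stats["User"], stats["System"], stats["IOWait"])
```

In the alert above, the load is split between %User (67) and %System (32) with %IOWait at 0, which suggests the CPU is busy executing rather than waiting on disk, although a single sample is not conclusive.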

If anyone has any suggestions, I would greatly appreciate it.  

Andrew Skelly
Level 7

From the offending server, have you logged in to CLI and run either of these commands?

show process using-most cpu

show process using-most memory

Those will let you know which process is consuming high amounts of CPU and memory.  Would be a good place to start looking.
