11-09-2016 03:23 PM - edited 03-17-2019 08:38 AM
Hi Everyone,
Every night I get an RTMT alert from the publisher of my CUCM 11.0.1 cluster (publisher + 4 subscribers):
Processor load over configured threshold for configured duration of time. Configured high threshold is 90%. cmoninit (58 percent) uses most of the CPU.
Processor_Info:
For processor instance _Total: %CPU= 99, %User= 77, %System= 17, %Nice= 0, %Idle= 0, %IOWait= 5, %softirq= 1, %irq= 0.
For processor instance 0: %CPU= 99, %User= 77, %System= 17, %Nice= 0, %Idle= 0, %IOWait= 5, %softirq= 1, %irq= 0.
The alert is generated on Thu Nov 10 00:02:25 AEDT 2016 on node 10.99.63.51.
Memory_Info: %Mem Used= 65, %VM Used= 49.
Partition_Info:
Swap: %Disk Used=25.
Active: %Disk Used=92.
Common: %Disk Used=48.
Process_Info: processes with D-State: cmoninit#2
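To make the alert easier to read, here is a minimal sketch that pulls the fields out of one Processor_Info line and compares %CPU against the 90% threshold quoted in the alert. The line format is copied from the alert above. The function names are hypothetical, and the sketch ignores the "configured duration" condition that RTMT also applies before raising the alert.

```python
import re

# One Processor_Info line copied from the RTMT alert above (instance _Total).
ALERT_LINE = ("For processor instance _Total: %CPU= 99, %User= 77, %System= 17, "
              "%Nice= 0, %Idle= 0, %IOWait= 5, %softirq= 1, %irq= 0.")

# "Configured high threshold is 90 %" from the alert text.
HIGH_THRESHOLD = 90


def parse_processor_info(line: str) -> dict[str, int]:
    """Return a mapping such as {'CPU': 99, 'User': 77, ...} for one line."""
    return {name: int(value) for name, value in re.findall(r"%(\w+)=\s*(\d+)", line)}


def over_threshold(fields: dict[str, int], threshold: int = HIGH_THRESHOLD) -> bool:
    """True when %CPU exceeds the threshold (duration is not modelled here)."""
    return fields["CPU"] > threshold
```

Applied to the alert line, %User (77) dominates, which fits the report that a user-space process (cmoninit) is consuming most of the CPU rather than I/O wait (5%).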
Thanks for your help in advance
Cam
11-09-2016 05:28 PM
Hi,
The system is complaining about high CPU.
Please check the following:
1. Make sure you are not using snapshots on CUCM, as they tend to affect system performance.
2. If it happens every night, do you have a DRS backup, directory sync or any other activity scheduled on the server at that time?
3. Check the NTP status on CUCM. The recommended NTP stratum is less than 5. You can check it with the output of "utils ntp status".
4. Collect the Event Viewer system and application logs and look for any errors or alerts around the time of the high CPU alert.
5. Run system diagnostics once during normal hours and once while the system is reporting high CPU, then compare the two. You can use the command "utils diagnose test".
6. Check ESXi for any errors or alerts.
Aseem
11-09-2016 07:49 PM
Hi Aseem,
Thanks for your response.
- I've confirmed that the ESXi snapshots and backups only occur at 3:30 am
- The DRS backup runs each night at 1:00 am. I even turned off DRS one night and the issue still occurred
- The NTP Stratum is 3
- I'll check the application logs and look for errors
I'll let you know
Thanks
Cam
11-09-2016 07:55 PM
Hi,
cmoninit is a database process; there can be a variety of reasons this could happen.
However, snapshots are not supported on CUCM; you would have to remove them to get to a supported configuration.
Also, how many users do you have on CUCM? Can you paste the output of "show status" from the CLI for the server?
JB
11-09-2016 08:01 PM
Hi JB,
Thanks for your response
I agree with you on the snapshots, but the VMware team takes them anyway to back up the system
See below the show status output
Executed command unsuccessfully
No valid command entered
admin:show status
Host Name : vicpccm01
Date : Thu Nov 10, 2016 14:57:15
Time Zone : Australian Eastern Daylight Time (Australia/Melbourne)
Locale : en_US.UTF-8
Product Ver : 11.0.1.20000-2
Unified OS Version : 6.0.0.0-2
Uptime:
14:57:16 up 106 days, 19:01, 1 user, load average: 0.16, 0.27, 0.34
CPU Idle: 93.75% System: 03.12% User: 03.12%
IOWAIT: 00.00% IRQ: 00.00% Soft: 00.00%
Memory Total: 5994288K
Free: 222780K
Used: 5771508K
Cached: 1761028K
Shared: 299552K
Buffers: 150132K
Total Free Used
Disk/active 14154228K 1125108K 12883992K (92%)
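The 92% on the Disk/active line can be checked against the three figures next to it. A minimal sketch, assuming the percentage follows the GNU df convention (used / (used + free), rounded up) rather than used / total; the function names are hypothetical.

```python
import math

# Figures from the "show status" Disk/active line above, in KB.
TOTAL_KB = 14154228
FREE_KB = 1125108
USED_KB = 12883992


def df_style_percent(used_kb: int, free_kb: int) -> int:
    """Percent used, df-style: used / (used + free), rounded up to a whole percent."""
    return math.ceil(used_kb * 100 / (used_kb + free_kb))


def naive_percent(used_kb: int, total_kb: int) -> float:
    """Percent of the whole partition size, for comparison."""
    return used_kb * 100 / total_kb
```

df_style_percent(USED_KB, FREE_KB) gives 92, matching the output, while naive_percent(USED_KB, TOTAL_KB) is about 91.0. The roughly 145 MB that is in Total but in neither Used nor Free is not counted as available by the df-style formula. Either way, the active partition is nearly full, which agrees with "Active: %Disk Used=92" in the RTMT alert.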
11-09-2016 08:10 PM
Hi,
Can you also paste the output of "show hardware"?
I understand the VMware team's concern, but if you go to TAC for troubleshooting, the first thing they will tell you is to remove the snapshots.
You can set up DRS to back up your configuration, which is the supported approach.
JB
11-09-2016 08:21 PM
Hi JB,
See show hardware below
admin:show hard
admin:show hardware
HW Platform : VMware Virtual Machine
Processors : 1
Type : Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
CPU Speed : 2900
Memory : 6144 MBytes
Object ID : 1.3.6.1.4.1.9.1.1348
OS Version : UCOS 6.0.0.0-2.i386
Serial Number : VMware
RAID Version :
No RAID controller information is available
BIOS Information :
PhoenixTechnologiesLTD 6.00 10/22/2013
RAID Details :
No RAID information is available
-----------------------------------------------------------------------
Physical device information
-----------------------------------------------------------------------
Number of Disks : 1
Hard Disk #1
Size (in GB) : 80
Partition Details :
Disk /dev/sda: 10443 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0
Device Boot Start End #sectors Id System
/dev/sda1 * 2048 29028351 29026304 83 Linux
Thanks
Cam
11-09-2016 08:30 PM
Hi Cam,
Can you tell me the following?
JB
11-09-2016 08:35 PM
Hi JB,
At present there are only 166 phones on this system, as we are migrating from another CUCM. Unity, IM&P, Attendant Console and UCCX are also attached.
Thanks
Cameron
11-09-2016 08:47 PM
Hi Cam,
You are well within the OVA specification for 2500 users:
Cisco Unified Communications Manager (CUCM) configuration that supports up to 2500 users per node.
Details:
Red Hat Enterprise Linux 6 (64-bit)
CPU: 1 vCPU with 800 MHz reservation
Memory: 6 GB with 6 GB reservation
Disk: 1 - 80 GB disk
At this point I would first get rid of the snapshots and monitor to see whether they were causing the issue, before completely migrating over from the old CUCM.
JB
11-09-2016 08:49 PM
Hi JB,
There are no snapshots on the VMware host at present.
The backup software takes a snapshot, runs the backup and then deletes the snapshot.
Thanks
Cam
11-09-2016 09:00 PM
Hi Cam,
Are the alerts generated at the same time as that process?
JB
11-09-2016 09:07 PM
Hi JB,
The alerts occur at 00:00 and the backup runs at 03:30.
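The timing comparison in this exchange can be made explicit. A minimal sketch using the alert timestamp from the RTMT email (00:02:25) and the nightly start times Cam listed (DRS at 01:00, ESXi snapshot/backup at 03:30). The 30-minute window and the function name are assumptions. The sketch only compares start times, so a long-running job that began before the alert would be missed.

```python
from datetime import datetime, timedelta

# Times from this thread: the RTMT alert and the nightly jobs' start times.
ALERT_TIME = datetime(2016, 11, 10, 0, 2, 25)
NIGHTLY_JOBS = {
    "DRS backup": datetime(2016, 11, 10, 1, 0),
    "ESXi snapshot/backup": datetime(2016, 11, 10, 3, 30),
}


def jobs_near_alert(alert: datetime, jobs: dict[str, datetime],
                    window: timedelta = timedelta(minutes=30)) -> list[str]:
    """Return the jobs whose start time lies within +/- window of the alert."""
    return [name for name, start in jobs.items() if abs(start - alert) <= window]
```

With the default 30-minute window, no job lines up with the alert. Only widening the window to an hour picks up the DRS backup, 57 minutes later. That fits JB's point that the snapshot/backup at 03:30 is not the trigger for the midnight alert.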
Thanks
Cam
11-09-2016 09:44 PM
Hi Cam,
You can try increasing the vCPU count to 2 to see if that makes a difference; otherwise, I would suggest collecting the logs below for that time period to find the root cause.
JB
11-10-2016 03:33 PM
Hi JB,
Thanks for all your help, I'll get those logs and let you know
I'll also see if I can increase the CPU count and see how it goes
Thanks
Cam