cameronallison1 · ‎11-09-2016

Hi Everyone,

Every night I'm getting an RTMT alert from my CUCM 11.0.1 publisher (the cluster is a publisher plus 4 subscribers):

Processor load over configured threshold for configured duration of time . Configured high threshold is 90 % cmoninit (58 percent) uses most of the CPU.

Processor_Info:

For processor instance _Total: %CPU= 99, %User= 77, %System= 17, %Nice= 0, %Idle= 0, %IOWait= 5, %softirq= 1, %irq= 0.

For processor instance 0: %CPU= 99, %User= 77, %System= 17, %Nice= 0, %Idle= 0, %IOWait= 5, %softirq= 1, %irq= 0.

The alert is generated on Thu Nov 10 00:02:25 AEDT 2016 on node 10.99.63.51.

Memory_Info: %Mem Used= 65, %VM Used= 49.

Partition_Info:

Swap: %Disk Used=25.

Active: %Disk Used=92.

Common: %Disk Used=48.

Process_Info: processes with D-State: cmoninit#2

Thanks for your help in advance

Cam
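
For anyone who wants to track these nightly alerts over time, the counters in the Processor_Info lines above can be pulled out with a short script. This is only a sketch: the function name `parse_processor_line` is made up for illustration, and it assumes the alert text keeps the exact `%Name= value` layout shown in this post.

```python
import re

def parse_processor_line(line: str) -> dict:
    """Extract the percentage counters from one 'For processor instance' line."""
    # Instance name is the token between "instance" and the colon (e.g. "_Total" or "0").
    instance = re.search(r"For processor instance (\S+):", line)
    # Each counter looks like "%CPU= 99"; capture the name and the integer value.
    counters = {name: int(value) for name, value in re.findall(r"%(\w+)=\s*(\d+)", line)}
    return {"instance": instance.group(1) if instance else None, **counters}

# Line copied from the alert text in this thread.
sample = ("For processor instance _Total: %CPU= 99, %User= 77, %System= 17, "
          "%Nice= 0, %Idle= 0, %IOWait= 5, %softirq= 1, %irq= 0.")

info = parse_processor_line(sample)
print(info["instance"], info["CPU"], info["User"], info["IOWait"])
```

Logging the parsed values each night makes it easier to see whether %User or %IOWait dominates when cmoninit is busy.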

Aseem Anand · ‎11-09-2016

Hi,

The system is complaining about high CPU.

Please check the following:

1. Make sure you are not using snapshots on CUCM as they tend to affect system performance.

2. If it happens every night then do you have any DRS backup, Directory Sync or any other activity on the server?

3. Check the NTP status on the CUCM. The recommended stratum value for NTP is less than 5. You can check it in the output of utils ntp status.

4. Collect the event viewer system and application logs and look for any errors or alerts around the time of the high CPU alert.

5. Run the system diagnostics during normal hours and again while the system is reporting high CPU, then compare the two. You can use the command utils diagnose test.

6. Check the ESXi and see if there are any errors/alerts.

Aseem
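
The stratum check in item 3 above can be scripted once the output of utils ntp status has been saved to a text file. A rough sketch follows; the helper names are hypothetical, and the sample line is an assumed format (an ntpstat-style "at stratum N" sentence), so adjust the pattern to whatever your node actually prints.

```python
import re
from typing import Optional

def extract_stratum(status_text: str) -> Optional[int]:
    """Return the first 'stratum N' value found in the text, or None."""
    match = re.search(r"stratum\s+(\d+)", status_text, re.IGNORECASE)
    return int(match.group(1)) if match else None

def stratum_ok(status_text: str, limit: int = 5) -> bool:
    """True when a stratum was found and it is below the limit (less than 5, per item 3)."""
    stratum = extract_stratum(status_text)
    return stratum is not None and stratum < limit

# Assumed sample line, not copied from this cluster.
sample = "synchronised to NTP server (192.0.2.10) at stratum 3"
print(extract_stratum(sample), stratum_ok(sample))
```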

cameronallison1 · ‎11-09-2016

Hi Aseem,

Thanks for your response.

- I've checked both the ESXi snapshots and the backups; they only occur at 3:30am

- The DRS backup occurs each night at 1:00am. I even turned off DRS one night and the issue still occurred

- The NTP Stratum is 3

- I'll check the application logs and look for errors

I'll let you know

Thanks

Cam

Jitender Bhandari · ‎11-09-2016

Hi,

cmoninit is a database process, and there can be a variety of reasons why this could happen.

However, snapshots are not supported on CUCM, so you would have to stop using them to get to a supported configuration.

Also, how many users do you have on CUCM? Can you paste the output of "show status" from the CLI for the server?

JB

cameronallison1 · ‎11-09-2016

Hi JB,

Thanks for your response

I agree with you on the snapshots, but the VMware team takes them anyway in order to back up the system

See below the show status output

Executed command unsuccessfully
No valid command entered
admin:show status

Host Name          : vicpccm01
Date               : Thu Nov 10, 2016 14:57:15
Time Zone          : Australian Eastern Daylight Time (Australia/Melbourne)
Locale             : en_US.UTF-8
Product Ver        : 11.0.1.20000-2
Unified OS Version : 6.0.0.0-2

Uptime:
14:57:16 up 106 days, 19:01, 1 user, load average: 0.16, 0.27, 0.34

CPU Idle:   93.75% System:   03.12%    User:   03.12%
IOWAIT:   00.00%     IRQ:   00.00%    Soft:   00.00%

Memory Total:        5994288K
        Free:         222780K
        Used:        5771508K
      Cached:        1761028K
      Shared:         299552K
     Buffers:         150132K

                        Total            Free            Used
Disk/active         14154228K        1125108K       12883992K (92%)
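
A quick sanity check of the numbers in this output, as a sketch only. It assumes the 92% on the Disk/active line is computed df-style as used / (used + free), and that cached and buffer memory are reclaimable and can be subtracted from "Used"; neither assumption is confirmed anywhere in this thread.

```python
def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

# Values in KB, copied from the "show status" output above.
mem_total, mem_used = 5994288, 5771508
mem_cached, mem_buffers = 1761028, 150132
disk_free, disk_used = 1125108, 12883992

# Raw figure: everything the kernel reports as used, including page cache.
raw_mem = percent(mem_used, mem_total)
# Adjusted figure: subtract cache and buffers (assumption: they are reclaimable).
adjusted_mem = percent(mem_used - mem_cached - mem_buffers, mem_total)
# Disk usage the way df reports it (assumption): used / (used + free).
disk_active = percent(disk_used, disk_used + disk_free)

print(f"memory raw {raw_mem:.1f}%, adjusted {adjusted_mem:.1f}%, active partition {disk_active:.1f}%")
```

Under those assumptions the adjusted memory figure lands close to the %Mem Used= 65 in the RTMT alert, and the active partition rounds to the same 92% shown in both the alert and this output.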

Jitender Bhandari · ‎11-09-2016

Hi,

Can you also paste the output of "show hardware"?

I understand the VMware team's concern, but if you go to TAC for troubleshooting, the first thing they will tell you is to remove the snapshots.

You can set up DRS to back up your configuration, which is a supported way to go.

JB

cameronallison1 · ‎11-09-2016

Hi JB,

See show hardware below

admin:show hard
admin:show hardware

HW Platform       : VMware Virtual Machine
Processors        : 1
Type              : Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
CPU Speed         : 2900
Memory            : 6144 MBytes
Object ID         : 1.3.6.1.4.1.9.1.1348
OS Version        : UCOS 6.0.0.0-2.i386
Serial Number     : VMware

RAID Version      :
No RAID controller information is available

BIOS Information :
PhoenixTechnologiesLTD 6.00 10/22/2013

RAID Details      :
No RAID information is available
-----------------------------------------------------------------------
Physical device information
-----------------------------------------------------------------------
Number of Disks   : 1
Hard Disk #1
Size (in GB)      : 80

Partition Details :

Disk /dev/sda: 10443 cylinders, 255 heads, 63 sectors/track
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors Id System
/dev/sda1   *      2048 29028351   29026304 83 Linux

Thanks

Cam

Jitender Bhandari · ‎11-09-2016

Hi Cam,

Can you tell me the following?

How many nodes are in the CUCM cluster?
How many users / phones are there in total on your CUCM?

JB

cameronallison1 · ‎11-09-2016

Hi JB,

At present there are only 166 phones on this system, as we are migrating from another CUCM. Also attached to it are Unity, IM&P, Attendant Console and UCCX

Thanks

Cameron

Jitender Bhandari · ‎11-09-2016

Hi Cam,

You are well within the OVA specification for 2500 users:

Cisco Unified Communications Manager (CUCM) configuration that supports up to 2500 users per node.
Details:
Red Hat Enterprise Linux 6 (64-bit)
CPU: 1 vCPU with 800 MHz reservation
Memory: 6 GB with 6 GB reservation
Disk: 1 - 80 GB disk

At this point I would first get rid of the snapshots and monitor to see whether they were causing the issue, before completely migrating over from the old CUCM

JB

cameronallison1 · ‎11-09-2016

Hi JB,

There are no snapshots on the vmware box at present

The backup works in such a way that the backup software takes a snapshot and after backup deletes the snapshot

Thanks

Cam

Jitender Bhandari · ‎11-09-2016

Hi Cam,

Are the alerts generated at the same time as that snapshot backup process takes place?

JB

cameronallison1 · ‎11-09-2016

Hi JB,

The alerts take place at 00:00 and the backup occurs at 03:30

Thanks

Cam
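
The timing comparison discussed here can be written down as a small check. This is a sketch under stated assumptions: the job names and the 30-minute window are arbitrary choices for illustration, and it compares job start times only, so a job that starts earlier and runs for hours would not be caught.

```python
from datetime import datetime, timedelta

def jobs_near(alert: datetime, jobs: dict, window: timedelta) -> list:
    """Return the names of daily jobs whose start time falls within `window` of the alert."""
    hits = []
    for name, hhmm in jobs.items():
        hour, minute = map(int, hhmm.split(":"))
        # Check the previous, same and next day so times around midnight are handled.
        for day_offset in (-1, 0, 1):
            start = alert.replace(hour=hour, minute=minute, second=0, microsecond=0)
            start += timedelta(days=day_offset)
            if abs(start - alert) <= window:
                hits.append(name)
                break
    return hits

# Alert timestamp from the RTMT alert; job times as reported in this thread.
alert_time = datetime(2016, 11, 10, 0, 2, 25)
jobs = {"DRS backup": "01:00", "ESXi snapshot backup": "03:30"}

print(jobs_near(alert_time, jobs, timedelta(minutes=30)))  # prints []
```

With a 30-minute window neither job matches, which is consistent with the alert still firing on the night DRS was turned off.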

Jitender Bhandari · ‎11-09-2016

Hi Cam,

You can try increasing the vCPU count to 2 to see if that makes a difference. Otherwise, I would suggest collecting the logs below for the time period of the alert to find the root cause.

Cisco Informix Database Service
Event viewer application logs
Event viewer system logs
RisDC perfmon logs

JB

cameronallison1 · ‎11-10-2016

Hi JB,

Thanks for all your help, I'll get those logs and let you know

I'll also see if I can increase the CPU and see how I go

Thanks

Cam

cmoninit High CPU v11.0.1