Solved: Re: EEM Applet to Monitor CPU

francisco_1 · 04-14-2010

I need to know which PID is using the CPU when utilization reaches 50%.

All I see on the syslog server is "%HA_EM-2-LOG: highcpu: HIGH CPU".

EEM doesn't tell me which process is consuming the CPU at 50%.

Any ideas?

event manager applet highcpu
event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.10.1 get-type exact entry-op ge entry-val 50 poll-interval 5
action 1.0 cli command "enable"
action 2.0 cli command "show proc cpu sorted"
action 3.0 syslog priority critical msg "HIGH CPU"

Francisco.

francisco_1 · 04-15-2010

Joe,

One of our C6500 WAN switches, which has an EBGP peering with our ISP, has its BGP hold timer expiring frequently during the day (really annoying). During those events we see very high CPU. We are not sure at this stage what is causing it, since we are logging lots of ACL hits, which fills the logging buffer quickly. That is why we are asking for your assistance in capturing what is causing the high CPU.

We will definitely use the Tcl script you provided in prod, but any chance you could modify it to capture overall CPU utilization only at 50%?

Francisco.

Joe Clarke · 04-15-2010

Right now it does. That is, the script will not execute unless the OVERALL CPU usage is at or above 50%. It will then email all processes which have a five second CPU utilization above 0%. This way, you can get an idea of ALL CPU consumers that contributed to the 50% overall CPU utilization. I thought this is what you would want. However, if you want to only see processes at or above 50% on their own, I can do that, but I do not think the script will provide useful data at that point.

--

Please support CSC Helps Haiti

https://supportforums.cisco.com/docs/DOC-8895

https://supportforums.cisco.com

francisco_1 · 04-15-2010

Joe,

Thank you very much for your assistance.

I will test it on the weekend.

Francisco

Joe Clarke · 04-15-2010

I made one modification which may help. In this version, the processes will be listed in descending order relative to their CPU consumption. Previously, the processes were listed by hash value.

Joe Clarke · 04-15-2010

Sorry, I had a typo in that last version. Try this one instead.

francisco_1 · 04-16-2010

Hey Joe,

I don't see much difference based on the output I am seeing on the syslog server!

The high CPU threshold "50" definitely means 0.50% and higher, not 50.00% and higher. The problem is that we are receiving far too many syslog messages for processes using low CPU, and we could easily miss what we need to see. That is why I need the Tcl script to trigger when CPU is at 50.00% or above. I am not interested in any process using less than 50.00%.

Any chance you could modify the script to do that? If not, then no worries.

Thanks Joe.

Francisco.

Joe Clarke · 04-16-2010

The threshold is 50%. That is, the script will not run at all unless the five second CPU utilization of the device as a whole is greater than or equal to 50%. When that occurs, the script will parse the output of "show proc cpu sorted". For every process which has a non-zero five second CPU utilization value, that process name will be sent out via a syslog with its five second CPU utilization value. Again, the reason for this is that multiple processes could be contributing to the overall CPU utilization of 50%. There may not be one single process taking 50% or more CPU, in which case, if you only printed processes that had a 50% CPU utilization value, your syslog would have no processes.

Now, if you are only interested in processes which have a 50% utilization value, you need to change the design of your script from one that looks at the overall system CPU usage to one that runs periodically, parses the output of "show proc cpu sorted", and only sends a syslog when one or more processes are taking up at least 50% of the CPU. Is this what you want?
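
For readers following along, the first design described above (gate on overall CPU, then report every non-idle process) can be sketched roughly as follows. This is an illustration in Python, not the Tcl policy attached to the thread, which is not reproduced here. The sample output is made up, and the names `overall_cpu` and `contributors` are hypothetical.

```python
import re

# Made-up sample in the usual "show processes cpu sorted" layout.
SAMPLE = """\
CPU utilization for five seconds: 57%/12%; one minute: 30%; five minutes: 25%
 PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
 172     1234567     8901234        138 41.27% 20.11% 15.02%   0 BGP Router
  85      456789     1234567        370  3.51%  2.10%  1.95%   0 IP Input
   3         100         200        500  0.00%  0.00%  0.00%   0 Exec
"""

# Overall five-second figure from the first line (the number before the "/").
TOTAL_RE = re.compile(r"five seconds:\s*(\d+)%")
# One process row: PID, runtime, invoked, uSecs, 5Sec%, 1Min%, 5Min%, TTY, name.
ROW_RE = re.compile(
    r"^\s*\d+\s+\d+\s+\d+\s+\d+\s+(\d+\.\d+)%\s+\d+\.\d+%\s+\d+\.\d+%\s+\d+\s+(.+?)\s*$"
)


def overall_cpu(output):
    """Return the overall five-second CPU percentage, or None if not found."""
    match = TOTAL_RE.search(output)
    return int(match.group(1)) if match else None


def contributors(output, overall_threshold=50):
    """List (name, five_sec_pct) for every non-idle process, busiest first.

    Returns an empty list when overall CPU is below the threshold, mirroring
    a policy that does not run at all until the device as a whole is busy.
    """
    total = overall_cpu(output)
    if total is None or total < overall_threshold:
        return []
    rows = []
    for line in output.splitlines():
        match = ROW_RE.match(line)
        if match and float(match.group(1)) > 0.0:
            rows.append((match.group(2), float(match.group(1))))
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, pct in contributors(SAMPLE):
    print(f"{name}: {pct:.2f}%")
```

In this sample the device is at 57% overall, yet no single process reaches 50%. A filter that only reported processes at or above 50% would therefore report nothing here.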

francisco_1 · 04-16-2010

Yes, Joe. I am only interested in processes which have a 50% utilization value.

Francisco.

Joe Clarke · 04-16-2010

Try this policy instead. You will no longer need to set the high_cpu_cpu_id variable. It will only generate a syslog if there is at least one process with a CPU utilization value at or above the high_cpu_threshold value.
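
The attached policy itself is not shown in the thread. As a rough sketch of the per-process check it is described as performing, the Python below scans each row of "show proc cpu sorted" and keeps only processes at or above a threshold. The function names, the message wording, and both sample outputs are assumptions made for illustration.

```python
import re

# One process row; named groups pick out the PID, five-second CPU, and name.
ROW_RE = re.compile(
    r"^\s*(?P<pid>\d+)\s+\d+\s+\d+\s+\d+\s+(?P<five_sec>\d+\.\d+)%"
    r"\s+\d+\.\d+%\s+\d+\.\d+%\s+\d+\s+(?P<name>.+?)\s*$"
)

# Made-up outputs: load spread across processes, and one process spiking.
SPREAD = """\
CPU utilization for five seconds: 57%/12%; one minute: 30%; five minutes: 25%
 PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
 172     1234567     8901234        138 41.27% 20.11% 15.02%   0 BGP Router
  85      456789     1234567        370  3.51%  2.10%  1.95%   0 IP Input
"""
SPIKE = """\
CPU utilization for five seconds: 70%/5%; one minute: 35%; five minutes: 28%
 PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
 172     1234567     8901234        138 62.40% 25.00% 16.00%   0 BGP Router
  85      456789     1234567        370  2.60%  2.10%  1.95%   0 IP Input
"""


def hot_processes(output, threshold=50.0):
    """Return (pid, name, five_sec_pct) for rows at or above the threshold."""
    hot = []
    for line in output.splitlines():
        match = ROW_RE.match(line)
        if match and float(match["five_sec"]) >= threshold:
            hot.append(
                (int(match["pid"]), match["name"], float(match["five_sec"]))
            )
    return hot


def syslog_lines(output, threshold=50.0):
    """Build one message per hot process; no hot process means no messages."""
    return [
        f"HIGH CPU: PID {pid} ({name}) at {pct:.2f}%"
        for pid, name, pct in hot_processes(output, threshold)
    ]


for sample in (SPREAD, SPIKE):
    print(syslog_lines(sample))
```

Run on a timer rather than on an overall-CPU trigger, a check like this stays silent for the spread-out case and only speaks up when one process crosses the threshold on its own.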

francisco_1 · 04-19-2010

Joe,

I have uploaded the script to flash, and when I try to register it I get the error below.

R1(config)#event manager policy tm_alert_high_cpu.tcl
Compile check and registration failed:Wrong # args, usage is "::cisco::eem::event_register_timer watchdog|countdown|absolute|cron name ? cron_entry ? time ? queue_priority normal|low|high maxrun ? nice ?"
while executing
"::cisco::eem::event_register_timer watchdog time $high_cpu_poll_freq
"
Tcl policy execute failed: Wrong # args, usage is "::cisco::eem::event_register_timer watchdog|countdown|absolute|cron name ? cron_entry ? time ? queue_priority normal|low|high maxrun ? nice ?"

Embedded Event Manager configuration: failed to retrieve intermediate registration result for policy tm_alert_high_cpu.tcl: Unknown error 0

Joe Clarke · 04-19-2010

Your version of IOS may require another parameter. Try this version.

francisco_1 · 04-20-2010

PERFECTO. Excellent stuff Joe.

Another rating goes to you.

Thanks

Francisco.

leelove01 · 10-11-2010

Joe,

I'm in a similar position: it appears the "BGP Router" process may be running so frequently because of our EBGP peer, from whom we receive the full routing table. They constantly run some algorithm that changes their routes for optimal routing, which seems to cause them to send us updates ranging from 15-30 on average, with spikes as high as 100+. Our other BGP router peers with two ISPs and gets full routing tables from both. It has twice as many routes as the one in question but rarely has the same problem. Its BGP updates are only around 1-5 on average, with maybe a spike up to 20-25, nothing like the one in question.

I was hoping to use EEM to get a better picture of which process is taking up the majority of the CPU when the CPU spikes to over 75% over the 5-second interval. I want to confirm that it's the BGP Router process that is causing our problem.

What's making me look at BGP is that we use Nagios as our NMS. We defined a service so that every 5 minutes it runs a Perl script to get information on all our devices' interfaces. Periodically it will alert us that this router is not responding, but 5 minutes later it's OK. Now, I know that I can adjust one setting in Nagios that might stop it from alerting on what I feel is a false alarm, but I want to confirm whether it's the CPU spikes from the BGP Router process that stop the router from responding to Nagios.

s72033-advipservicesk9_wan-mz.122-33.SXI1.bin <---- IOS currently running on our 7200 router.

Any help would be appreciated.

Lee

Joe Clarke · 10-11-2010

Please start a new thread for your issue.