EEM TCL script configuration issue

DIana Elizabeth Lara Zarate · 09-20-2013

Hi Experts,

I need help with an EEM TCL script for the CRS platform that generates a SYSLOG message once the CPU reaches a threshold value and then stays over that threshold for 15 minutes. I've already tried several things. The last TCL script I tested generates the SYSLOG message as soon as the CPU reaches the threshold, but I can't find a way to make it wait until the CPU has been over the threshold for 15 minutes before generating the message.

My current script looks like this:

::cisco::eem::event_register_wdsysmon timewin 900 sub1 cpu_tot op ge val 70

namespace import ::cisco::eem::*
namespace import ::cisco::lib::*

array set event_details [event_reqinfo]
action_syslog msg "sub1 is $event_details(sub1)"
action_syslog msg "High CPU threshold value over 70%"
puts ok

I've tried using the 'period' option for the 'cpu_tot' subevent, but the TCL script wasn't recognized and couldn't be registered. I'm using the 'timewin' option here, but it seems to be the wrong tool: according to the documentation, it is the time window within which multiple subevents have to occur for the script to execute.

timewin

(Optional) Time window within which all of the subevents have to occur in order for an event to be generated and is specified in SSSSSSSSSS[.MMM] format. SSSSSSSSSS format must be an integer representing seconds between 0 and 4294967295 inclusive. MMM format must be an integer representing milliseconds between 0 and 999.

Also, I believe the 'period' option wouldn't have worked, because I understand that it refers to the time period over which the script monitors the CPU:

1. cpu_tot [op gt|ge|eq|ne|lt|le] [val ?] [period ?]

op	(Optional) Comparison operator that is used to compare the collected total system CPU usage sample percentage with the specified percentage value. If true, an event is raised.
val	(Optional) Percentage value in which the average CPU usage during the sample period is compared.
period	(Optional) Time period for averaging the collection of samples and is specified in SSSSSSSSSS[.MMM] format. SSSSSSSSSS format must be an integer representing seconds between 0 and 4294967295, inclusive. MMM format must be an integer representing milliseconds between 0 and 999. If this argument is not specified, the most recent sample is used.

As I said, I couldn't try this because registration failed with an error when I tried to register the script using the following line:

::cisco::eem::event_register_wdsysmon sub1 cpu_tot op ge val 70 period 900

This is the error message that appeared:

RP/0/RP0/CPU0:CRS(config)#event manager policy test.tcl username cisco
RP/0/RP0/CPU0:CRS(config)#commit
Thu Aug 29 12:35:43.569 CDT

% Failed to commit one or more configuration items during a pseudo-atomic operation. All changes made have been reverted. Please issue 'show configuration failed' from this session to view the errors
RP/0/RP0/CPU0:CRS(config)#sh conf fail
Thu Aug 29 12:35:52.427 CDT
!! SEMANTIC ERRORS: This configuration was rejected by
!! the system due to semantic errors. The individual
!! errors with each failed configuration command can be
!! found below.

event manager policy test.tcl username cisco persist-time 3600
!!% Embedded Event Manager configuration: failed to retrieve intermediate registration result for policy test.tcl
end

Anyway, to make this work I understand that I need nested TCL scripts that do the following:

1. Monitor the CPU and, when it reaches the threshold, install another TCL policy that counts down 15 minutes.
2. If the second TCL policy reaches zero, it should generate the SYSLOG message.
3. Keep monitoring the CPU while the countdown is running and, if the CPU falls below the threshold, stop the second TCL policy.

I don't know how to accomplish this, so if anyone can help me with it or show me another way to do it, I would really appreciate it.

Thanks in advance for all your help!

Joe Clarke · 09-21-2013

Neither option is likely to do what you want. The timewin option is for correlating multiple events, and period is the polling interval. What you want is to create a timer when the CPU is first detected as being high, count down 15 minutes, then alert you. You can do this with a nested EEM policy. For example, you can add the following to your existing policy:

# Return the configured EEM user policy directory, or raise an error.
proc get_pol_dir { fd } {
    set res {}
    set output [cli_exec $fd "show event manager directory user policy"]
    set output [string trim $output]
    regsub -all "\r\n" $output "\n" result
    set lines [split $result "\n"]
    foreach line $lines {
        if { $line == "" } {
            continue
        }
        # Take the first line that has no whitespace and is not a prompt.
        if { ! [regexp {\s} $line] && ! [regexp {#$} $line] } {
            set res $line
            break
        }
    }
    if { $res == {} } {
        return -code error "The user policy directory has not been configured"
    }
    return $res
}

if { [catch {cli_open} result] } {
    error $result $errorInfo
}
array set cli $result

# Do nothing if the countdown policy is already registered.
set output [cli_exec $cli(fd) "show event manager policy registered | inc tm_alert_high_cpu.tcl"]
if { [regexp {tm_alert_high_cpu.tcl} $output] } {
    exit 0
}

# Write the nested countdown policy into the user policy directory.
set poldir [get_pol_dir $cli(fd)]
set polname "${poldir}/tm_alert_high_cpu.tcl"
set fd [open $polname "w"]
puts $fd "::cisco::eem::event_register_timer countdown time 900"
puts $fd "namespace import ::cisco::eem::*"
puts $fd "namespace import ::cisco::lib::*"
puts $fd "action_syslog msg \"CPU has been over 70% for 15 minutes\""
close $fd

# Register the countdown policy (same file name as written above).
cli_exec $cli(fd) "config t"
cli_exec $cli(fd) "event manager policy tm_alert_high_cpu.tcl username eem"
cli_exec $cli(fd) "commit"
cli_exec $cli(fd) "end"

catch {cli_close $cli(fd) $cli(tty_id)}

###
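To make the intended behaviour concrete, here is a minimal plain-Tcl sketch of the arm / count down / cancel logic that the nested policies are meant to implement together. It is only a model you can run in tclsh, not an EEM policy: it assumes Tcl 8.5 or later (for dict and lassign), which may be newer than the interpreter on the router, and the names cpu_watch_new and cpu_watch_sample are made up for illustration.

```tcl
# Plain-Tcl model of "alert when CPU stays at or over a threshold for N seconds".
# Hypothetical helper names; this does not use any EEM commands.

# Create a watcher state: threshold in percent, hold time in seconds.
proc cpu_watch_new {threshold hold} {
    return [dict create threshold $threshold hold $hold armed_at {} alerted 0]
}

# Feed one CPU sample taken at time $now (seconds).
# Returns a two-element list: the updated state, and 1 if an alert is due, else 0.
proc cpu_watch_sample {state now cpu} {
    if {$cpu < [dict get $state threshold]} {
        # CPU dropped below the threshold: cancel any running countdown.
        dict set state armed_at {}
        dict set state alerted 0
        return [list $state 0]
    }
    if {[dict get $state armed_at] eq ""} {
        # First sample at or over the threshold: start the countdown.
        dict set state armed_at $now
    }
    set elapsed [expr {$now - [dict get $state armed_at]}]
    if {$elapsed >= [dict get $state hold] && ![dict get $state alerted]} {
        # Countdown expired while the CPU stayed high: alert once.
        dict set state alerted 1
        return [list $state 1]
    }
    return [list $state 0]
}

# Usage: samples as {time cpu} pairs, 70% threshold, 900 s hold time.
set st [cpu_watch_new 70 900]
foreach {t cpu} {0 75 300 80 600 72 900 90} {
    lassign [cpu_watch_sample $st $t $cpu] st alert
    if {$alert} {
        puts "t=$t: CPU has been over 70% for 15 minutes"
    }
}
```

On the router these roles are split up: the wdsysmon policy arms the countdown, the timer policy raises the alert, and a low-CPU policy cancels it.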

Additionally, you'll want another permanently configured policy that checks for a low CPU threshold. Something like:

::cisco::eem::event_register_wdsysmon sub1 cpu_tot op le val 10