
EEM script fails to run when CPU is very high

ffacilities
Level 1

Hi all,

We are running T.37 fax on 2811 and 2911 routers, with calls coming in over two T1 links. Occasionally a unit hits a bug where a call gets stuck and the CPU gradually rises to nearly 100% over about 45 minutes. It then stays close to 100%, almost all of it in the DocMSP process, and the unit rejects all incoming calls until something tears down the stuck call, at which point everything returns to normal.

Cisco TAC have been unable to identify or fix the bug, so we have implemented an EEM script to detect the high CPU and bounce the two T1 links.  Here is the script, triggered on the call rejection logs:

event manager applet high_cpu_recovery
 event syslog pattern "IVR-3-LOW_CPU_RESOURCE"
 action 1.0 syslog msg "----HIGH CPU DETECTED, BOUNCING T1s----"
 action 2.0 cli command "enable"
 action 3.0 cli command "show clock | append flash:high_cpu_recovery.txt"
 action 4.0 cli command "show call active fax brief | append flash:high_cpu_recovery.txt"
 action 5.0 cli command "config t"
 action 5.1 cli command "controller t1 1/0"
 action 5.2 cli command "shutdown"
 action 5.3 cli command "controller t1 1/1"
 action 5.4 cli command "shutdown"
 action 5.5 cli command "controller t1 1/0"
 action 5.6 cli command "no shutdown"
 action 5.7 cli command "controller t1 1/1"
 action 5.8 cli command "no shutdown"
 action 5.9 cli command "end"

The script works fine functionally (tested by having it trigger off a user-defined log event instead of the high CPU event). When the CPU is very high, however, the script definitely gets triggered but often does not appear to run: 30 minutes or an hour later, it still hasn't bounced the T1 links.

We have the following config line attempting to give more priority to the EEM script, but it doesn't seem to be helping much:

scheduler allocate 40000 5000

I have also seen mention of a 'scheduler interval' command to allow time for low-priority processes, but that doesn't seem to be available on this platform.

Any suggestions for other ways to give more priority to the EEM script, or better values for the 'scheduler allocate' command? 

Thanks,

Ollie

4 Replies

Joe Clarke
Cisco Employee

It could be that the script is hitting its maxrun timer when the router is very heavily loaded.  Try adding "maxrun 60" to the end of the event specification line.
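
As a minimal sketch of that change, assuming the rest of the applet stays exactly as posted, the event line would look something like this (60 seconds is only the suggested starting value and may need tuning on a heavily loaded router):

```
event manager applet high_cpu_recovery
 ! maxrun raises the policy's run-time limit from the 20-second default
 event syslog pattern "IVR-3-LOW_CPU_RESOURCE" maxrun 60
 action 1.0 syslog msg "----HIGH CPU DETECTED, BOUNCING T1s----"
 ! ... remaining actions unchanged ...
```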

Not applicable

How about triggering your applet another way?

event manager applet high_cpu_recovery
  event ioswdsysmon sub1 cpu-proc taskname "DocMSP" op gt val 50 period 60
  action 1.0 syslog msg "----HIGH CPU DETECTED, BOUNCING T1s----"
  ... and so on ...

The difference from your script is that this triggers on IOS system monitor counters rather than on a syslog message. The theory is that the system monitor counters let you watch the CPU utilization of the DocMSP process and run your script before the CPU reaches 100%, so there is still some CPU left to run it. I don't know whether 50% ("val 50" above) is the right threshold; given your long experience with this issue, you know which values aren't sane for DocMSP CPU utilization.

My syntax above may not be 100% correct. If not, it's documented here:

http://www.cisco.com/c/en/us/td/docs/ios-xml/ios/eem/command/eem-cr-book/eem-cr-e1.html

I'm the SE for TDS by the way. Gio just brought this issue to my attention yesterday. Thank you for your hard work on this to date.

Joe Clarke
Cisco Employee

This will not help if, as I propose, the maxrun time is being hit. When the CPU is high, and especially if AAA command authorization is being used, each command can take a long time to execute, pushing the policy toward its default 20-second maxrun time. I would look at maxrun first, especially if "show logg" shows that the syslog message is being generated.
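
One way to check this, sketched with standard EEM show commands (exact output varies by IOS release, so treat it as a starting point rather than a definitive procedure):

```
! Confirm the trigger message was actually logged
show logging | include IVR-3-LOW_CPU_RESOURCE
! Show the registered applet, including its current maxrun value
show event manager policy registered
! List recently triggered policies and how each run ended
show event manager history events
```

If the syslog message is present and the history shows the applet starting but not completing normally, that would point toward the maxrun limit rather than the trigger.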

ffacilities
Level 1

Thanks dkok and jclarke for your replies. It'll take a while before we can tell whether either or both changes do the trick; fingers crossed.
