11-12-2010 12:23 PM - edited 03-01-2019 04:34 PM
In this document I describe four ways to detect high CPU spikes on Cisco routers and switches and to act on them using EEM (Embedded Event Manager). Later in the document I list some commands that can be helpful in troubleshooting high CPU. This document is a draft version; I will try to add more and edit it on a weekly basis to improve quality.
High CPU detection using CLI:
This is the traditional method of finding CPU utilization. It is not useful when CPU spikes are seen on the management station in the middle of the night.
Router#sh processes cpu sorted ?
1min Sort based on 1 minute utilization
5min Sort based on 5 minutes utilization
5sec Sort based on 5 seconds utilization
| Output modifiers
<cr>
Router#sh processes cpu sorted 5min
CPU utilization for five seconds: 2%/0%; one minute: 2%; five minutes: 1%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
179 168 28275159 0 0.79% 0.76% 0.77% 0 HQF Shaper Backg
307 160 3529216 0 0.23% 0.18% 0.16% 0 PPP manager
5 141912 13430 10566 0.00% 0.10% 0.11% 0 Check heaps
308 248 3529216 0 0.15% 0.09% 0.08% 0 PPP Events
122 26424 5141 5139 0.39% 0.24% 0.06% 0 Exec
2 36 22595 1 0.07% 0.04% 0.05% 0 Load Meter
180 44 1129318 0 0.00% 0.03% 0.02% 0 RBSCP Background
312 16 1132266 0 0.00% 0.03% 0.02% 0 FR Broadcast Out
42 64 113310 0 0.07% 0.02% 0.00% 0 Per-Second Jobs
65 28 451871 0 0.00% 0.01% 0.00% 0 Netclock Backgro
273 1508 37676 40 0.00% 0.01% 0.00% 0 OSPF-1 Hello
Total CPU Utilization is comprised of process and interrupt percentages. These values are found on the first line of
the output:
CPU utilization for five seconds: x%/y%; one minute: a%; five minutes: b%
Total CPU Utilization: x%
Process Utilization: (x - y)%
Interrupt Utilization: y%
Process Utilization is the difference between the Total and Interrupt (x and y).
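As a minimal sketch of the arithmetic above, the following Python snippet parses the first line of the output and derives the process utilization as total minus interrupt. The function name and the dictionary keys are my own illustrative choices, not anything provided by IOS.

```python
import re

# Matches the first line of "show processes cpu" output in the format shown above.
HEADER_RE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)

def split_cpu_utilization(header_line):
    """Return total, interrupt and process utilization from the header line."""
    match = HEADER_RE.search(header_line)
    if match is None:
        raise ValueError("not a 'show processes cpu' header line")
    total, interrupt, one_min, five_min = (int(g) for g in match.groups())
    return {
        "total": total,                  # x
        "interrupt": interrupt,          # y
        "process": total - interrupt,    # x - y
        "one_minute": one_min,           # a
        "five_minutes": five_min,        # b
    }

sample = "CPU utilization for five seconds: 2%/0%; one minute: 2%; five minutes: 1%"
print(split_cpu_utilization(sample))
```

For the sample line taken from the output above, total is 2%, interrupt is 0%, and process utilization is therefore 2%.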
The one and five minute utilizations are exponentially decayed averages (rather than an arithmetic average),
therefore recent values have more influence on the calculated average.
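To illustrate why an exponentially decayed average reacts faster to recent load than an arithmetic average, here is a small Python sketch. The decay factor of 0.9 and the sample values are assumptions chosen for illustration only; they are not the constants IOS uses.

```python
def decayed_average(samples, decay=0.9):
    """Return the running exponentially decayed average after each sample.

    decay=0.9 is an assumed, illustrative value, not the IOS constant.
    """
    average = 0.0
    history = []
    for sample in samples:
        # Keep `decay` of the old average and blend in the newest sample.
        average = decay * average + (1.0 - decay) * sample
        history.append(average)
    return history

# Hypothetical CPU percentages: 20 idle samples followed by 5 busy samples.
samples = [0] * 20 + [100] * 5
decayed = decayed_average(samples)[-1]
arithmetic = sum(samples) / len(samples)
print(f"decayed: {decayed:.1f}%  arithmetic: {arithmetic:.1f}%")
```

With these assumed numbers the decayed average ends at about 41% while the arithmetic average over the same window is 20%, because the five most recent busy samples carry more weight.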
High CPU detection using Embedded Resource Manager (ERM):
/**************************************************************************************/
resource policy
policy HIGHCPU global
system
cpu interrupt
critical rising 90 interval 10
major rising 70 interval 10
minor rising 40 interval 10
!
cpu process
critical rising 80 interval 10
major rising 60 interval 10
minor rising 40 interval 10
!
event manager applet HIGHCPU-ERM
event resource policy "HIGHCPU"
action 1.0 cli command "enable"
action 2.0 cli command "show proc cpu sorted 5min"
action 3.0 mail server "198.2.5.10" to "tac@cisco.com" from "NOC@mycompany.com" subject "CPU Alert 5min" body "$_cli_result"
/************************************************************************************/
You can set rising and falling values for critical, major, and minor levels of thresholds. When the resource utilization exceeds the rising threshold level, an Up notification is sent. When the resource utilization falls below the falling threshold level, a Down notification is sent. This is more granular because CPU by interrupt and CPU by process can be monitored separately.
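The rising/falling behaviour described above can be sketched in Python as follows. This is only a model of the general idea (one Up when the rising threshold is exceeded, one Down when the value drops below the falling threshold), not the ERM implementation. The function name, the sample values, and the falling value of 40 are hypothetical; the policy shown earlier configures rising thresholds only.

```python
def threshold_notifications(samples, rising, falling):
    """Return (index, "Up"/"Down") for each rising or falling threshold crossing."""
    events = []
    above = False  # True once an Up has been sent and no Down has followed yet
    for index, value in enumerate(samples):
        if not above and value > rising:
            above = True
            events.append((index, "Up"))
        elif above and value < falling:
            above = False
            events.append((index, "Down"))
    return events

# Hypothetical CPU samples in percent.
cpu = [30, 85, 75, 82, 60, 35, 90]
print(threshold_notifications(cpu, rising=80, falling=40))
```

Note that the samples at 75, 82 and 60 produce no notification: once Up has been sent, nothing more is reported until the value falls below the falling threshold.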
The EEM applet sends an email to tac@cisco.com containing the output of the show proc cpu command.
The action can only be triggered with Embedded Event Manager 2.2.
Available in the following IOS releases or higher:
12.4(2)T
12.2(31)SB3
12.2(33)SRB
High CPU detection using RMON:
/***********************************************************************************/
rmon event 1 log description "CPU has crossed rising threshold"
rmon alarm 12 cpmCPUTotalTable.1.8.1 60 absolute rising-threshold 80 1 falling-threshold 40
!!! polling interval 60 seconds and 80 percent CPU utilization
!!! %RMON-5-RISINGTRAP: Rising trap is generated because the value of cpmCPUTotalTable.1.8.1 has exceeded the rising-threshold value 80
event manager applet HIGHCPU-RMON
event syslog pattern ".*%RMON-5-RISINGTRAP.*"
action 1.0 echo " Do what ever you want about it"
/**********************************************************************************/
This is useful only when the switch has a single RMON event configured, because the applet uses the syslog event detector to match the RMON syslog pattern and so cannot tell which alarm generated the message. For platform-specific RMON support, check the following URL.
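The limitation can be seen by testing the applet's pattern against a few syslog lines. The Python sketch below uses Python's `re` module as a stand-in for the EEM regular-expression matcher (an assumption; the two engines are not identical, but they agree on simple patterns like these). The second log line, for a different alarm, is hypothetical, and so is the narrower pattern that includes the monitored object name.

```python
import re

# Pattern used by the applet above.
applet_pattern = re.compile(r".*%RMON-5-RISINGTRAP.*")
# Hypothetical narrower pattern that also requires the CPU object name.
cpu_only_pattern = re.compile(r"%RMON-5-RISINGTRAP.*cpmCPUTotalTable")

logs = [
    "%RMON-5-RISINGTRAP: Rising trap is generated because the value of "
    "cpmCPUTotalTable.1.8.1 has exceeded the rising-threshold value 80",
    # Hypothetical message from a second, unrelated RMON alarm.
    "%RMON-5-RISINGTRAP: Rising trap is generated because the value of "
    "ifInOctets.2 has exceeded the rising-threshold value 1000",
    "%LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to up",
]

broad = [bool(applet_pattern.search(line)) for line in logs]
narrow = [bool(cpu_only_pattern.search(line)) for line in logs]
print(broad)   # [True, True, False]
print(narrow)  # [True, False, False]
```

The broad pattern fires for any RMON rising trap, which is why it is only safe with one RMON event configured. A pattern that also matches the object name might work when several alarms exist, but I have not verified that on a device.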
http://www.cisco.com/en/US/docs/ios/netmgmt/configuration/guide/netmgmt_rmon_supp_roadmap.html
High CPU detection using SNMP:
!
snmp-server enable traps cpu threshold
snmp-server host 192.168.2.1 traps version 2c public cpu
process cpu threshold type total rising 80 interval 60 falling 40 interval 60
process cpu statistics limit entry-percentage 70 size 300
!
The above configuration detects high CPU usage in a similar way to the RMON method: it sends a trap to the management station. When configuring the threshold type you can also make it granular at the process and interrupt level. For more information about the command, refer to the 12.4T configuration guide.
You can also use the SNMP event type to configure an EEM applet. The following applet stores the output of relevant show commands in flash:high_cpu.txt. The applet was written on ISR routers; on 65XX, 45XX and 76XX platforms use bootflash:high_cpu.txt when redirecting the output to a file. The applet removes itself from the configuration after completion.
It requires SNMP to be enabled and EEM v2.1. The event statement has to be used with care, because sudden spikes in CPU usage can sometimes prevent the actions from running. Choose the poll interval carefully: the more commands you add to the actions, the longer they take to run, and if that duration exceeds the poll interval the event is detected again and the high_cpu.txt file is overwritten.
event manager scheduler script thread class default number 1
event manager applet High_CPU_SNMP
! Cisco process MIB Object name: cpmCPUTotal1min
! event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.4 get-type next entry-op gt entry-val 80 poll-interval 15
! Cisco process MIB Object name: cpmCPUTotal5minRev
event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.8 get-type next entry-op gt entry-val 80 poll-interval 15
action 0.0 syslog msg "High CPU DETECTED. Please wait - logging Information to flash:high_cpu.txt"
action 0.1 cli command "enable"
action 0.2 cli command "term exec prompt timestamp"
action 0.3 cli command "term len 0"
! redirects the command to flash:/bootflash:/disk0: etc
action 1.2 cli command "show process cpu sorted 5min | redirect flash:high_cpu.txt"
! action 1.2 cli command "show process cpu sorted 1min | redirect flash:high_cpu.txt"
! the following commands use append so that they do not overwrite the output already in the file
! (this assumes the target file system supports append)
action 1.3 cli command "show buffer input-interface GigabitEthernet0/1 dump | append flash:high_cpu.txt"
action 1.4 cli command "show cef not | append flash:high_cpu.txt"
action 1.5 cli command "show buffer | append flash:high_cpu.txt"
action 1.6 cli command "show ip traffic | append flash:high_cpu.txt"
! Here you can add any command you want to capture
! In the following section the EEM applet must be removed from the configuration to avoid repeated execution of
! the actions in case you have many spikes in a short period
action 5.1 syslog msg "Finished logging information to flash:high_cpu.txt..."
action 5.2 syslog msg "Self-removing applet from configuration..."
action 9.1 cli command "configure terminal"
action 9.2 cli command "no event manager applet High_CPU_SNMP"
action 9.3 cli command "end"
! End of script
Packet capture using built-in tools:
On 6500 platforms with Sup 720 PFC and MSFC running IOS 12.2(33)SXH or SXI, you can use Net driver (netdr) captures, which capture the packets that hit the CPU for processing instead of being hardware switched.
Switch#debug netdr capture rx
Switch#show netdr captured-packets
On 4500 platforms you can capture CPU-bound packets using the following commands:
Switch#debug platform packet all receive buffer
platform packet debugging is on
Switch#show platform cpu packet buffered
CPU profiling for ISR and 7200 routers:
CPU profiling is a low-overhead way of determining where the CPU spends its time. The system works by sampling the processor location every four milliseconds and incrementing the count for that location in memory. CPU profiling can therefore help determine the root cause of the CPU utilization.
Router#profile task ?
<0-4294967295> list of specific process ids (pids) <---look up the process ID in the first column of the "show proc cpu sorted" output
all profile all processes
interrupt includes interrupt related data in profile
<cr>
Router# show profile terse
CPU profiling can also be useful for finding a bug related to a process. For more information about CPU profiling for interrupts, check reference # 2.
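The sampling idea behind CPU profiling (record where the processor is at each tick and increment a counter for that location) can be sketched in Python as below. This is a conceptual model only, not how IOS implements it. The function name, the bucket size of 0x100 bytes, and the sampled addresses are all hypothetical.

```python
from collections import Counter

def profile(sampled_addresses, bucket_size=0x100):
    """Count samples per address bucket, as a sampling profiler would."""
    counts = Counter()
    for address in sampled_addresses:
        # Round the address down to the start of its bucket.
        bucket_start = address - (address % bucket_size)
        counts[bucket_start] += 1
    return counts

# Hypothetical program-counter values, one per 4 ms sampling tick.
samples = [0x60100010, 0x60100044, 0x601000F0, 0x60200004, 0x60100080]
hottest, hits = profile(samples).most_common(1)[0]
print(f"hottest bucket starts at {hottest:#x} with {hits} of {len(samples)} samples")
```

The bucket with the most hits is where the CPU spent most of its sampled time, which is the region to investigate first.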
Additional references :