With Srikanth Babu and Vishnu Asok
Welcome to the Cisco Support Community Ask the Expert conversation. This is an opportunity to learn and ask questions about the common symptoms and causes of memory and high CPU issues, and how to troubleshoot them on Cisco Integrated Services Routers (ISR G2), the 7200 Network Processing Engine (NPE)-G1, and the NPE-G2, with Cisco experts Srikanth Babu and Vishnu Asok. This event is a continuation of the Facebook Forum.
Srikanth Babu is a customer support engineer in the Technical Assistance Center in India, where he has nearly 3 years of experience. His areas of expertise include core architecture of Cisco routers, troubleshooting high CPU, memory issues on routers, packet drops/latency, interface errors and more. He holds a bachelor's degree in electronics and communication as well as CCNA and CCNP certifications.
Vishnu Asok is a customer support engineer in the Technical Assistance Center in India. He has over 5 years of experience in LAN switching technologies, including the Cisco Catalyst 6500, 4500, and 7600 platforms. His areas of expertise include architecture of routers, Cisco IOS, and troubleshooting issues related to router platforms. He holds a bachelor of technology degree in electronics and communication from Cochin University of Science & Technology as well as CCNA, CCNP, and CCIP certifications.
Remember to use the rating system to let Srikanth and Vishnu know if you have received an adequate response.
Srikanth and Vishnu might not be able to answer every question due to the volume expected during this event. Remember that you can continue the conversation on the Network Infrastructure sub-community discussion forum shortly after the event. This event lasts through Nov 2, 2012. Visit this forum often to view responses to your questions and the questions of other community members.
Our monitoring system (SMARTS) frequently reports high CPU on one of our branch routers (Cisco 2821). As soon as I receive the alert, I telnet to the box and find the CPU at around 10%, while the CPU history shows a spike of up to 96% lasting a few seconds.
I am clueless as to what might be happening. Am I dealing with a hardware issue here?
P.S. This issue has been going on for a few months now.
Thanks for posting this query.
As the CPU spike does not last for more than a few seconds, I assume you have not yet had a chance to identify whether the CPU was spiking due to interrupts or due to a specific process.
Let's take the following approach:
a) Configure the EEM script below on the router. The script is triggered automatically whenever the CPU reaches 75% or more.
It executes a set of commands and saves the output to a file system (we will use flash:).
event manager applet capture_cpu_spike
event snmp oid 1.3.6.1.4.1.9.2.1.56 get-type next entry-op ge entry-val 75 exit-time 600 poll-interval 1
action 1.0 syslog msg "CPU Utilization is high"
action 1.1 cli command "enable"
action 1.2 cli command "term exec prompt timestamp"
action 1.3 cli command "show proc cpu sorted | redirect flash:cpu_info.txt"
action 1.4 cli command "show proc cpu history | redirect flash:cpu_history.txt"
action 1.6 cli command "show debug | redirect flash:debug.txt"
action 1.7 cli command "show users | redirect flash:users.txt"
action 1.8 cli command "show interface | redirect flash:interface_info.txt"
action 1.9 cli command "term no exec prompt timestamp"
action 2.1 syslog msg "CPU Utilization is Low"
If you have configured AAA authorization, add the configuration line below, followed by a username that is authorized to run these commands, for EEM to work:
"event manager session cli username
b) The EEM script is triggered once the CPU reaches 75%. Collect the outputs and check whether a specific process is utilizing the CPU.
c) The 'show users' output will tell us whether any users, monitoring servers, or backup servers are accessing the device and executing operations. The 'show debug' output will also help us identify whether any debugs are being turned on when the CPU spikes.
The below link explains some basic troubleshooting on high CPU-
You may also attach the outputs of the above EEM script to your reply so that we can have a look.
Feel free to reach out for any assistance.
Cisco Systems Inc
Thanks for your valuable inputs.
I will apply the configuration after the business hours and will wait for the outputs to be captured.
Can you please explain what the % values in the 'sh proc cpu' output mean, as shown below?
In other words, what do the 7% and the 0% in the output below represent?
CPU utilization for five seconds: 7%/0%; one minute: 9%; five minutes: 10%
When CPU utilization is high, which values should be considered? And if a specific process is using a lot of CPU, how do we troubleshoot that process and bring the CPU usage down?
Thanks & Regards
Thanks for your questions.
Let’s take the below example to understand CPU behavior better:
CPU utilization for five seconds: X%/Y%; one minute: ABC%; five minutes: PQR%
Here X% is the total CPU utilization and Y% is the CPU utilization due to interrupts.
Hence the CPU utilization due to processes is X% - Y% = Z%.
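The X%/Y% split described above can be worked through on the line from the question (7%/0%). The following is only a minimal Python sketch for pulling the numbers out of saved output such as flash:cpu_info.txt; the function name `split_cpu` and the regular expression are illustrative assumptions, not part of any Cisco tool.

```python
import re

# Matches the summary line printed at the top of "show processes cpu".
CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)

def split_cpu(line):
    """Return total, interrupt and process CPU (in %) for the 5-second figure."""
    match = CPU_LINE.search(line)
    if match is None:
        raise ValueError("not a CPU utilization summary line")
    total = int(match.group(1))      # X: total CPU
    interrupt = int(match.group(2))  # Y: CPU spent at interrupt level
    # Z = X - Y: CPU spent in scheduled processes
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}

sample = "CPU utilization for five seconds: 7%/0%; one minute: 9%; five minutes: 10%"
print(split_cpu(sample))
```

For the 7%/0% line, all of the 7% is process-level CPU. If Y is close to X, the load is mostly interrupt (traffic) driven instead.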
Every time a packet hits an interface, the interface generates an interrupt to the CPU to take care of the packet.
The higher the volume of packets/traffic, the higher the number of interrupts, and hence the higher the CPU utilization.
Every router/platform has a limit on the bandwidth/traffic it can handle. This value goes down significantly with processor-intensive configuration, e.g. NAT, QoS, WCCP, NetFlow, IP Accounting, crypto, etc.
So if we see Y% to be high, most of the time it is due to legitimate traffic hitting the device and is expected.
Coming to your question on how to troubleshoot process related CPU utilization.
We already know that : CPU due to process would be X%-Y%= Z%
When Z% is high, it is due to processes running on the device. Examples of processes are EIGRP, H323, BGP, crypto, etc.
Every process has a PID associated with it:
router#show processes cpu
CPU utilization for five seconds: X%/Y%; one minute: ABC%; five minutes: PQR%
PID Runtime(ms)   Invoked   uSecs    5Sec    1Min    5Min  TTY Process
123         384     32789      11  70.00%  70.00%  65.00%    0 EIGRP
Let’s assume EIGRP is the process causing CPU to spike and EIGRP PID is 123.
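To pick out the busiest process from a saved 'show processes cpu' capture, the rows can be split into fields and ranked by the 5Sec column. This is a rough Python sketch under the assumption that each row has the nine columns shown above; the first row is the EIGRP example from this thread, the second row is made up for illustration, and the helper names are hypothetical.

```python
def parse_process_row(row):
    """Split one row of the 'show processes cpu' table into named fields."""
    # The process name comes last and may contain spaces (e.g. "IP Input"),
    # so split at most 8 times and keep the remainder as the name.
    pid, runtime, invoked, usecs, s5, m1, m5, tty, name = row.split(None, 8)
    return {
        "pid": int(pid),
        "five_sec": float(s5.rstrip("%")),
        "one_min": float(m1.rstrip("%")),
        "five_min": float(m5.rstrip("%")),
        "name": name.strip(),
    }

def busiest(rows):
    """Return the parsed row with the highest 5-second CPU figure."""
    return max((parse_process_row(r) for r in rows), key=lambda p: p["five_sec"])

rows = [
    "123 384 32789 11 70.00% 70.00% 65.00% 0 EIGRP",   # row from the example above
    " 91 1200 45000 26 2.15% 1.90% 1.80% 0 IP Input",  # hypothetical second row
]
top = busiest(rows)
print(top["pid"], top["name"], top["five_sec"])
```

The same ranking is available on the router itself through 'show proc cpu sorted', which the EEM script above already captures; the sketch is only useful when post-processing many saved captures.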
There could be a few reasons why EIGRP is causing high CPU here.
Hence, for process-related high CPU, we should troubleshoot the specific process to bring the CPU down.
Hope this answers your questions. Feel free to respond if you have any further questions.
Good to see this topic in the expert corner.
Can you suggest any alternative tool to CPU profiling, or an easier way to decode the profiled output without TAC assistance? Can we use CPU profiling during high loads?
Can you suggest any methods/tools (preferably in newer IOS) to troubleshoot high CPU due to interrupts? Are these tools available for all ISRs, ISR G2, and 7200 VXR/NPE-G2 routers?
How can the Embedded Packet Capture utility help in high CPU cases?
Is there any tool similar to SPAN/NETDR (on switches) available for routers?
I have been using the following links, which have been very helpful for quite some time; I am just checking for any updates from a newer IOS perspective.
Thanks for bringing this up.
The Cisco IOS profile command samples the instruction pointer at a fixed rate of 250 times per second (once every 4 ms) and is synchronous to any timer-based events. The profiler needs to record which functions were invoked and how many CPU cycles each function took to execute.
CPU profiling is a low-overhead way of determining where the CPU spends most of its time and can be safely executed even when CPU utilization is high. However, make sure you know the following before you perform CPU profiling:
a) Memory resources will be stressed while CPU profiling is running. Disable CPU profiling immediately if you notice memory allocation failures. TAC supervision is strongly recommended.
b) Once you are done with CPU profiling, do not forget to unprofile the task. The processor pool will not reclaim the memory used by profiling unless you unprofile using the command 'unprofile all'.
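To give a feel for what sampling at 250 times per second means, here is a generic Python sketch of a sampling histogram. It does not reproduce the IOS implementation or the format of the profile output; the addresses, the bucket size, and the function names are all made-up assumptions for illustration.

```python
from collections import Counter

SAMPLES_PER_SECOND = 250  # sampling rate quoted in this thread

def sample_interval_ms():
    """Time between two samples, in milliseconds."""
    return 1000 / SAMPLES_PER_SECOND

def histogram(samples, start, granularity):
    """Count sampled instruction-pointer values per address bucket."""
    counts = Counter()
    for ip in samples:
        # Round each sampled address down to the start of its bucket.
        bucket = start + ((ip - start) // granularity) * granularity
        counts[bucket] += 1
    return counts

# Hypothetical sampled addresses; real samples come from the router.
samples = [0x1000, 0x1004, 0x1010, 0x1014, 0x1018, 0x2000]
hist = histogram(samples, start=0x1000, granularity=0x10)
hottest, hits = hist.most_common(1)[0]
print(hex(hottest), hits, hits * sample_interval_ms(), "ms (estimate)")
```

Each hit stands for roughly one sampling interval of CPU time, so the bucket with the most hits is where the CPU spent most of its time. Mapping a bucket back to a function name is the step that needs the symbol files.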
As you have rightly pointed out, the CPU profiling procedure is explained in the link below-
There is no alternative tool to CPU profiling. You also need TAC assistance to decode the profile output, as this requires the relevant symbol files that are generated at IOS compile time. Only the TAC has access to these files, which are stored in a highly secure environment.
I am not aware of any new procedure apart from what is already explained in the below link to troubleshoot high CPU due to interrupts-
The last enhancement I remember was the addition of the commands 'show ip cef switching statistics' and 'show ip cef switching statistics feature', which replaced the command 'show cef not-cef-switched'. These new commands show the packets punted to the CPU by a particular feature configured on the device and are really helpful for troubleshooting high CPU due to the IP Input process (packets not processed by CEF).
The troubleshooting procedure remains the same for all the ISR platforms and the NPE-G1/G2, which are essentially single-CPU platforms. The procedure varies slightly on distributed-architecture platforms such as the ASRs and the ISR G3s, which are expected to reach FCS next month.
Regarding the SPAN/NETDR functionality, unfortunately we do not have any similar tools available on the routers at this moment. However, we can perform a SPAN on the router if any of the EtherSwitch modules are installed.
Coming to the EPC part, it can be really helpful for troubleshooting high CPU due to process switching (IP Input). One seldom-used feature of EPC is the option to capture process-switched traffic:
R1#monitor capture point ip ?
cef IPv4 CEF
process-switched Process switched packets
R1#monitor capture point ip process-switched ATE ?
both Inbound and outbound and packets
from-us Packets originating locally
in Inbound packets
out Outbound packets
We can analyze the capture in Wireshark to see whether there are process-switched flows from any specific source, which is very useful for isolating the issue.
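Once the source and destination addresses have been pulled out of the capture (for example by exporting the packet list from Wireshark), ranking the sources is a simple count. The Python sketch below assumes the packets are already available as (source, destination) pairs; the addresses are documentation-range placeholders, not data from a real capture, and `top_sources` is a hypothetical helper name.

```python
from collections import Counter

def top_sources(packets, n=3):
    """Rank source addresses by number of captured packets."""
    return Counter(src for src, _dst in packets).most_common(n)

# Placeholder (source, destination) pairs standing in for captured packets.
packets = [
    ("192.0.2.10", "198.51.100.1"),
    ("192.0.2.10", "198.51.100.2"),
    ("192.0.2.20", "198.51.100.1"),
    ("192.0.2.10", "198.51.100.1"),
]
for src, count in top_sources(packets):
    print(src, count)
```

A single source that dominates the process-switched capture is a good starting point for checking why its traffic is being punted to the CPU.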
Feel free to contact us for any queries.
Cisco Systems Inc