
Ask the Expert: Troubleshooting Memory and High CPU Related Issues on Cisco Integrated Services Routers (ISR-G2), 7200 (NPE-G1, NPE-G2)

ciscomoderator
Community Manager

With Srikanth Babu and Vishnu Asok


Welcome to the Cisco Support Community Ask the Expert conversation. This is an opportunity to learn and ask questions about the common symptoms and causes of memory and high CPU issues, and how to troubleshoot them, on Cisco Integrated Services Routers (ISR-G2) and the 7200 Network Processing Engine (NPE)-G1 and NPE-G2, with Cisco experts Srikanth Babu and Vishnu Asok. This event is a continuation of the Facebook Forum.

Srikanth Babu is a customer support engineer in the Technical Assistance Center in India, where he has nearly 3 years of experience. His areas of expertise include core architecture of Cisco routers, troubleshooting high CPU, memory issues on routers, packet drops/latency, interface errors and more. He holds a bachelor's degree in electronics and communication as well as CCNA and CCNP certifications.

Vishnu Asok is a customer support engineer in the Technical Assistance Center in India. He has over 5 years of experience in LAN switching technologies, including the Cisco Catalyst 6500, 4500, and 7600 platforms. His areas of expertise include architecture of routers, Cisco IOS, and troubleshooting issues related to router platforms. He holds a bachelor of technology degree in electronics and communication from Cochin University of Science & Technology as well as CCNA, CCNP, and CCIP certifications.

Remember to use the rating system to let Srikanth and Vishnu know if you have received an adequate response. 

Srikanth and Vishnu might not be able to answer every question because of the volume expected during this event. Remember that you can continue the conversation on the Network Infrastructure sub-community discussion forum shortly after the event. This event lasts through Nov 2, 2012. Visit this forum often to view responses to your questions and the questions of other community members.


xavier.ronald
Level 1

Dear Cisco

Our monitoring system (Smarts) frequently reports high CPU for one of our branch routers (Cisco 2821). As soon as I receive this alert, I telnet to the box and find the CPU at around 10%, while the CPU history shows a spike of up to 96% for a few seconds.

I am clueless as to what might be happening. Am I dealing with a hardware issue here?

P.S. This issue has been going on for a few months now.

Thanks

Xavier

Hi Xavier

Thanks for posting this query.

As the CPU spike does not last for more than a few seconds, I assume you have not yet had a chance to identify whether the CPU was spiking due to interrupts or a specific process.

Let's take the following approach:

a) Configure the EEM script below on the router. It is triggered automatically whenever the CPU utilization reaches 75% or more.

It executes a set of commands and saves the output to a file system (we will use flash:).

event manager applet capture_cpu_spike
 event snmp oid 1.3.6.1.4.1.9.2.1.56 get-type next entry-op ge entry-val 75 exit-time 600 poll-interval 1
 action 1.0 syslog msg "CPU Utilization is high"
 action 1.1 cli command "enable"
 action 1.2 cli command "term exec prompt timestamp"
 action 1.3 cli command "show proc cpu sorted | redirect flash:cpu_info.txt"
 action 1.4 cli command "show proc cpu history | redirect flash:cpu_history.txt"
 action 1.6 cli command "show debug | redirect flash:debug.txt"
 action 1.7 cli command "show users | redirect flash:users.txt"
 action 1.8 cli command "show interface | redirect flash:interface_info.txt"
 action 1.9 cli command "term no exec prompt timestamp"
 action 2.1 syslog msg "CPU Utilization is Low"

If you have configured AAA authorization, add the configuration line below so that EEM can run its CLI commands:

"event manager session cli username "

b) The EEM script is triggered once the CPU reaches 75% or more. Collect the outputs and check whether a specific process is consuming the CPU cycles.

c) The 'show users' output will tell us if any users, monitoring servers, or backup servers are accessing the device and executing any operations. The 'show debug' output will also help us identify whether any debugs are being turned on when the CPU spikes.

The link below explains some basic troubleshooting for high CPU:

http://www.cisco.com/en/US/products/hw/routers/ps133/products_tech_note09186a00800a70f2.shtml#high_cpu

You may also attach the outputs of the above EEM script to your reply so that we can have a look.

Feel free to reach out for any assistance.

Regards,

Vishnu Asok

Cisco Systems Inc

Hi Vishnu

Thanks for your valuable inputs.

I will apply the configuration after the business hours and will wait for the outputs to be captured.

Thanks,

Xavier

Dear Team,

Can you please explain what the % values in the 'sh proc cpu' command output mean, as shown below?

That is, what do the 7% and the 0% in the output below specify?

CPU utilization for five seconds: 7%/0%; one minute: 9%; five minutes: 10%

When CPU utilization is high, which values should be considered? And if a specific process is using high CPU, how do we troubleshoot that particular process and bring down the CPU usage?

Thanks & Regards

Raghavendra

Hi Raghavendra,

Thanks for your questions.

Let’s take the below example to understand CPU behavior better:

CPU utilization for five seconds: X%/Y%; one minute: ABC%; five minutes: PQR%

Here X% is the total CPU utilization and Y% is the CPU utilization due to interrupts.

Hence the CPU utilization due to processes is X% - Y% = Z%.
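As a quick illustration of the X% - Y% = Z% arithmetic, here is a minimal Python sketch that splits the summary line into total, interrupt, and process CPU. The function name split_cpu is made up for this example; it is not part of any Cisco tool, and the regular expression only assumes the line format shown above.

```python
import re

# Matches the summary line printed at the top of "show processes cpu".
CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def split_cpu(line):
    """Return total, interrupt and process CPU (in %) for the 5-second window."""
    match = CPU_LINE.search(line)
    if match is None:
        raise ValueError("not a CPU utilization summary line")
    total = int(match.group(1))      # X: total CPU
    interrupt = int(match.group(2))  # Y: CPU spent in interrupt context
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}


# The line from the question above: 7% total, 0% interrupt, so 7% process.
print(split_cpu("CPU utilization for five seconds: 7%/0%; one minute: 9%; five minutes: 10%"))
```

If the interrupt share is close to the total, look at traffic and features first; if the process share dominates, look at the individual processes.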

Every time a packet hits an interface, the interface generates an interrupt to the CPU to take care of the packet.

The higher the number of packets (traffic), the higher the number of interrupts, and hence the higher the CPU utilization.

Every router/platform has a limit on the bandwidth/traffic it can handle. This limit goes down significantly with processor-intensive configuration, e.g. NAT, QoS, WCCP, NetFlow, IP Accounting, crypto, etc.

So if we see Y% to be high, the majority of the time it is due to legitimate traffic hitting the device and is expected.

Coming to your question on how to troubleshoot process-related CPU utilization:
We already know that the CPU utilization due to processes is X% - Y% = Z%.

When Z% is high, it is due to processes running on the device. Examples of processes are EIGRP, H323, BGP, crypto, etc.

Every process has a PID associated with it.

e.g.:

router#show processes cpu
CPU utilization for five seconds: X%/Y%; one minute: ABC%; five minutes: PQR%
 PID Runtime(ms)   Invoked  uSecs   5Sec   1Min   5Min TTY Process
 123         384     32789     11 70.00% 70.00% 65.00%   0 EIGRP

Let’s assume EIGRP is the process causing the CPU to spike and the EIGRP PID is 123.

There could be a few reasons why EIGRP is causing high CPU here.

  • It could be an issue with the code running on the device. We will have to collect "show stack 123" and send it to TAC to confirm whether we have a bug.
  • It can be due to an EIGRP misconfiguration.
  • It can also be due to EIGRP re-converging and causing the CPU to spike. In this case we will have to find the root cause (RCA) of what made EIGRP re-converge.

Hence, for process-related high CPU, we should troubleshoot the offending process to bring down the CPU.
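To find which process to look at in a saved 'show processes cpu' output, something like the following Python sketch can rank the rows by their 5Sec column. The function name busiest, the 10% threshold, and the sample rows are all illustrative assumptions; the row layout is assumed to match the example above.

```python
import re

# One row of "show processes cpu": PID, Runtime, Invoked, uSecs,
# 5Sec, 1Min, 5Min, TTY, Process. Process names may contain spaces.
ROW = re.compile(
    r"^\s*(\d+)\s+\d+\s+\d+\s+\d+\s+"        # PID (captured), Runtime, Invoked, uSecs
    r"([\d.]+)%\s+[\d.]+%\s+[\d.]+%\s+"      # 5Sec (captured), 1Min, 5Min
    r"\d+\s+(.+?)\s*$"                       # TTY, process name (captured)
)


def busiest(output, threshold=10.0):
    """Return (pid, name, five_sec_pct) for rows at or above threshold, busiest first."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header lines and anything else that is not a process row
        pid = int(match.group(1))
        five_sec = float(match.group(2))
        name = match.group(3)
        if five_sec >= threshold:
            rows.append((five_sec, pid, name))
    return [(pid, name, pct) for pct, pid, name in sorted(rows, reverse=True)]


# Illustrative sample only; the numbers are not from a real device.
SAMPLE = """\
 PID Runtime(ms)   Invoked  uSecs   5Sec   1Min   5Min TTY Process
 123         384     32789     11 70.00% 70.00% 65.00%   0 EIGRP
  91         120      4500     26  2.00%  1.50%  1.00%   0 IP Input
"""

print(busiest(SAMPLE))
```

The PID returned for the top row is the one to use with "show stack" when collecting data for TAC.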

Hope this answers your questions. Feel free to respond if you have any further questions.

Regards,

Srikanth

Cisco Systems

Hi, dear expert. I think you can help me. I have two Catalyst 6500 series core switches configured as a VSS, and they are working normally. However, I have one problem: some of my server links go down from time to time. The server links are configured as EtherChannels, and when this happens only one link in the EtherChannel goes down, not both.

These are the modules I use to connect the servers to the core switch, in slots 3 and 7:

  3   16  CEF720 16 port 10GE                    WS-X6716-10GE      SAL1414EL2Q

  7   16  CEF720 16 port 10GE                    WS-X6716-10GE      SAL1414ER93

The transceiver:

NAME: "Chassis 1 Transceiver Te1/3/1", DESCR: "Chassis 1 X2 Transceiver 10Gbase-SR Te1/3/1"

PID: X2-10GB-SR     VID: V04 , SN: AGA1406XLZR

I attached the log files.

The core switch IOS is 12.2(33)SXI6. Is this a software bug?

Why does this issue occur? The configuration is normal, and the problem happens only sometimes, not always.

Is this a transceiver or module problem?

Akhtar Samo
Level 1

Hi Vishnu,

Good to see this topic in the expert corner.

Can you suggest any alternative tool to CPU profiling, or an easier way to decode the profiled output without TAC assistance? Can we use CPU profiling during high load?

Can you suggest any methods or tools (preferably in newer IOS) to troubleshoot high CPU due to interrupts? Are these tools available for all ISR, ISR G2, and 7200 VXR/NPE-G2 routers?

How can the Embedded Packet Capture utility help in high CPU cases?

Is there any tool similar to SPAN/NETDR (on switches) available for routers?

I have been using the following links, which have been very helpful for quite some time; I am just checking for any updates from a newer IOS perspective.

http://www.cisco.com/en/US/products/hw/routers/ps359/products_tech_note09186a00801c2af0.shtml

http://www.cisco.com/en/US/products/hw/routers/ps133/products_tech_note09186a00800a70f2.shtml

Regards,

Akhtar

Hi Akhtar

Thanks for bringing this up.

The Cisco IOS profile command samples the instruction pointer at a fixed rate of 250 times per second and is synchronous to any timer-based events. The profiler needs to record which functions were invoked and how many CPU cycles each function took to execute.

CPU profiling is a low-overhead way of determining where the CPU spends most of its time and can be safely executed even when the CPU utilization is high. However, make sure that you know the following before you perform CPU profiling:

a) Memory resources will be stressed while CPU profiling is running. Disable CPU profiling immediately if you notice memory allocation failures. TAC supervision is strongly recommended.

b) Once you are done with CPU profiling, do not forget to unprofile the task. The processor pool will not reclaim the memory used by profiling unless you unprofile using the command 'unprofile all'.

As you have rightly pointed out, the CPU profiling procedure is explained in the link below:

http://www.cisco.com/en/US/products/hw/routers/ps359/products_tech_note09186a00801c2af0.shtml

There is no alternative tool to CPU profiling. You also need TAC assistance to decode the profile output, as this requires the relevant symbol files that are generated at IOS compile time. Only the TAC has access to these files, which are stored in a highly secure environment.

I am not aware of any new procedure to troubleshoot high CPU due to interrupts apart from what is already explained in the link below:

http://www.cisco.com/en/US/products/hw/routers/ps133/products_tech_note09186a00800a70f2.shtml

The last enhancement I remember was the addition of the commands 'show ip cef switching statistics' and 'show ip cef switching statistics feature', which replaced the command 'show cef not-cef-switched'. These new commands show the packets punted to the CPU by a particular feature configured on the device and are really helpful for troubleshooting high CPU due to the IP Input process (packets not processed by CEF).

The troubleshooting procedure remains the same for all the ISR platforms and the NPE-G1/G2, which are essentially single-CPU platforms. The procedure varies slightly with distributed-architecture platforms like the ASRs and the ISR G3s, which are expected to reach FCS next month.

Regarding the SPAN/NETDR functionality, unfortunately we do not have any similar tools available on the routers at this moment. However, we can perform a SPAN on the router if any of the EtherSwitch modules are installed.

Coming to the EPC part, it can be really helpful for troubleshooting high CPU due to process switching (IP Input). One seldom-used feature of EPC is the option to capture process-switched traffic:

R1#monitor capture point ip ?
  cef               IPv4 CEF
  process-switched  Process switched packets

R1#monitor capture point ip process-switched ATE ?
  both     Inbound and outbound and packets
  from-us  Packets originating locally
  in       Inbound packets
  out      Outbound packets

We can analyze the capture using Wireshark to see whether there are process-switched flows from any specific source, which is very useful for isolating the issue.
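If Wireshark is not at hand, a rough first pass over an exported capture can also be scripted. The Python sketch below counts IPv4 source addresses in a capture file. It assumes the export is a classic pcap file containing untagged Ethernet frames (link type 1); the function name top_sources is made up for this example, and VLAN-tagged or non-IPv4 frames are simply skipped.

```python
import struct
from collections import Counter


def top_sources(pcap_bytes, limit=5):
    """Count IPv4 source addresses in a classic pcap capture of Ethernet frames."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"   # file written big-endian
    else:
        raise ValueError("not a classic pcap file")

    # The 24-byte global header ends with the link type.
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet (link type 1) captures are handled")

    counts = Counter()
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Per-packet header: ts_sec, ts_usec, incl_len, orig_len (4 bytes each).
        incl_len = struct.unpack(endian + "I", pcap_bytes[offset + 8:offset + 12])[0]
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        # 14-byte Ethernet header; EtherType 0x0800 means IPv4.
        if len(frame) >= 34 and frame[12:14] == b"\x08\x00":
            src = frame[26:30]  # bytes 12-15 of the IPv4 header
            counts[".".join(str(b) for b in src)] += 1
    return counts.most_common(limit)
```

To use it, read the exported file in binary mode and pass the bytes to top_sources; a source that dominates the process-switched capture is a good starting point for isolating the flow.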

Feel free to contact us for any queries.

Best Regards,

Vishnu Asok

Cisco Systems Inc
