Solved: Ask the Expert: High CPU on IOS Questions with Cisco Expert Vinit Jain

Lisa Latour · 05-05-2015

This is an opportunity to learn and ask questions about high CPU conditions you might be facing in your environment, and about troubleshooting them with the tools and techniques available on the platform, with Cisco expert Vinit Jain.

Ask questions from Monday, May 11, 2015, to Friday, May 22, 2015.

A high CPU condition is a very common problem in production environments, and it can have a major impact on services if it is not addressed in time. High CPU falls primarily into two categories: 1) high CPU due to processes and 2) high CPU due to interrupts (traffic). Cisco expert Vinit Jain will answer your questions about troubleshooting high CPU on Cisco IOS.
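Both categories show up in the header line of `show processes cpu`: the first number is the total CPU utilization over the last five seconds, and the number after the slash is the share spent at interrupt level, so the difference is the process-level load. The Python sketch below splits that line. The helper names `split_cpu` and `dominant_category` are hypothetical, and the "larger component decides where to start" rule is an illustrative assumption, not a Cisco-defined threshold. The sample input is the header line from the 7200 output posted later in this thread.

```python
import re

# Header line of "show processes cpu", e.g.
# "CPU utilization for five seconds: 97%/57%; one minute: 97%; five minutes: 96%"
CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def split_cpu(line: str) -> dict:
    """Split the 5-second figure into interrupt-level and process-level load."""
    m = CPU_LINE.search(line)
    if m is None:
        raise ValueError("not a 'show processes cpu' header line")
    total, interrupt, one_min, five_min = (int(g) for g in m.groups())
    return {
        "total": total,
        "interrupt": interrupt,            # time spent switching packets at interrupt level
        "process": total - interrupt,      # remainder is attributed to processes
        "one_minute": one_min,
        "five_minutes": five_min,
    }


def dominant_category(cpu: dict) -> str:
    # Hypothetical heuristic: start troubleshooting with the larger component.
    return "interrupt (traffic)" if cpu["interrupt"] >= cpu["process"] else "process"


sample = "CPU utilization for five seconds: 97%/57%; one minute: 97%; five minutes: 96%"
cpu = split_cpu(sample)
print(cpu["process"], dominant_category(cpu))  # 40 interrupt (traffic)
```

Here 57% of the CPU is at interrupt level and 40% at process level, so both categories need attention, with traffic the larger share.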

Vinit Jain, 3X CCIE #22854, is a Technical Lead in the HTTS (High Touch Technical Support) team, supporting customers in the areas of routing, MPLS, TE, IPv6, and multicast, as well as a wide variety of platform issues such as high CPU and memory leaks, across the IOS, IOS XE, IOS XR, and NX-OS code bases. He has been delivering training within Cisco on various technology and platform troubleshooting topics, and he has written a workbook on IOS XR fundamentals on the Cisco Support Community. Vinit holds CCIEs in R&S, SP, and Security, as well as multiple certifications in programming and databases.

Vinit Jain will also be speaking at Cisco Live in June 2015 on Troubleshooting BGP (BRKRST-3320).

Find other events at https://supportforums.cisco.com/expert-corner/events.

**Ratings Encourage Participation!**
Please be sure to rate the answers to questions.

Vinit Jain · 05-08-2015

Hello Manish

Yes, there are scenarios in which an IGP flap can cause the CPU to spike. The question is how to approach this problem: if we start by troubleshooting the high CPU, it will lead us to the IGP flaps.

Suppose BFD is flapping and causing OSPF to flap. We can then use the following EEM script to troubleshoot the problem:

event manager applet OSPF_Monitor
event syslog pattern "Neighbor Down: BFD node down"
action 1.01 syslog priority critical msg "**** BFD Failure Detected - Statistics Logged ****"
action 1.02 cli command "enable"
action 1.03 cli command "show clock | append bootdisk:cpu_stats"
action 1.04 cli command "show proc cpu sort | append bootdisk:cpu_stats"
action 1.05 cli command "debug netdr capture rx"
action 1.06 cli command "show netdr cap | append bootdisk:cpu_stats"
action 1.07 cli command "undebug all"
action 1.08 cli command "end"

The above applet performs a NetDR capture when BFD flaps, to see which packets are hitting the CPU; the capture can then be decoded to understand what is happening on the router. BFD- or OSPF-related show commands can also be added to this EEM applet.
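The `event syslog pattern` argument is a regular expression that EEM matches against each syslog message. As a rough illustration of which messages would fire the OSPF_Monitor applet, here is a Python approximation using `re.search`. Python's regex engine is not EEM's, and the log lines below are made up for illustration only; the function name `would_fire` is hypothetical.

```python
import re

# Pattern string copied from the OSPF_Monitor applet. EEM treats it as a regular
# expression; re.search approximates "matches anywhere in the message".
PATTERN = re.compile(r"Neighbor Down: BFD node down")

# Hypothetical syslog messages, made up for illustration only.
logs = [
    "%OSPF-5-ADJCHG: Process 1, Nbr 10.0.0.2 on Gi0/1 from FULL to DOWN, "
    "Neighbor Down: BFD node down",
    "%OSPF-5-ADJCHG: Process 1, Nbr 10.0.0.2 on Gi0/1 from LOADING to FULL, "
    "Loading Done",
    "%LINK-3-UPDOWN: Interface Gi0/2, changed state to down",
]


def would_fire(message: str) -> bool:
    """Return True if the applet's syslog pattern matches this message."""
    return PATTERN.search(message) is not None


print([would_fire(m) for m in logs])  # [True, False, False]
```

Only the adjacency change caused by BFD matches, so an ordinary OSPF state change or an unrelated interface flap does not trigger the capture.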

If we don't know which process or protocol is causing the high CPU, or when it happens, we can configure another EEM script on the router that triggers when the CPU spikes:

event manager applet HIGHCPU
event snmp oid "1.3.6.1.4.1.9.9.109.1.1.1.1.3.1" get-type exact entry-op gt entry-val "90" exit-op lt exit-val "70" poll-interval 5 maxrun 200
action 1.0 syslog msg "START of TAC-EEM: High CPU"
action 1.1 cli command "enable"
action 1.3 cli command "debug netdr clear-capture"
action 1.4 cli command "debug netdr capture rx"
action 2.0 cli command "sh clock | append disk0:proc_CPU"
action 2.1 cli command "show process cpu sorted | append disk0:proc_CPU"
action 2.2 cli command "show proc cpu history | append disk0:proc_CPU"
action 2.3 cli command "show netdr capture | append disk0:proc_CPU"
action 3.1 cli command "show log | append disk0:proc_CPU"
action 4.0 syslog msg "END of TAC-EEM: High CPU"

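In the above EEM script, the applet polls the 5-second CPU utilization through SNMP (cpmCPUTotal5sec from CISCO-PROCESS-MIB) every 5 seconds and triggers when it exceeds 90%. Once it has triggered, it does not trigger again until the CPU drops below 70%. Both thresholds can be adjusted to suit the environment.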
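The entry/exit pair behaves like hysteresis: one sustained spike produces one capture instead of a new capture on every 5-second poll. Below is a minimal Python model of that behavior, assuming one CPU sample per poll. The function name `eem_triggers` and the sample values are hypothetical, and the sketch models the entry/exit semantics described above, not EEM's actual implementation.

```python
def eem_triggers(samples, entry_val=90, exit_val=70):
    """Return the indices of the polls at which the applet would fire.

    Models "entry-op gt entry-val" / "exit-op lt exit-val": the applet fires
    when a sample exceeds entry_val, then stays disarmed until a sample
    drops below exit_val.
    """
    armed = True
    fired = []
    for i, value in enumerate(samples):
        if armed and value > entry_val:
            fired.append(i)      # trigger condition met: run the actions
            armed = False        # ignore further high samples for now
        elif not armed and value < exit_val:
            armed = True         # exit condition met: re-arm the event
    return fired


# One sample per 5-second poll interval (made-up values for illustration).
polls = [40, 95, 97, 85, 92, 65, 91, 50]
print(eem_triggers(polls))  # [1, 6]
```

The readings at polls 2 and 4 stay above 90% but do not fire again, because the CPU has not yet dropped below 70%; it re-arms at poll 5 and fires once more at poll 6.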

The more important question is why the IGP is flapping. It could be due to packet drops, MTU issues, a rate limiter dropping legitimate packets, etc.

Hope this helps.

Vinit

PS: Please do rate the reply if you find it useful.

Thanks
--Vinit


Reza Sharifi · 05-05-2015

I know it is not 5/11 yet, but I will ask my question now and wait until 5/11 for a response.

Hi Vinit,

I have purchased several Cisco 2960-X switches. Out of the box, without ANY configuration, the CPU runs at about 40% to 42% all the time. One process in particular (Hulc LED Process) is using 22% of the CPU. We opened a ticket with TAC, and they told us that this behavior is normal. Can you explain what is causing the high CPU, and whether Cisco has anything on the roadmap to remedy it?

Thanks,

Reza

Vinit Jain · 05-05-2015

Hello Reza

Could you please let me know which IOS version you are running? I have noticed this is a common issue on this platform.

Based on my research, the "Hulc LED" process performs the following tasks:

- Check the link status on every port
- If the switch supports PoE, check whether a Powered Device (PD) is detected
- Check the status of the transceivers
- Update the fan status
- Set the main LED and the port LEDs
- Update the status of both power supplies and the RPS
- Check the system temperature status

So even when there is no production traffic, you might see this process consuming CPU on the switch. This is documented as expected behavior under software defect CSCtg86211 (it is not a software defect but expected behavior).

Hope this information helps.

Regards,

Vinit

Thanks
--Vinit

Reza Sharifi · 05-05-2015

Hello Vinit,

Here is the version of the IOS we are running:

Version 15.0(2a)EX5

Thanks,

Reza

Vinit Jain · 05-05-2015

Hello Reza

I don't think this behavior has changed, nor do I see any clear indication of when it will be changed or fixed.

If your query has been answered, please rate the post and mark the answer as complete.

If you have any further questions, please feel free to ask them as well.

Regards,

Vinit

Thanks
--Vinit

Reza Sharifi · 05-05-2015

I just don't understand why CPU on these switches run at 40% without any config on them.

Thanks for the feedback.

Vinit Jain · 05-05-2015

Hello Reza

Quick question: are any cables connected to the switch? If you run show ip int brief, what status do you see? Are the ports up?

Thanks
--Vinit

Reza Sharifi · 05-08-2015

Hi Vinit,

Yes, the ports that are connected to end devices and the uplinks are in up state.

Reza

Vinit Jain · 05-08-2015

Hello Reza

If the device is not in production, can you try unplugging the cables, or shutting the ports down with the shutdown command, and see whether that changes the CPU utilization? Based on what this process does, it should bring the CPU down, even if only by a few cycles.

Regards,

Vinit

Thanks
--Vinit

Karan · 05-08-2015

Hello,

We have a 7200 series router that shows high CPU. The problem started after we enabled NetFlow on the router; there has been no change in traffic. Is this a known issue? How can we fix it?

RA01#show proc cpu sorted | ex  0.00
CPU utilization for five seconds: 97%/57%; one minute: 97%; five minutes: 96%
 PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
  95   122265420   537588673        227 35.11% 34.27% 34.32%   0 IP Input
 179     3734124    23873626        156  2.23%  2.44%  2.17%   0 LFDp Input Proc
   9   103157352   312479671        330  1.03%  1.14%  1.08%   0 ARP Input
 312      732648  2234277784          0  0.63%  0.57%  0.56%   0 IP SLAs Responde
 170    23292176    13980278       1666  0.23%  0.22%  0.23%   0 CEF: IPv4 proces
 303     9053864   329693755         27  0.15%  0.18%  0.17%   0 PPP Events
 168      166280  2234199103          0  0.15%  0.12%  0.13%   0 HQF Output Shape
 329    39411164   287153755        137  0.15%  0.32%  0.32%   0 BGP Router

Vinit Jain · 05-08-2015

Hello Karan,

Which IOS version are you running? If you remove the NetFlow configuration, does the CPU return to normal?

Vinit

Thanks
--Vinit

Karan · 05-08-2015

We are running the 15.1(1)S release.

To your second question: yes, when we remove the configuration, the CPU returns to normal. We have a similar configuration on a 7600 router but don't see high CPU there. Is this common on the 7200?

Vinit Jain · 05-08-2015

I believe this is a known issue, CSCtr92077. The problem here seems to be that NetFlow may be causing the packets to be software-switched instead of fast-switched.

Could you confirm whether you are using the Advanced IP Services image or the Advanced Enterprise Services image? If you are using the Advanced IP Services image, I would suggest trying the adventerprise image. That should probably fix the issue.

Hope this helps.

Vinit

Thanks
--Vinit

Karan · 05-08-2015

Yes, we are using the Advanced IP Services image. We will check internally to see whether we can make the change.

Will keep you posted.

Thanks for the quick response.

Karan · 05-12-2015

The issue has been resolved. We upgraded to the Advanced Enterprise Services image on one of our routers, which fixed the issue.

Thanks for your help.