High CPU throughout entire network

chrisseidt · ‎11-12-2012

For some time now we've been seeing an issue where CPU utilization on our switches spikes to 98-99% throughout the day. This shows up on both the distribution switches and the access switches, at all hours of the day and on all days of the week, but at varying times. When we investigate with show proc cpu history, we typically get the output below. As you can see, the one-minute and 5-minute views NEVER show much of anything, but the 72-hour view shows what I would consider to be a huge problem (although the averages are very low overall). If I sit on the switch and continuously up-arrow to rerun the command, the output never really changes. The same goes for show proc cpu sorted (also below).

Models affected include 6509-Es with Sup720-10G, 4510s, 3560E, 3750X, 2960S, 2960G, 3560G, 3750G, and 4948 10G: pretty much every switch in the environment. We run a fair number of VLANs within the network (around 150). I'm not sure whether we're seeing Spanning Tree issues or some other problem (we do prune our VLANs). We get sporadic complaints from users about random disconnects from the switch at different times of day, and reports of sluggishness on the network, but there are no log events or unusual behavior in our SolarWinds monitoring to indicate an issue.

Any ideas on where else to look would be appreciated.

    1111111111111111111111111111111111111111111111111111111111
    3333111110000077777000002222211111111112222211111000002222
100
 90
 80
 70
 60
 50
 40
 30
 20               *****
 10 **********************************************************
   0....5....1....1....2....2....3....3....4....4....5....5....
             0    5    0    5    0    5    0    5    0    5
             CPU% per second (last 60 seconds)

    1111111111111111111111111111111111111114211111111111111111
    7775837587377777487658688478757687778985746766975877667776
100
 90
 80
 70
 60
 50                                        *
 40                                        *
 30                                        #*
 20 ***** **** ***** ******** *************## ****************
 10 ##########################################################
   0....5....1....1....2....2....3....3....4....4....5....5....
             0    5    0    5    0    5    0    5    0    5
             CPU% per minute (last 60 minutes)
             * = maximum CPU%   # = average CPU%

                                                 1
    4854764534474543443176443455447556754758688570656456677566544444646598
    5455013083515917068855158740846499291521640980848927242094186242814472
100                                              *                      *
 90                                              *                      *
 80  *                  *         *      * * ** **                      **
 70  *  *      *        **        *  **  * **** *** *  * ** *       *   **
 60  ** **     * *      **        * **** * ******** *  **** **      * * **
 50 ****** *  ****   *  ** * **** ****** ************************   * ****
 40 ******************* **************************************************
 30 ******************* **************************************************
 20 **********************************************************************
 10 ######################################################################
   0....5....1....1....2....2....3....3....4....4....5....5....6....6....7.
             0    5    0    5    0    5    0    5    0    5    0    5    0
             CPU% per hour (last 72 hours)
             * = maximum CPU%   # = average CPU%
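A side note on reading these graphs: the rows of digits above each graph are the per-column values written vertically (hundreds, tens, ones), which for the per-minute and per-hour graphs is the maximum CPU% in that interval. The following is a minimal Python sketch, not part of the original post, that turns those rows back into numbers. `decode_history_rows` is a hypothetical helper name, and the input is simply the two digit rows from the per-minute graph above.

```python
def decode_history_rows(digit_rows):
    """Read stacked digit rows column by column, top to bottom.

    Returns one integer per graph column. Rows shorter than the widest
    row are right-padded with spaces, so a sparse hundreds row has to
    keep its original leading spaces to line up correctly.
    """
    width = max(len(row) for row in digit_rows)
    padded = [row.ljust(width) for row in digit_rows]
    values = []
    for col in range(width):
        # Join the digits of this column and drop the padding spaces.
        digits = "".join(row[col] for row in padded).strip()
        values.append(int(digits) if digits else 0)
    return values


# Tens and ones rows copied from the per-minute graph in the post.
tens = "1111111111111111111111111111111111111114211111111111111111"
ones = "7775837587377777487658688478757687778985746766975877667776"

per_minute_max = decode_history_rows([tens, ones])
peak = max(per_minute_max)
print("columns decoded:", len(per_minute_max))
print("highest per-minute maximum:", peak)
print("columns at or above 20%:", sum(v >= 20 for v in per_minute_max))
```

The `#` rows carry the averages, which is why a graph can show a tall `*` column while the one-minute and five-minute figures in the summary line stay low.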

CPU utilization for five seconds: 14%/0%; one minute: 12%; five minutes: 12%
 PID Runtime(ms)    Invoked  uSecs   5Sec   1Min   5Min TTY Process
 136   192729000  124946322   1542  4.69%  4.83%  4.80%   0 Hulc LED Process
   4    42236260    2366789  17845  2.09%  0.91%  0.74%   0 Check heaps
 105    28574143    5405898   5285  0.49%  0.50%  0.49%   0 hpm counter proc
  38     1857551      92006  20189  0.39%  0.04%  0.00%   0 Per-minute Jobs
 165     1711405    1732079    988  0.29%  0.04%  0.00%   0 CDP Protocol
 190    11049337   12822842    861  0.29%  0.27%  0.28%   0 Spanning Tree
 146      357515    2173868    164  0.09%  0.01%  0.00%   0 HRPC qos request
 145     6026652    1086949   5544  0.09%  0.10%  0.09%   0 HQM Stack Proces
 230      144967   10841728     13  0.09%  0.01%  0.00%   0 DHCPD Receive
  50       61081    1086929     56  0.09%  0.00%  0.00%   0 Compute load avg
  97      464406    5405935     85  0.09%  0.01%  0.00%   0 Hulc ILP Alchemy
 289     1306241    3795642    344  0.09%  0.00%  0.00%   0 IP SNMP
 204     1871434    5405898    346  0.09%  0.05%  0.02%   0 PI MATM Aging Pr
 291    19429080    4372748   4443  0.09%  0.00%  0.00%   0 SNMP ENGINE
  33      477514    6143435     77  0.09%  0.01%  0.00%   0 Net Background
  67     4676501  249650811     18  0.09%  0.07%  0.08%   0 RedEarth Rx Mana
  66     7641949   38214727    199  0.09%  0.10%  0.11%   0 RedEarth Tx Mana
 175     1326425   12100919    109  0.09%  0.04%  0.01%   0 IP Input
  18        2198    5406010      0  0.00%  0.00%  0.00%   0 IPC Periodic Tim
  17           0          1      0  0.00%  0.00%  0.00%   0 IPC Zone Manager
  16        1566      90580     17  0.00%  0.00%  0.00%   0 IPC Dynamic Cach
  15           0          1      0  0.00%  0.00%  0.00%   0 IFS Agent Manage
  14         210         25   8400  0.00%  0.00%  0.00%   0 Entity MIB API
  19           0          1      0  0.00%  0.00%  0.00%   0 IPC Managed Time
  20        1760    5406010      0  0.00%  0.00%  0.00%   0 IPC Deferred Por
  26           0          1      0  0.00%  0.00%  0.00%   0 License IPC serv
  21           0          1      0  0.00%  0.00%  0.00%   0 IPC Seat Manager
  13           0          1      0  0.00%  0.00%  0.00%   0 Policy Manager
  29           0          2      0  0.00%  0.00%  0.00%   0 XML Proxy Client
  30           0          1      0  0.00%  0.00%  0.00%   0 ARP Snoop
 --More--

Thanks for any ideas you might have to investigate this further.

srikanth ath · ‎11-12-2012

Hi

Conditions under which high CPU utilization is normal:

• Spanning Tree

• IP routing table updates

• Cisco IOS commands

Kindly refer to the link below, which may help you.

http://www.google.co.in/#hl=en&sclient=psy-ab&q=troubleshooting+high+cpu+utilization+on+cisco+switches&oq=troubleshooting+the+CPU+utili&gs_l=hp.3.1.0i22l3.8548.16723.1.18298.31.22.0.9.9.1.266.4134.1j9j12.22.0.les%3B..1.0...1c.1.Q81AbMtPjjg&pbx=1&bav=...

Please rate helpful posts.

Regards,

Srikanth

nathanaelminesso · ‎11-12-2012

Hi,

I recommend going through the steps in the link below and running CPU profiling when the situation is at its worst.

http://www.cisco.com/en/US/products/hw/routers/ps359/products_tech_note09186a00801c2af0.shtml

That way you will at least know exactly what is causing the high CPU utilization. I hope CEF is enabled, because the line "CPU utilization for five seconds: 14%/0%" shows that no CPU time is currently being spent at interrupt level (the figure after the slash).
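To make the "14%/0%" reading concrete: in that summary line the first figure is total CPU over the last five seconds and the figure after the slash is the share spent at interrupt level, so the remainder is process-level load. Here is a small Python sketch, offered as an illustration only, that splits the line into those parts. `parse_cpu_summary` and the dictionary keys are hypothetical names, and the regular expression assumes the exact wording shown in this thread.

```python
import re

# Pattern for the summary line exactly as it appears in the posted output.
SUMMARY_RE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cpu_summary(line):
    """Split the summary line into total, interrupt, and process-level CPU."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a CPU utilization summary line")
    total, interrupt, one_min, five_min = (int(g) for g in match.groups())
    return {
        "five_sec_total": total,
        "five_sec_interrupt": interrupt,
        # Process-level share is whatever is left after interrupt time.
        "five_sec_process": total - interrupt,
        "one_min": one_min,
        "five_min": five_min,
    }


summary = parse_cpu_summary(
    "CPU utilization for five seconds: 14%/0%; one minute: 12%; five minutes: 12%"
)
print(summary)
```

For the line in this thread, all 14% is process-level, which is why the next step is to look at which processes are consuming it.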