3560X high CPU/Interrupts because of ICMP queue?

wowzie
Level 1

Hello,

 

We have a 3560X switch that is running at high CPU (around 80-90%), with 25-30% of that in interrupts.

 

Top 3 processes:

PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
175 1060646641 3649393632 0 16.13% 14.93% 14.93% 0 Hulc LED Process
224 3956630454 3320689710 0 10.86% 10.97% 11.01% 0 IP Input
13 2497812332 3353714124 0 3.35% 3.77% 3.72% 0 ARP Input

 

I think it's caused by ICMP traffic flooding the CPU because "show controllers cpu-interface" shows me this output (I monitored it for 24 hours):

 

cpu-queue-frames, icmp: increasing by roughly 5800 packets/sec

cpu-queue-frames, sw forwarding: increasing by roughly 485 packets/sec

cpu-queue-frames, routing: increasing by roughly 297 packets/sec
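
(For reference, the per-second rates above are simply derived from two counter snapshots taken a known interval apart, roughly like this; the counter values below are made up, only the method is real:

show controllers cpu-interface | include icmp
  icmp  1000000000     <- first reading
  icmp  1000348000     <- second reading, 60 seconds later
  (1000348000 - 1000000000) / 60 = 5800 packets/sec)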

 

show buffers output:

RxQ11 buffers, 2040 bytes (total 16, permanent 16):
1 in free list (0 min, 16 max allowed)
2225720443 hits, 2500324642 misses

 

How can I find out what is causing this? I found that I can use "debug platform cpu-queues icmp-q", but that will probably crash the switch, which is not an option because of the amount of traffic going through it.
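
In the meantime my idea is to hang a counting ACL on one of the busy SVIs, just to see which ICMP types are involved without running any debugs. Roughly like this, where the VLAN number and ACL name are only placeholders:

ip access-list extended ICMP-COUNT
 permit icmp any any echo
 permit icmp any any echo-reply
 permit icmp any any unreachable
 permit icmp any any ttl-exceeded
 permit icmp any any redirect
 permit ip any any
!
interface Vlan10
 ip access-group ICMP-COUNT in
!
show access-lists ICMP-COUNT

If I remember correctly the 3560 only updates ACL counters for packets handled in software, but since this traffic is hitting the CPU anyway, that might be exactly what I want to count. Does that sound like a sane approach?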

 

Thanks,


Erik

16 Replies

Leo Laohoo
Hall of Fame

Post the complete output to the following commands: 

sh version
sh logs
sh proc cpu sort | ex 0.00

#sh proc cpu sort | ex 0.00
CPU utilization for five seconds: 71%/24%; one minute: 76%; five minutes: 82%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
175 1061116087 3649447988 0 14.23% 15.15% 15.65% 0 Hulc LED Process
224 3956956216 3320891212 0 12.15% 11.19% 11.13% 0 IP Input
13 2497921404 3353822980 0 4.15% 3.65% 3.67% 0 ARP Input
85 2415659242 571532433 4226 2.23% 1.76% 1.80% 0 RedEarth Tx Mana
178 1022797110 107440483 9519 1.75% 1.11% 1.10% 0 HL3U bkgrd proce
14 1188472142 164323555 7232 1.59% 1.16% 1.12% 0 ARP Background
91 1448298093 58537854 24741 1.59% 1.33% 1.41% 0 Adjust Regions
212 371093820 224918596 1649 0.95% 0.64% 0.64% 0 CEF: IPv4 proces
84 784236347 789874155 992 0.63% 0.97% 0.95% 0 RedEarth I2C dri
129 707665518 147989673 4781 0.63% 0.78% 0.79% 0 hpm counter proc
242 227879339 539345236 422 0.31% 0.10% 0.12% 0 Spanning Tree
223 116344653 4149362332 0 0.31% 0.06% 0.05% 0 IP ARP Retry Age
56 100671811 147991845 680 0.15% 0.04% 0.02% 0 Per-Second Jobs
190 278277891 29473918 9441 0.15% 0.19% 0.20% 0 HQM Stack Proces
290 32244963 184700818 174 0.15% 0.14% 0.15% 0 TCP Protocols
420 207838851 1220917627 170 0.15% 0.11% 0.14% 0 HSRP IPv4
86 19754962 394937080 50 0.15% 0.04% 0.04% 0 RedEarth Rx Mana
126 288853092 2015114174 143 0.15% 0.18% 0.16% 0 hpm main process
107 483417410 4149037853 0 0.15% 0.10% 0.06% 0 HLFM address lea

 

#show version
Cisco IOS Software, C3560E Software (C3560E-UNIVERSALK9-M), Version 15.2(4)E10, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2020 by Cisco Systems, Inc.
Compiled Tue 31-Mar-20 21:44 by prod_rel_team

ROM: Bootstrap program is C3560E boot loader
BOOTLDR: C3560E Boot Loader (C3560X-HBOOT-M) Version 12.2(58r)SE1, RELEASE SOFTWARE (fc1)

License Level: ipservices
License Type: Permanent
Next reload license Level: ipservices

cisco WS-C3560X-24 (PowerPC405) processor (revision M0) with 262144K bytes of memory.
Processor board ID FDO1852F1J3
Last reset from power-on
7 Virtual Ethernet interfaces
1 FastEthernet interface
28 Gigabit Ethernet interfaces
2 Ten Gigabit Ethernet interfaces
The password-recovery mechanism is enabled.

512K bytes of flash-simulated non-volatile configuration memory.
Base ethernet MAC Address : 74:A2:E6:67:D0:80
Motherboard assembly number : 73-12554-12
Motherboard serial number : FDO185302LK
Model revision number : M0
Motherboard revision number : A0
Model number : WS-C3560X-24T-E
Daughterboard assembly number : 800-32786-02
Daughterboard serial number : FDO18520QZ6
System serial number : FDO1852F1J3
Top Assembly Part Number : 800-31331-09
Top Assembly Revision Number : F0
Version ID : V06
CLEI Code Number : CMMPW00DRA
Hardware Board Revision Number : 0x05


Switch Ports Model SW Version SW Image
------ ----- ----- ---------- ----------
* 1 30 WS-C3560X-24 15.2(4)E10 C3560E-UNIVERSALK9-M

 

"show log" only shows rows of:

%SEC_LOGIN-5-LOGIN_SUCCESS: Login Success

Nothing else.

You are already running the latest IOS. Is this high CPU something you started seeing after a recent upgrade?

 

BB


We upgraded last week from 15.0(2)SE7 to this version to see if it would fix the issue. We've had this high CPU problem for a few weeks now.

No logs? That's strange.
Try this:

  1. Disable (or shut down) the ports to the downstream clients for about 2 minutes. 
  2. See if the CPU drops while the downstream ports are disabled (see the sketch below). 
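
Something along these lines; the interface range is only an example, use whatever ports feed your downstream clients:

conf t
 interface range gigabitEthernet 0/1 - 12
 shutdown
 end

(wait about 2 minutes)

show processes cpu sorted | exclude 0.00
show processes cpu history

conf t
 interface range gigabitEthernet 0/1 - 12
 no shutdown
 end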

I'm looking for a way to find the cause without downtime.

I am having the same issue after upgrading to 15.2(4)E10. My switches' CPU is at 93-98%. Has anything been resolved?

 

"My switches cpu is 93-98%."

Doing what, exactly?

I don't know; this is what I am seeing in the CPU processes. Our monitoring software is constantly showing high CPU utilization for these switches, and I have seen it even on switches with hardly any devices connected. I do have one switch that only carries 2 VLANs and its utilization is a lot lower, so I don't know whether this is caused by traffic or by something else; the other switches where this is happening carry about 10 VLANs. It's exactly what wowzie is describing above.

CPU utilization for five seconds: 91%/30%; one minute: 94%; five minutes: 95%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
175 122864891 11695364 10505 18.71% 16.54% 16.42% 0 Hulc LED Process
337 94543646 47980026 1970 13.59% 13.89% 14.42% 0 IGMPSN
336 71752206 45542865 1575 7.19% 9.79% 10.34% 0 IGMPSN MRD
84 35128976 4849807 7243 5.27% 4.65% 4.61% 0 RedEarth Tx Mana
83 34099663 7842599 4348 5.59% 4.51% 4.47% 0 RedEarth I2C dri
4 16189838 659116 24562 0.00% 1.71% 2.05% 0 Check heaps
129 8181575 472255 17324 1.11% 1.07% 1.11% 0 hpm counter proc
126 4735154 6201911 763 0.63% 0.55% 0.59% 0 hpm main process
190 3493717 93551 37345 0.47% 0.47% 0.47% 0 HQM Stack Proces
105 1702 126 13507 0.47% 1.66% 0.41% 1 SSH Process
240 1709813 1917425 891 0.31% 0.23% 0.23% 0 Spanning Tree
11 1638126 7843 208864 0.00% 0.19% 0.21% 0 Licensing Auto U
117 1423997 10708648 132 0.15% 0.23% 0.19% 0 HLFM address lea
85 1300877 3735185 348 0.31% 0.25% 0.18% 0 RedEarth Rx Mana
57 1290083 472661 2729 0.00% 0.16% 0.17% 0 Per-Second Jobs
332 1240785 1013532 1224 0.31% 0.16% 0.16% 0 Marvell wk-a Pow
222 1152807 6672356 172 0.00% 0.14% 0.15% 0 VRRS Main thread
355 1108000 6672177 166 0.00% 0.11% 0.13% 0 MMA DB TIMER
382 1122830 6672279 168 0.15% 0.10% 0.13% 0 MMA DP TIMER
164 863659 2245806 384 0.00% 0.09% 0.10% 0 Hulc Storm Contr
225 917102 10708396 85 0.15% 0.10% 0.10% 0 IP ARP Retry Age
455 620597 730518 849 0.00% 0.08% 0.09% 0 LLDP Protocol
383 1029793 13213914 77 0.00% 0.06% 0.09% 0 MMON MENG
191 624047 187176 3334 0.15% 0.10% 0.09% 0 HRPC qos request
95 680613 2353751 289 0.00% 0.09% 0.08% 0 yeti2_emac_proce
--More--

Reading https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst3750/software/troubleshooting/cpu_util.html, it's hard to say whether you have a real problem or whether this is normal, and likewise whether there's anything you can do beyond trying another IOS variant.

Two things to keep in mind.

First, most data forwarding on a switch is done in dedicated hardware, so a busy CPU often matters little.

Second, CPU processes are usually prioritized, so a low-priority background process consuming much, or even all, of the available CPU is not detrimental to normal operations. Often the biggest "issue" is simply that general monitoring shows a very busy CPU.

The one thing that is a possible concern is the high utilization of the two IGMP snooping processes.

What's your multicast environment like?
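
For a rough picture, something like the following should do; the exact keywords can vary a bit between releases:

show ip igmp snooping
show ip igmp snooping groups
show ip mroute active

i.e. how many groups and receivers the switch is tracking, and what rates the active sources are actually pushing.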

DarrelR
Level 1

Reasonably low. I have a total of 4 VLANs that are running multicast, and most of the switches with high CPU don't have those VLANs on them.

On one of the affected switches, "show ip mroute active" shows no active sources, only the header:

Active IP Multicast Sources - sending >= 4 kbps

On our whole network:

Active IP Multicast Sources - sending >= 4 kbps

Group: 239.192.4.226, (?)
Source: 10.0.60.32 (?)
Rate: 83 pps/64 kbps(1sec), 64 kbps(last 20 secs), 63 kbps(life avg)

Group: 239.192.4.227, (?)
Source: 10.0.60.32 (?)
Rate: 20 pps/18 kbps(1sec), 18 kbps(last 20 secs), 18 kbps(life avg)

Group: 239.192.4.224, (?)
Source: 10.0.60.32 (?)
Rate: 20 pps/18 kbps(1sec), 18 kbps(last 20 secs), 18 kbps(life avg)

Group: 239.192.4.225, (?)
Source: 10.0.60.32 (?)
Rate: 85 pps/65 kbps(1sec), 65 kbps(last 20 secs), 65 kbps(life avg)

Group: 239.192.4.228, (?)
Source: 10.0.60.32 (?)
Rate: 20 pps/18 kbps(1sec), 18 kbps(last 20 secs), 18 kbps(life avg)

Group: 239.192.4.229, (?)
Source: 10.0.60.32 (?)
Rate: 19 pps/18 kbps(1sec), 18 kbps(last 20 secs), 18 kbps(life avg)

Group: 239.192.4.192, (?)
Source: 10.0.60.31 (?)
Rate: 20 pps/18 kbps(1sec), 18 kbps(last 30 secs), 18 kbps(life avg)

Group: 239.192.4.193, (?)
Source: 10.0.60.31 (?)
Rate: 85 pps/65 kbps(1sec), 65 kbps(last 30 secs), 65 kbps(life avg)

Group: 239.192.4.194, (?)
Source: 10.0.60.31 (?)
Rate: 83 pps/63 kbps(1sec), 63 kbps(last 30 secs), 63 kbps(life avg)

Group: 239.192.4.195, (?)
Source: 10.0.60.31 (?)
Rate: 20 pps/18 kbps(1sec), 18 kbps(last 30 secs), 18 kbps(life avg)

Group: 239.192.4.196, (?)
Source: 10.0.60.31 (?)
Rate: 20 pps/18 kbps(1sec), 18 kbps(last 30 secs), 18 kbps(life avg)

Group: 239.192.4.197, (?)
Source: 10.0.60.31 (?)
Rate: 20 pps/18 kbps(1sec), 18 kbps(last 30 secs), 18 kbps(life avg)

Group: 239.192.109.160, (?)
Source: 10.0.75.102 (?)
Rate: 4 pps/6 kbps(1sec), 6 kbps(last 40 secs), 6 kbps(life avg)

Group: 239.192.109.128, (?)
Source: 10.0.75.101 (?)
Rate: 4 pps/6 kbps(1sec), 6 kbps(last 30 secs), 32 kbps(life avg)

Group: 239.192.109.129, (?)
Source: 10.0.75.101 (?)
Rate: 197 pps/173 kbps(1sec), 173 kbps(last 20 secs), 169 kbps(life avg)

 

 

I cannot say for sure, but if it's due to your multicast environment, it may not have anything to do with the bandwidth of the multicast flows, or the number of such flows, but rather, perhaps, with host IGMP activity.

Unless someone else replies with some useful information, I only see three options. First, if there are no operational issues due to the high CPU, you just accept it. Second, if you have Cisco support, open a case with TAC. Third, retain a network consultant to come on-site and analyze the issue.

Thank you for your input on this. It looks like my only option is to accept it and move on. These switches are end of life and out of support, so I can't open a TAC case on them. I know they need to be replaced, but the budget doesn't allow me to replace all of them, which is why I am trying to nurse these things along as long as I can.

Jan Rolny
Level 3

Hi,

I think it's still the same old, well-known bug related to the "Hulc LED Process". There are dozens of discussions on the Cisco forums about the same problem on these models of switches (3560X/3750X). Back when I was working with the 3750X I had exactly the same issue.

https://bst.cisco.com/bugsearch/bug/CSCtn42790?rfs=qvlogin
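
A quick way to check whether you are hitting the same thing is to watch just that one process for a while, for example:

show processes cpu sorted | include Hulc LED

In both outputs posted in this thread it sits in the 15-18% range, which is consistent with that bug.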

Best regards,

Jan