
Sudden high CPU utilization / high interrupts on 2960X/3750X stacks

the_kirschi
Level 1

Hi all,

This will likely be a long one...

We have quite a large network with more than 130 switches; not all of them are always online, but there are always more than 100. We have about 50 VLANs, but still a lot of devices in VLAN 1. We do not have any routers and use only Layer 3 switching for inter-VLAN routing. Because we have lots of video streaming and recording, we have implemented IGMP snooping with a querier for each VLAN. We are about to split the network into more VLANs and want to remove all devices from VLAN 1, but there is still a way to go. We have several stacks of 2960(X) and 3750X switches.

Now for the problem:
As far as I can see, the problem affects only one of the 3750X stacks (the most central one) and one 2960X stack. Other switches and stacks show no symptoms. On these two stacks the CPU utilization has been permanently between 98% and 100% for 5 days now, and a notable share of that, roughly 10% to 16%, is spent at interrupt level. The high CPU does not seem to affect network availability or functionality. We have had no complaints from users, and I think we would not even have noticed the problem if our monitoring system had not alerted us.

Here is the output of "show proc cpu so | exc 0.0" from both stacks:

3750X Stack (IOS 15.0(2)SE6):

CPU utilization for five seconds: 98%/16%; one minute: 99%; five minutes: 99%
 PID Runtime(ms)    Invoked  uSecs   5Sec   1Min   5Min TTY Process
  13   764111893  438222763   1743 51.00% 49.99% 50.41%   0 ARP Input
 248   132756984   91327157   1453  4.43%  4.23%  4.20%   0 Spanning Tree
  89   119457785   49551708   2410  2.69%  3.64%  3.74%   0 RedEarth I2C dri
   5    69003518    2936830  23496  2.69%  1.15%  1.15%   0 Check heaps
 180    86745498   81172271   1068  2.21%  3.49%  3.58%   0 Hulc LED Process
 230    63699014  125432280    507  1.90%  1.98%  1.97%   0 IP Input
 136    13254250   12529739   1057  0.95%  0.55%  0.55%   0 HRPC pm-counters
 131    10849136   53731654    201  0.95%  0.55%  0.57%   0 hpm main process
 414     6905728   15241308    453  0.79%  0.35%  0.31%   0 VLAN Manager
 266    18511734    3356276   5515  0.79%  0.73%  0.69%   0 PI MATM Aging Pr
 267     4808584   34122543    140  0.47%  0.35%  0.37%   0 UDLD
 424    26690024   20261809   1317  0.47%  0.30%  0.31%   0 SNMP ENGINE
 197    14473335    5381351   2689  0.31%  0.39%  0.37%   0 HRPC qos request
 420     3669294   10039413    365  0.31%  0.18%  0.13%   0 DHCPD Receive
  59     2894661    3428513    844  0.31%  0.15%  0.14%   0 Per-Second Jobs
 218     4511475    4840468    932  0.15%  0.24%  0.19%   0 CEF: IPv4 proces
 190        1138        186   6118  0.15%  0.89%  0.26%   1 SSH Process
  99    10480646   23309455    449  0.15%  0.15%  0.15%   0 hrpc <- response
  91     4046792   24426885    165  0.15%  0.11%  0.14%   0 RedEarth Rx Mana
 216    20646505    6295098   3279  0.15%  0.56%  0.54%   0 CDP Protocol
 196     6743567     680965   9902  0.15%  0.18%  0.19%   0 HQM Stack Proces
  46     4445122    1122708   3959  0.15%  0.15%  0.15%   0 Net Background

2960X Stack (IOS 15.0(2)SE6):

CPU utilization for five seconds: 99%/10%; one minute: 99%; five minutes: 99%
 PID Runtime(ms)    Invoked  uSecs   5Sec   1Min   5Min TTY Process
  12   151501727    4442153  34105 30.31% 29.31% 29.17%   0 ARP Input
 221   131704865    4383039  30048 27.11% 27.21% 27.39%   0 HULC DAI Process
 213    22462250    2298213   9773  3.90%  3.91%  3.95%   0 Spanning Tree
 104    17382433    3063456   5674  3.80%  3.59%  3.62%   0 HLFM address lea
   4     2814681     138944  20257  0.70%  0.45%  0.53%   0 Check heaps
 162         597        180   3316  0.50%  0.41%  0.15%   1 SSH Process
 200     1248978    3315940    376  0.39%  0.25%  0.22%   0 IP Input
 351      817745    1422158    575  0.29%  0.12%  0.13%   0 VLAN Manager
 229     2589313     398358   6499  0.29%  0.45%  0.45%   0 PI MATM Aging Pr
 167     1249485      84679  14755  0.19%  0.26%  0.25%   0 HQM Stack Proces
 355       96695      11248   8596  0.19%  0.23%  0.23%   0 SNMP ENGINE
 121     1314659    2498780    526  0.19%  0.23%  0.24%   0 hpm main process
 339       63498      39923   1590  0.19%  0.10%  0.10%   0 IP SNMP
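
For anyone reading the header lines above: in "98%/16%" the first figure is the total CPU utilization over the last five seconds and the second is the part spent at interrupt level, so the difference is the load caused by processes such as ARP Input. Below is a minimal Python sketch of that arithmetic and of how a process row breaks down. The helper names parse_cpu_header and parse_process_row are hypothetical and only for illustration; they are not part of any Cisco tool, and the regular expressions assume the exact output layout shown in this post.

```python
import re

# Matches the summary line of "show processes cpu" as pasted in this post.
HEADER_RE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)

# Matches one process row: PID, Runtime(ms), Invoked, uSecs,
# three percentage columns, TTY, and the (possibly truncated) process name.
ROW_RE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%\s+(\d+)\s+(.+?)\s*$"
)


def parse_cpu_header(line):
    """Split the header into total, interrupt-level and process-level load."""
    match = HEADER_RE.search(line)
    if match is None:
        raise ValueError("not a CPU utilization header line")
    total, interrupt, one_min, five_min = map(int, match.groups())
    return {
        "total": total,
        "interrupt": interrupt,
        # Process-level load is whatever is left after interrupt-level work.
        "process": total - interrupt,
        "one_min": one_min,
        "five_min": five_min,
    }


def parse_process_row(line):
    """Return PID, five-second CPU share and name for one process row."""
    match = ROW_RE.match(line)
    if match is None:
        raise ValueError("not a process row")
    pid, _runtime, _invoked, _usecs, five_sec, _m1, _m5, _tty, name = match.groups()
    return {"pid": int(pid), "five_sec": float(five_sec), "name": name}


if __name__ == "__main__":
    header = ("CPU utilization for five seconds: 98%/16%; "
              "one minute: 99%; five minutes: 99%")
    print(parse_cpu_header(header))
    print(parse_process_row(
        "13 764111893 438222763 1743 51.00% 49.99% 50.41% 0 ARP Input"))
```

Applied to the two header lines above, this gives 98 - 16 = 82% process-level load on the 3750X stack and 99 - 10 = 89% on the 2960X stack.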

The 2960X stack has already been rebooted, but shortly after the reboot the CPU was back at 100%. Right after the reboot there were some "Failed to send hrpc non blocking message" messages. I don't know whether they are related, and I couldn't find out what they mean.

Today during troubleshooting we activated "ip arp inspection" on the 3750X stack (or it may have been only the debug, I am not sure), and shortly afterwards many of the uplinks went into err-disabled state while many "%SW_DAI-4-DHCP_SNOOPING_DENY: 1 Invalid ARPs" messages were logged.

I don't know what else to add, except that the issue started simultaneously on both stacks. They are not connected to each other directly, however; there are other switches/stacks in between that do not have the problem. I can't find the root cause. I have looked for spanning-tree BLK ports, loops, and anything else I could think of or find on the internet, but to no avail.

Any help with this is greatly appreciated. If you need more information from me, please let me know.

Thanks
Daniel
