10-29-2013 09:33 AM - edited 03-07-2019 04:18 PM
We recently deployed Infoblox NetMRI. Soon after turning on SNMP discovery, we started getting alerts from our monitoring system saying our 3750X stacks were not reachable. We verified that during the time we receive the alert, the switch is up and running. We see no entries in either the local log or syslog. We did notice that the CPU is running high on the devices.
CPU utilization for five seconds: 76%/27%; one minute: 84%; five minutes: 82%
 PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
 153  2333824107   785570107     2970  17.59% 15.25% 15.70%  0 Hulc LED Process
 275    64223357   198475886      323   7.03% 10.19% 10.15%  0 IGMPSN MRD
  78  1693125150   337735065     5013   4.95%  4.71%  4.93%  0 RedEarth Tx Mana
  77   699940609   519744636     1346   3.67%  3.97%  4.06%  0 RedEarth I2C dri
 120   583335431    65548747     8899   2.23%  1.98%  2.08%  0 hpm counter proc
 159        3353         144    23284   1.11%  3.06%  0.78%  1 SSH Process
 121    76473668   308139842      248   0.63%  0.36%  0.35%  0 HRPC pm-counters
 116    16979842   531252809       31   0.63%  0.30%  0.30%  0 hpm main process
 344    85002178   304713568      278   0.63%  0.85%  0.88%  0 LLDP Protocol
 110    29413324   300629687       97   0.47%  0.18%  0.13%  0 HRPC ilp request
 212    51979424   145488932      357   0.47%  0.46%  0.50%  0 Spanning Tree
We are running c3750e-universalk9-mz.122-58.SE2 LAN Base code, so we are only doing Layer 2 on the switch.
The switches have about 6 VLANs configured and a port channel to two Nexus switches, a pretty vanilla config.
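To correlate the spikes with NetMRI's SNMP polling windows, a couple of commands are worth capturing while an alert is active. This is only a general sketch, nothing specific to this stack:
! CPU history over the last 60 seconds / 60 minutes / 72 hours
show processes cpu history
! Check whether the SNMP ENGINE process is among the top consumers during polling
show processes cpu sorted | include SNMP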
11-01-2013 04:53 PM
Hi,
From the output I see two processes running high:
1- Hulc LED Process which is in charge of the following tasks:
- Check Link status on every port
- If the switch supports PoE, it checks to see if a Powered Device (PD) is detected
- Check the status of the transceiver
- Update Fan status
- Set Main LED and ports LEDs
- Update both Power Supplies and RPS
- Check on system temperature status
The Catalyst 3750-X switches have a CPU utilization level that is higher than the previous models of the Catalyst 3750 switches. This is normal behavior.
2- IGMPSN MRD: a common cause of high CPU utilization here is the Catalyst 3750 CPU being busy processing a storm of Internet Group Management Protocol (IGMP) leave messages.
Link: http://www.cisco.com/en/US/docs/switches/lan/catalyst3750/software/troubleshooting/cpu_util.html
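If you want to confirm whether the snooping process is reacting to a leave storm, the snooping state and learned groups are a quick first check. A minimal sketch (run per affected VLAN as needed):
! Verify IGMP snooping is enabled and see per-VLAN settings
show ip igmp snooping
! List the multicast groups the switch has learned via snooping
show ip igmp snooping groups
! Check the snooping querier state per VLAN
show ip igmp snooping querier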
11-01-2013 08:14 PM
We are running c3750e-universalk9-mz.122-58.SE2 lan base code so we are only doing layer 2 on the switch.
I'd stay away from this particular IOS. Stick with 12.2(55)SE8 for better results.
11-01-2013 08:25 PM
Hi mbausenwein,
The issue with the Hulc LED process might be related to the bug CSCtn42790.
As Leo Laohoo said, upgrade to that code (12.2(55)SE8 or a later version) to fix this cosmetic issue (the process does not affect traffic or switch operation).
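For reference, on a 3750-X stack the upgrade can be pushed to every stack member with archive download-sw. The TFTP server address and exact tar filename below are placeholders for your environment:
! Download the new image to all stack members, overwrite the old image, and reload
archive download-sw /overwrite /reload tftp://192.0.2.10/c3750e-universalk9-tar.122-55.SE8.tar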
11-04-2013 05:22 AM
The thing that also concerns me is the amount of time the CPU is spending on interrupts. The reading I have done says it should be at about 2%, but we are running at 27%. We even tried disabling IGMP snooping; that lowered the overall CPU and brought interrupt CPU down to about 14%, but I am still scratching my head over why interrupt processing is taking so much CPU.
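For the interrupt percentage specifically, the Cisco troubleshooting guide linked above suggests looking at which CPU receive queues the punted traffic is hitting. A rough sketch, assuming nothing about your config:
! Show packet counts per CPU receive queue (traffic punted to the CPU)
show controllers cpu-interface
! Run it twice a few seconds apart and compare which queue counters increment fastest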
11-04-2013 06:23 AM
Check your TCAM usage. I believe TCAM "overflows" are handled in software (interrupt) processing.
If TCAM is overflowing, are you using the most suitable SDM template?
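If you want to double-check, you can compare the current template against the alternatives. On an L2-only LAN Base switch the "vlan" template gives the most room to Layer 2 entries; this is just a sketch, and changing the template requires a reload:
! Show the SDM template currently in use and its resource allocation
show sdm prefer
! (global config) Select the VLAN template for L2-heavy deployments; takes effect after reload
sdm prefer vlan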
11-04-2013 06:27 AM
Based on the following output, it does not appear we are overflowing TCAM, are we? I see plenty of headroom.
1P-C2-IDF-01#sh platform tcam utilization
CAM Utilization for ASIC# 0                   Max            Used
                                              Masks/Values   Masks/values
 Unicast mac addresses:                       6364/6364      781/781
 IPv4 IGMP groups + multicast routes:         1120/1120       13/13
 IPv4 unicast directly-connected routes:      6144/6144        0/0
 IPv4 unicast indirectly-connected routes:    2048/2048       34/34
 IPv4 policy based routing aces:               452/452        12/12
 IPv4 qos aces:                                512/512        21/21
 IPv4 security aces:                           964/964        36/36
 Note: Allocation of TCAM entries per feature uses
 a complex algorithm. The above information is meant
 to provide an abstract view of the current TCAM utilization
11-04-2013 07:15 AM
Yep, that looks fine.
As Leo suggested, moving to a more stable IOS might help.
What also might be happening is that something in your traffic or your configuration is forcing software processing, although that would be very unusual for just L2. I would first try changing the installed IOS.