Re: high cpu on 3750X

mbausenwein · ‎10-29-2013

We recently deployed Infoblox NetMRI. Soon after turning on SNMP discovery, we started getting alerts from our monitoring system saying 3750X stacks were not reachable. We verified that the switch is up and running during the time we receive the alerts. We see no entries in either the local log or syslog. We did notice that the CPU is running high on the devices.

CPU utilization for five seconds: 76%/27%; one minute: 84%; five minutes: 82%

 PID Runtime(ms)    Invoked  uSecs    5Sec    1Min    5Min TTY Process
 153  2333824107  785570107   2970  17.59%  15.25%  15.70%   0 Hulc LED Process
 275    64223357  198475886    323   7.03%  10.19%  10.15%   0 IGMPSN MRD
  78  1693125150  337735065   5013   4.95%   4.71%   4.93%   0 RedEarth Tx Mana
  77   699940609  519744636   1346   3.67%   3.97%   4.06%   0 RedEarth I2C dri
 120   583335431   65548747   8899   2.23%   1.98%   2.08%   0 hpm counter proc
 159        3353        144  23284   1.11%   3.06%   0.78%   1 SSH Process
 121    76473668  308139842    248   0.63%   0.36%   0.35%   0 HRPC pm-counters
 116    16979842  531252809     31   0.63%   0.30%   0.30%   0 hpm main process
 344    85002178  304713568    278   0.63%   0.85%   0.88%   0 LLDP Protocol
 110    29413324  300629687     97   0.47%   0.18%   0.13%   0 HRPC ilp request
 212    51979424  145488932    357   0.47%   0.46%   0.50%   0 Spanning Tree

We are running c3750e-universalk9-mz.122-58.SE2 lan base code so we are only doing layer 2 on the switch.

The switches have about 6 VLANs configured, and each has a port channel to 2 Nexus switches; pretty vanilla config.

Jose Solano · ‎11-01-2013

Hi,

From the output I see 2 processes running high:

1- Hulc LED Process, which is in charge of the following tasks:

- Check Link status on every port
- If the switch supports PoE, it checks to see if there is a Powered Device (PD) detected
- Check the status of the transceiver
- Update Fan status
- Set Main LED and ports LEDs
- Update both Power Supplies and RPS
- Check on system temperature status

The Catalyst 3750-X switches have a CPU utilization level that is higher than the previous models of the Catalyst 3750 switches. This is normal behavior.

2- IGMPSN MRD: a common cause of high CPU utilization in this process is that the Catalyst 3750 CPU is busy processing a storm of Internet Group Management Protocol (IGMP) leave messages.

Link: http://www.cisco.com/en/US/products/hw/switches/ps5023/products_tech_note09186a00807213f5.shtml#topic2

Link: http://www.cisco.com/en/US/docs/switches/lan/catalyst3750/software/troubleshooting/cpu_util.html

Leo Laohoo · ‎11-01-2013

"We are running c3750e-universalk9-mz.122-58.SE2 lan base code so we are only doing layer 2 on the switch."

I'd stay away from this particular IOS. Stick with 12.2(55)SE8 for better results.

maxrodri · ‎11-01-2013

Hi mbausenwein,

The issue with the Hulc LED process might be related to the bug CSCtn42790.

As Leo Laohoo said, move to that code (12.2(55)SE8 or a later version) in order to fix this cosmetic issue (this process will not affect traffic or switch usage).

mbausenwein · ‎11-04-2013

The thing that also concerns me is the amount of time the CPU is spending on interrupts. The reading I have done says it should be at about 2%, but we are running at 27%. We even tried disabling IGMP snooping. Doing that lowered the overall CPU and brought interrupt CPU down to about 14%, but I am still scratching my head as to why interrupt processing is taking so much CPU.
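For anyone following along: in the first line of the CPU output, the figure before the slash is total CPU for the five-second window and the figure after the slash is the share spent at interrupt level. Below is a minimal Python sketch that splits the line quoted at the top of this thread into its parts. The function name parse_cpu_header is hypothetical (it is not part of any Cisco tooling), and the sketch assumes the header keeps the exact wording shown in the original post.

```python
import re

# Matches the summary line of the CPU output, e.g.
# "CPU utilization for five seconds: 76%/27%; one minute: 84%; five minutes: 82%"
HEADER_RE = re.compile(
    r"five seconds:\s*(\d+)%/(\d+)%;\s*one minute:\s*(\d+)%;\s*five minutes:\s*(\d+)%"
)


def parse_cpu_header(line):
    """Hypothetical helper: split the header into total, interrupt and process CPU."""
    match = HEADER_RE.search(line)
    if match is None:
        raise ValueError("not a CPU utilization header line")
    total, interrupt, one_min, five_min = (int(g) for g in match.groups())
    return {
        "total_5s": total,                  # figure before the slash
        "interrupt_5s": interrupt,          # figure after the slash
        "process_5s": total - interrupt,    # time spent in scheduled processes
        "one_min": one_min,
        "five_min": five_min,
    }


line = "CPU utilization for five seconds: 76%/27%; one minute: 84%; five minutes: 82%"
stats = parse_cpu_header(line)
print(stats["interrupt_5s"], stats["process_5s"])  # 27 49
```

With the numbers from the original post, 27 of the 76 percentage points are interrupt-level and the remaining 49 are spread across processes such as Hulc LED Process and IGMPSN MRD.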

Joseph W. Doherty · ‎11-04-2013

Disclaimer

The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.

Liability Disclaimer

In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.

Posting

Check your TCAM usage. I believe that when TCAM is exhausted, processing "overflows" into software (interrupt) processing.

If TCAM is overflowing, are you using the most suitable SDM template?

mbausenwein · ‎11-04-2013

Based on the following output, it does not appear we are overflowing TCAM, are we? I see plenty of headroom.

1P-C2-IDF-01#sh platform tcam utilization

CAM Utilization for ASIC# 0                     Max             Used
                                            Masks/Values    Masks/values

Unicast mac addresses:                       6364/6364        781/781
IPv4 IGMP groups + multicast routes:         1120/1120         13/13
IPv4 unicast directly-connected routes:      6144/6144          0/0
IPv4 unicast indirectly-connected routes:    2048/2048         34/34
IPv4 policy based routing aces:               452/452          12/12
IPv4 qos aces:                                512/512          21/21
IPv4 security aces:                           964/964          36/36

Note: Allocation of TCAM entries per feature uses
a complex algorithm. The above information is meant
to provide an abstract view of the current TCAM utilization
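To put numbers on the headroom, here is a rough Python sketch that turns the rows above into percent-used figures. The function name tcam_utilization is hypothetical, and the parsing assumes each row has the "feature: max/max used/used" layout shown in this output; other platforms or IOS versions may print it differently.

```python
import re

# One row looks like: "IPv4 qos aces:      512/512      21/21"
ROW_RE = re.compile(r"^\s*(.+?):\s+(\d+)/(\d+)\s+(\d+)/(\d+)\s*$")


def tcam_utilization(output):
    """Hypothetical helper: map each TCAM feature to its percent of values used."""
    usage = {}
    for line in output.splitlines():
        match = ROW_RE.match(line)
        if match is None:
            continue  # skip headers, blank lines and the trailing note
        feature = match.group(1)
        max_values = int(match.group(3))
        used_values = int(match.group(5))
        usage[feature] = 100.0 * used_values / max_values if max_values else 0.0
    return usage


SAMPLE = """\
Unicast mac addresses:                       6364/6364        781/781
IPv4 IGMP groups + multicast routes:         1120/1120         13/13
IPv4 unicast directly-connected routes:      6144/6144          0/0
IPv4 unicast indirectly-connected routes:    2048/2048         34/34
IPv4 policy based routing aces:               452/452          12/12
IPv4 qos aces:                                512/512          21/21
IPv4 security aces:                           964/964          36/36
"""

usage = tcam_utilization(SAMPLE)
for feature, pct in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{feature:<45} {pct:5.1f}%")
```

On this output the busiest region is the unicast MAC address table at roughly 12% used, and every other region is under 5%, which matches the "plenty of headroom" reading.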

Joseph W. Doherty · ‎11-04-2013

Yep, that looks fine.

As Leo suggested, moving to a more stable IOS might help.

Another possibility is that something in your traffic or your configuration is forcing software processing, although this would be very unusual for pure L2. I would first try changing the installed IOS.