
high cpu on 3750X

mbausenwein
Level 1

We recently deployed Infoblox NetMRI. Soon after turning on SNMP discovery, we started getting alerts from our monitoring system saying our 3750X stacks were not reachable. We verified that during the time we receive the alert, the switch is up and running, and we see no entries in either the local log or syslog. We did notice that CPU utilization is high on these devices.

CPU utilization for five seconds: 76%/27%; one minute: 84%; five minutes: 82%

 PID  Runtime(ms)    Invoked    uSecs   5Sec    1Min    5Min  TTY  Process
 153  2333824107   785570107    2970   17.59%  15.25%  15.70%   0  Hulc LED Process
 275    64223357   198475886     323    7.03%  10.19%  10.15%   0  IGMPSN MRD
  78  1693125150   337735065    5013    4.95%   4.71%   4.93%   0  RedEarth Tx Mana
  77   699940609   519744636    1346    3.67%   3.97%   4.06%   0  RedEarth I2C dri
 120   583335431    65548747    8899    2.23%   1.98%   2.08%   0  hpm counter proc
 159        3353         144   23284    1.11%   3.06%   0.78%   1  SSH Process
 121    76473668   308139842     248    0.63%   0.36%   0.35%   0  HRPC pm-counters
 116    16979842   531252809      31    0.63%   0.30%   0.30%   0  hpm main process
 344    85002178   304713568     278    0.63%   0.85%   0.88%   0  LLDP Protocol
 110    29413324   300629687      97    0.47%   0.18%   0.13%   0  HRPC ilp request
 212    51979424   145488932     357    0.47%   0.46%   0.50%   0  Spanning Tree

We are running c3750e-universalk9-mz.122-58.SE2 (LAN Base), so we are only doing Layer 2 on the switch.

The switches have about 6 VLANs configured and a port channel to two Nexus switches; it's a pretty vanilla config.
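For what it's worth, here is roughly how we have been watching the CPU. These are standard IOS show commands (exact output varies a bit by release):

! CPU history as ASCII graphs for the last 60 s / 60 min / 72 h;
! look for spikes that line up with the NetMRI discovery schedule.
show processes cpu history
! Processes sorted by 5-second usage; in the header, the number before
! the slash is total CPU and the number after it is interrupt CPU.
show processes cpu sorted 5sec
! Watch whether the SNMP ENGINE process climbs during polling.
show processes cpu sorted | include SNMP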

7 Replies

Jose Solano
Level 4

Hi,

From the output, I see two processes running high:

1. Hulc LED Process, which is in charge of the following tasks:

- Check link status on every port
- If the switch supports PoE, check whether a powered device (PD) is detected
- Check the status of the transceivers
- Update fan status
- Set the main LED and the port LEDs
- Update both power supplies and the RPS
- Check system temperature status

The Catalyst 3750-X switches have a higher baseline CPU utilization than previous Catalyst 3750 models; this is normal behavior.

2. IGMPSN MRD: a common cause of high CPU utilization here is the CPU being busy processing a storm of Internet Group Management Protocol (IGMP) leave messages.

Link: http://www.cisco.com/en/US/products/hw/switches/ps5023/products_tech_note09186a00807213f5.shtml#topic2

Link: http://www.cisco.com/en/US/docs/switches/lan/catalyst3750/software/troubleshooting/cpu_util.html
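If an IGMP leave storm is the suspect, the following standard commands should show the snooping state and group churn while the CPU is high (availability can vary with the feature set):

! Per-VLAN snooping state, including immediate-leave and querier settings.
show ip igmp snooping
! Groups currently tracked; a list that changes rapidly between runs
! points at hosts joining and leaving at a high rate.
show ip igmp snooping groups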


Leo Laohoo
Hall of Fame
"We are running c3750e-universalk9-mz.122-58.SE2 (LAN Base), so we are only doing Layer 2 on the switch."

I'd stay away from this particular IOS.  Stick with 12.2(55)SE8 for better results.

maxrodri
Level 1

Hi mbausenwein,

The issue with the Hulc LED process might be related to bug CSCtn42790.

Like Leo Laohoo said, upgrade to that code (12.2(55)SE8 or a later version) to fix this cosmetic issue; the process does not affect traffic or switch operation.

The thing that also concerns me is the amount of time the CPU is spending on interrupts. The reading I have done says it should be around 2%, but we are running at 27%. We even tried disabling IGMP snooping; that lowered overall CPU and brought interrupt CPU down to about 14%, but I'm still scratching my head as to why interrupt processing is taking so much CPU.
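In case it helps anyone else digging into interrupt CPU: as I understand it, interrupt time on this platform is largely packets punted to the CPU, so something like the following should hint at which CPU queue is busy (run it twice a few seconds apart and compare the counters):

! Per-queue counters for frames punted to the CPU (host, routing
! protocol, STP, and so on); the queues whose counters climb fastest
! indicate what kind of traffic the CPU is handling.
show controllers cpu-interface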


Check your TCAM usage. I believe TCAM processing "overflows" into software (interrupt) processing.

If TCAM is overflowing, are you using the most suitable SDM template?
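For reference, checking and, if needed, changing the template would look roughly like this. I'm assuming the "vlan" template as a plausible choice for an L2-only role; note that a reload is required for the change to take effect:

! Show the active SDM template and its TCAM partitioning.
show sdm prefer
! On an L2-only switch the "vlan" template trades routing entries for
! more unicast MAC address space (takes effect only after a reload).
configure terminal
 sdm prefer vlan
 end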

Based on the following output, it does not appear we are overflowing TCAM, does it? I see plenty of headroom:

1P-C2-IDF-01#sh platform tcam utilization

CAM Utilization for ASIC# 0                      Max            Used
                                             Masks/Values    Masks/values

Unicast mac addresses:                       6364/6364        781/781
IPv4 IGMP groups + multicast routes:         1120/1120         13/13
IPv4 unicast directly-connected routes:      6144/6144          0/0
IPv4 unicast indirectly-connected routes:    2048/2048         34/34
IPv4 policy based routing aces:               452/452          12/12
IPv4 qos aces:                                512/512          21/21
IPv4 security aces:                           964/964          36/36

Note: Allocation of TCAM entries per feature uses
a complex algorithm. The above information is meant
to provide an abstract view of the current TCAM utilization


Yep, that looks fine.

As Leo suggested, moving to a more stable IOS might help.

It might also be that something in your traffic, or something about your configuration, is forcing software processing, although that would be very unusual for a pure L2 role. I would try changing the installed IOS first.
