Troubleshooting High CPU on 3750

InayathUlla Sharieff · 09-09-2014

Introduction

Troubleshooting Steps

Let us see how to troubleshoot High CPU Utilization on the DSBU switches.

Before troubleshooting HIGH CPU, we need to verify a few things.

1. What changes in the network/device might have triggered this issue?

The following commands help you identify the cause.

show proc cpu sorted | ex 0.0
show proc cpu history
show controllers cpu-interface (this helps you find which of the 16 CPU queues available on the 3750 is loading the CPU. Once you have found it, use the “debug platform” command to troubleshoot further. CAUTION: “debug platform” is very CPU intensive and must be used with caution. Always engage a Cisco engineer before running this command during peak hours.)
show platform port-asic stats drop (to check Supervisor TX Queue Drop)
show platform tcam utilization
show platform ip unicast counts

Debug

debug platform cpu-queues {broadcast-q | cbt-to-spt-q | cpuhub-q | host-q |
icmp-q | igmp-snooping-q | layer2-protocol-q | logging-q | remote-console-q |
routing-protocol-q | rpffail-q | software-fwd-q | stp-q}

Note: this debug is intrusive in nature. Run it only when you see drops in the queue.

2-

Unlike the Catalyst 4500 and 6500, the Catalyst 2K/3K has no sniffer trace (e.g. debug netdr, inband RP trace, debug platform packet all receive buffer) for traffic punted to the CPU, so troubleshooting high CPU caused by interrupts is fairly tedious. This guide shows a step-by-step procedure for doing so.

Any traffic coming from or going to the CPU is put into one of the 16 queues. Below is the mapping from queue 0 to queue 15:
CPU Queue #   Description        Explanation
0             rpc                Remote procedure call; used by IOS processes to communicate across the stack
1             stp                Spanning tree
2             ipc                Interprocess communication; used by IOS processes to communicate across the stack
3             routing protocol   Receive queue for routing protocol packets
4             L2 protocol        Queue for protocol packets such as LACP, UDLD, CDP, etc.
5             remote console     Used when “session <switch number>” opens a console on a stack member
6             sw forwarding      Traffic that must be software switched (e.g. unknown multicast, IP header with options)
7             host               Traffic to the switch itself, including directed broadcast
8             broadcast          Broadcast packets (e.g. ARP, RIPv1)
9             cbt-to-spt         Packets hitting a (*,G) entry that exceed the SPT threshold
10            igmp snooping      Packets placed on this queue as a result of hitting an IGMP entry
11            icmp               For ICMP redirect or ICMP unreachable
12            logging            ACL exception
13            rpf-fail           Multicast traffic that fails the RPF check
14            dstats             Drop stats; unused during normal operation
15            cpu heartbeat      CPU keepalive to check the health of the CPU queues

Note that the queues are numbered 0 to 15, not 1 to 16. When traffic is placed into one of the queues above, a buffer is allocated to store it temporarily. If CPU utilization at interrupt level is high, it is usually caused by a particular type of traffic.
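To tell interrupt-level load from process-level load, the first line of the show proc cpu output can be split into its parts. The sketch below assumes the usual IOS header format "CPU utilization for five seconds: X%/Y%", where X is the total CPU and Y is the portion spent at interrupt level. The function name split_cpu and the sample percentages are made up for illustration.

```python
import re

# Assumed header format of 'show processes cpu' on IOS:
#   CPU utilization for five seconds: <total>%/<interrupt>%; one minute: ...
CPU_LINE = re.compile(r"CPU utilization for five seconds: (\d+)%/(\d+)%")


def split_cpu(header):
    """Split the five-second CPU figure into total, interrupt and process load."""
    match = CPU_LINE.search(header)
    if match is None:
        raise ValueError("not a 'show processes cpu' header line")
    total, interrupt = int(match.group(1)), int(match.group(2))
    # Process-level load is whatever remains after the interrupt share.
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}


if __name__ == "__main__":
    # Hypothetical sample line, not taken from a real switch.
    sample = "CPU utilization for five seconds: 85%/60%; one minute: 80%; five minutes: 78%"
    print(split_cpu(sample))
```

If the interrupt figure accounts for most of the total, the CPU queues described above are the place to look. If it is small, a process from show proc cpu sorted is the more likely cause.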

Troubleshooting Methodology

1. Collect show controllers cpu-interface

Collect the output of show controllers cpu-interface multiple times. Below is an example of the output:

cpu-queue-frames  retrieved  dropped    invalid    hol-block  stray
----------------- ---------- ---------- ---------- ---------- ----------
rpc               0          0          0          0          0        
stp               737164     0          0          0          0        
ipc               0          0          0          0          0        
routing protocol  1146606170 0          0          0          0        
L2 protocol       65643      0          0          0          0        
remote console    0          0          0          0          0        
sw forwarding     0          0          0          0          0        
host              5          0          0          0          0        
broadcast         19394      0          0          0          0        
cbt-to-spt        0          0          0          0          0        
igmp snooping     0          0          0          0          0        
icmp              0          0          0          0          0        
logging           0          0          0          0          0        
rpf-fail          0          0          0          0          0        
dstats            0          0          0          0          0        
cpu heartbeat     29100077   0          0          0          0

Compare the captures and look for the queue whose retrieved counter grows the most between them. That queue is likely the source of the traffic.
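Comparing captures by eye is error-prone, so here is a minimal sketch that ranks the queues by how much the retrieved counter grew between two captures. It assumes the column layout shown in the example output above. The function names and the numbers in the second capture are hypothetical.

```python
def parse_cpu_queues(output):
    """Return {queue name: retrieved count} from 'show controllers cpu-interface' text."""
    counters = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows end with five integers:
        # retrieved, dropped, invalid, hol-block, stray.
        if len(fields) < 6 or not all(f.isdigit() for f in fields[-5:]):
            continue  # skips the header and the dashed separator
        name = " ".join(fields[:-5])  # queue names may contain spaces
        counters[name] = int(fields[-5])
    return counters


def rank_by_growth(first, second):
    """Sort queues by the increase in 'retrieved' between two captures."""
    before = parse_cpu_queues(first)
    after = parse_cpu_queues(second)
    deltas = {queue: after[queue] - before.get(queue, 0) for queue in after}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    capture_1 = """\
cpu-queue-frames  retrieved  dropped    invalid    hol-block  stray
----------------- ---------- ---------- ---------- ---------- ----------
stp               737164     0          0          0          0
routing protocol  1146606170 0          0          0          0
broadcast         19394      0          0          0          0
"""
    # Second capture: counter values invented for the example.
    capture_2 = """\
cpu-queue-frames  retrieved  dropped    invalid    hol-block  stray
----------------- ---------- ---------- ---------- ---------- ----------
stp               737200     0          0          0          0
routing protocol  1146906170 0          0          0          0
broadcast         19400      0          0          0          0
"""
    for queue, delta in rank_by_growth(capture_1, capture_2):
        print(f"{queue:18} +{delta}")
```

The queue at the top of the list is the one to investigate first. Taking the captures a fixed interval apart makes the deltas comparable from one run to the next.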

2. Think of a logical reason

Looking at the busiest queue, try to think of a logical reason why that traffic is punted to the CPU. For example, if the icmp queue (i.e. queue 11) has the most traffic punted to the CPU, configure no ip unreachables and no ip redirects on the layer 3 interfaces. For the sw forwarding queue (i.e. queue 6), see Common Problems below.

3. Collect the output of show buffer pool RxQ<X> Packet

Collect this output only if necessary, because it is time consuming to read through. Collect the output of show buffer pool RxQ<X> packet, where X is the number of the queue with the highest increments. For example, to look at the content of the sw forwarding queue, use the command show buffer pool RxQ6 packet.
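The queue number in the command is simply the queue's position in the 0 to 15 mapping given earlier. As a small sketch, the helper below (the name rxq_command is hypothetical) turns a queue name from the show controllers cpu-interface output into the matching command string.

```python
# Queue names in the order listed in the mapping table (index = queue number).
CPU_QUEUES = [
    "rpc", "stp", "ipc", "routing protocol", "L2 protocol", "remote console",
    "sw forwarding", "host", "broadcast", "cbt-to-spt", "igmp snooping",
    "icmp", "logging", "rpf-fail", "dstats", "cpu heartbeat",
]


def rxq_command(queue_name):
    """Build the 'show buffer pool RxQ<X> packet' command for a queue name."""
    index = CPU_QUEUES.index(queue_name)  # raises ValueError for unknown names
    return f"show buffer pool RxQ{index} packet"


if __name__ == "__main__":
    print(rxq_command("sw forwarding"))
```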

4. Check Input Queue Drops:

It is also useful to look at "show interface" and find layer 3 interfaces with a large number of packets sitting in the input queue.

These are packets going to the CPU. You can dump the packets using the command "show buffer input-interface X dump", where X is the interface name, such as vlan 10.

3- Below are a few common processes that utilize the CPU:

a) Hulc LED:

The "Hulc LED" process performs the following tasks:
- Check Link status on every port
- If the switch supports PoE, check whether a Powered Device (PD) is detected
- Check the status of the transceiver
- Update Fan status
- Set Main LED and ports LEDs
- Update both Power Supplies and RPS
- Check on system temperature status

b) High CPU due to Hl3u bkgrd process:

The Hl3u bkgrd process manages quite a few background tasks, such as:
- Hardware ARP throttling housekeeping tasks
- Retrying adjacencies/FIB entries in an out-of-hardware-resource condition
- Sending out gratuitous ARPs in certain scenarios, like master switchover
- Proxy ARP housekeeping functions
- ICMP redirect processing
- TTL ICMP error generation in some conditions
- handling correct route forwarding in an output acl full condition

c) High CPU due to SNMP:

# show snmp ===> capture this a few times at 5-minute intervals (to verify the SNMP traffic)

Only if the SNMP ENGINE process is consuming most of the CPU will "show stack 318" provide a useful SNMP stack trace.
These steps are useful only when the high CPU is caused by the SNMP ENGINE process.

Hope the above helps. If you still need help, you can raise a TAC case or post a thread, and one of us will help you.

Regards

Inayath

habookans · 06-11-2018

https://supportforums.cisco.com/t5/network-management/cpu/m-p/3079496/highlight/false#M113815

this helped me, it may help you also