Ask the Expert: Troubleshooting High CPU and Other Issues in the Cisco Catalyst 4500 Series Switches

ciscomoderator (Community Manager)

With Nikolay Karpyshev and Ivan Shirshin

Welcome to the Cisco Support Community Ask the Expert conversation. This is an opportunity to learn about and ask questions on the architecture and troubleshooting of the Cisco Catalyst 4500, the industry's most widely deployed modular access platform, with Cisco experts Nikolay Karpyshev and Ivan Shirshin.

Nikolay and Ivan are Customer Support Engineers on the High-Touch Technology Support (HTTS) team at Cisco, specializing in LAN switching and routing. They support the Cisco Nexus 7000 and Catalyst 6500, 3750, 3560, 4500, and 2900 switches as well as a variety of routing platforms, and work as senior and escalation engineers. Both were previously part of the Cisco Sales Associate program, and they hold the CCNP, CCSP, and CCDP certifications.

Remember to use the rating system to let Nikolay and Ivan know if you have received an adequate response. 

Nikolay and Ivan might not be able to answer every question due to the volume expected during this event. Remember that you can continue the conversation in the Network Infrastructure sub-community discussion forum shortly after the event. This event lasts through October 19, 2012. Visit this forum often to view responses to your questions and the questions of other community members.

33 Replies

Hi Ivan,

Thanks for the reply,

Q.1 Can you please explain what you mean when you say the 7200 router is CPU-based? Do you mean the speed?

Q.2 What is hardware forwarding?

Q.3 Also, I wanted to know: in Cisco routers, is memory needed only to run the IOS, or do the commands we execute on the router also need memory?

Regards,

Hi Fahad,

Let me answer your questions:

Q.1 Can you please explain what you mean when you say the 7200 router is CPU-based? Do you mean the speed?

There are two main forwarding technologies in routing and switching: software switching and hardware switching. In software switching the CPU makes every forwarding decision, but since the CPU also controls many other processes on the device, the switching work was offloaded from it. This is done with dedicated ASICs (forwarding engines) built into the routing and switching processors. The CPU still controls all the processes and protocols, but it programs the routing and switching information down into those hardware engines. Whenever a packet arrives on a device capable of hardware switching, it passes through the hardware engine for the forwarding decision, so the CPU stays available for other tasks. Certain kinds of packets are still always handled by the CPU, but the overall load is much lower. The 7200 router architecture does not support any hardware engines, so all forwarding decisions are made by the CPU. In comparison, the 7600 moves most forwarding decisions to the feature cards on the supervisor and line cards, keeping the CPU free of that work.

Since this topic is about the 4500: on that platform CEF controls all aspects of hardware forwarding (in fact CEF is used on multiple platforms to program the hardware switching engines and interface with the control plane):

http://www.cisco.com/en/US/docs/switches/lan/catalyst4500/12.2/31sga/configuration/guide/cef.html
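To see where forwarding is actually happening, a few show commands are useful (a quick sketch; these are standard IOS commands, but exact output varies by platform and release):

```
show ip cef summary                       ! CEF (hardware forwarding) state and prefix counts
show platform cpu packet statistics       ! Cat4500: what is punted to the CPU, per queue
show processes cpu sorted | exclude 0.00  ! software processes actually consuming CPU
```

If the CPU stays low while traffic is high, the hardware engine is doing the forwarding; a sustained high "IP Input" process usually means packets are being switched in software.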

Q.2 What is hardware forwarding?

I hope that is answered above. To add: different platforms have different types of hardware forwarding engines.

Q.3 Also, I wanted to know: in Cisco routers, is memory needed only to run the IOS, or do the commands we execute on the router also need memory?

Not sure if I understood your question correctly, but there are several types of memory on each platform. Taking the 7600 as an example, it has: NVRAM (primarily to store the configuration), RAM (holding the running processes, traffic buffers, and the IOS), EPROM (storing ROMMON), and a number of permanent flash partitions for storing files (the IOS image, crashinfo, etc.).

When you check "show file systems", you can see the different permanent file storages, such as bootflash: and sup-bootflash:. You can use the "dir" command to list the files on each of them, e.g. "dir bootflash:".

If you want to check the usage of operating memory, use the command "show processes memory": it shows you the total, used, and free sizes, along with the processes using memory and the amount each one holds.

You can check following document for more info:

http://www.cisco.com/en/US/partner/docs/ios/12_1/configfun/configuration/guide/fcd204.html
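As a quick recap of the commands mentioned above (command names only; outputs are omitted since they differ per platform and release):

```
show file systems        ! list all file systems (bootflash:, sup-bootflash:, nvram:, ...)
dir bootflash:           ! list files on one permanent storage
show processes memory    ! total/used/free RAM plus per-process usage
```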

Hope it helps.

Let me know if you have any further questions to discuss.

Nik

HTH,
Niko

Hi Akhtar,

Sorry for the delay in replying; somehow the notification for your post went missing.

This is really a topic for open discussion, as it is hard to make a recommendation without specific requirements.

From some of the risk assessments I have seen done for 4500 and 4900 switches, 15.0(2)SG2 was considered a much more stable release than 12.2(54)SG1, with many serious bugs fixed.

Speaking about CPU specifically, the latest release in this branch is 15.0(2)SG5, and I see only one unresolved bug in it that relates to high CPU:

- CSCtz04599 MU: Cat4500: dot1x fail - MAB success - dot1x fail leads to High CPU

It will be fixed in SG6, which is expected in November.

In any case it is a subject for discussion, and I would also appreciate it if others would share their best practices.

Kind Regards,
Ivan Shirshin

**Please grade this post if you find it useful.

Kind Regards,
Ivan

sr1482613 (Level 4)

Hello, Nikolay and Ivan!

This is Hank from ISONET.

I'm a systems engineer and have been assigned to support a customer who is operating a C4K with a high CPU utilization issue.

I have a question about high CPU utilization for you. I already opened a TAC case and got an answer from a TAC engineer: he said it is a performance issue, so I might have to change the network design.

Here is the log that I gave the TAC engineer.

- show proc cpu sorted | ex 0.00

XXX#show proc cpu sorted | ex 0.00
CPU utilization for five seconds: 96%/3%; one minute: 91%; five minutes: 64%
 PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
 106  30357545111460547752       2078 57.63% 53.56% 32.34%   0 IP Input
  55  1994448688 896053981       2225 19.10% 19.58% 12.01%   0 Cat4k Mgmt LoPri
  54  26567903491005451955       2642 13.50% 11.92% 13.80%   0 Cat4k Mgmt HiPri
  43    46356722   2103973      22032  0.95%  0.11%  0.06%   0 Per-minute Jobs
 177        6004     10025        598  0.55%  0.45%  0.92%   1 Virtual Exec
  39   239320691  60960958       3925  0.31%  0.33%  0.31%   0 IDB Work
 210     51910933756189628          0  0.23%  0.21%  0.22%   0 HSRP Common
  99    50335097  30310150       1660  0.23%  0.09%  0.08%   0 CDP Protocol
  88     1528720 609626516          2  0.07%  0.09%  0.08%   0 UDLD
 211    29795068 380029744         78  0.07%  0.04%  0.05%   0 HSRP IPv4
  14    36539866 111072493        328  0.07%  0.06%  0.07%   0 ARP Input
 113     5869671  75026899         78  0.07%  0.14%  0.15%   0 Spanning Tree

Based on this information, I suspect that traffic punted to the CPU is causing the high CPU. The highest CPU consumer is the IP Input process, which means too many packets are being punted to the CPU.

Here are the other logs.

K2CpuMan Review       30.00  32.28     30     83  100  500   37  35    7  54119:03
K2AccelPacketMan: Tx  10.00  13.43     20      0  100  500   13  12    3  20377:35

As per the platform CPU packet statistics, I can see that packet drops occurred on the L3 Rx Low queue.

Packets Dropped by Packet Queue
Queue                  Total           5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
Host Learning                     2225         0         0         0          0
L2 Fwd Low                        2610         0         0         0          0
L3 Rx Low                     50680864        66        78        37          3

Packets Dropped by Packet Queue
Queue                  Total           5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
Host Learning                     2225         0         0         0          0
L2 Fwd Low                        2610         0         0         0          0
L3 Rx Low                     50681748        22        66        36          3

Packets Received at CPU per Input Interface
Interface              Total           5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
Gi2/1                              233         4         0         0          0
Gi2/2                               41         0         0         0          0
Gi2/3                               32         0         0         0          0
Gi2/6                            77232      2217       685        81          0
Gi3/1                               39         0         0         0          0
Gi3/7                               25         0         0         0          0
Gi3/9                               26         0         0         0          0
Gi3/11                               1         0         0         0          0
Gi3/14                               1         0         0         0          0
Gi3/47                             243         6         0         0          0
Gi3/48                               7         0         0         0          0

Packets Received at CPU per Input Interface
Interface              Total           5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
Gi2/1                              285         3         0         0          0
Gi2/2                               56         0         0         0          0
Gi2/3                               41         0         0         0          0
Gi2/6                            96543      1823       788        81         16
Gi3/1                               52         0         0         0          0
Gi3/7                               27         0         0         0          0
Gi3/9                               28         0         0         0          0
Gi3/11                               1         0         0         0          0
Gi3/14                               1         0         0         0          0
Gi3/16                               4         0         0         0          0
Gi3/47                             311         5         0         0          0
Gi3/48                               8         0         0         0    

I can see these counters increasing quickly, and the input interface where the packet count increases the most is Gi2/6.

When I captured the traffic, the traffic punted to the CPU was broadcast, including real-time stock market prices coming from outside the company over the WAN, so I couldn't kill that traffic.

Is there any solution to reduce the high CPU utilization?

If you want the service request number, you can refer to SR 622799643. That SR is already closed.

thanks.

Hi Hank,

Well, I see good analysis has already been done on this case. So you found that broadcast packets are causing the high CPU. All switches and routers are designed to examine L3 broadcasts in the CPU to decide whether any action should be taken on them; this is the default behavior for most devices.

In your case the broadcast hitting the switch seems to be destined to some other devices, and the 4500 should just forward it. Since there is no way to remove that traffic from the network, you can configure the 4500 to limit the amount of broadcast sent toward the CPU with Control Plane Policing (CoPP). CoPP is a tool that inspects the traffic heading to the CPU against a set of pre-configured ACLs and dedicates only a certain bandwidth to each class of traffic, dropping whatever exceeds the limit.

Thus you configure the basic CoPP template and create your own access lists limiting certain traffic (e.g. broadcast) to certain boundaries. Broadcast will still be forwarded correctly in hardware to all ports within the broadcast domain, but the portion of it hitting the CPU will be limited to the rate you configure, keeping the CPU safe.

You can consult the following page on how to implement it:

http://www.cisco.com/en/US/docs/switches/lan/catalyst4500/12.2/54sg/configuration/guide/cntl_pln.html
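A minimal CoPP sketch along those lines might look as follows. This is only an illustration: the ACL number, class-map name, and police rate are placeholders, the base template should first be applied with "macro global apply system-cpp", and the exact syntax should be verified against the configuration guide for your release:

```
access-list 120 permit ip any host 255.255.255.255
!
class-map match-any my-copp-broadcast
 match access-group 120
!
policy-map system-cpp-policy
 class my-copp-broadcast
  police 32000 1000 conform-action transmit exceed-action drop
!
control-plane
 service-policy input system-cpp-policy
```

Hardware forwarding of the broadcast is unaffected; only the copy punted to the CPU is rate-limited.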

Let us know if you have any further questions.

Nik

HTH,
Niko

Hi Nik,

What are the broadcast threshold values in bps and pps that can be used in CoPP so as not to kill the CPU?

Regards,

Akhtar

Hi Akhtar,

I have seen people in many cases using "police 32000 1000", but I recommend tuning the values to your specific setup; the better solution is to do some testing in your specific scenario to find the optimal thresholds.

Kind Regards,
Ivan Shirshin

**Please grade this post if you find it useful.

Kind Regards,
Ivan

kthned (Level 3)

Hi Experts,

I would like to take this forum as an opportunity to understand high CPU usage on one of our 6500 switches running IOS 12.2(33)SXI3. The problem is that the switch processor (SP) is consuming much more CPU than the route processor; at times the SP reaches 99%. The command "sh process cpu" shows the following two processes on top: NDE - IPV4 and Spanning Tree. Could you tell me how to root-cause such an issue? The "sh process cpu" files are attached.

I am not sure whether high SP CPU is normal, as I used to check only the route processor CPU usage with "sh proc cpu". I discovered the high SP CPU via an SNMP walk.

[nnmserver ~]$ snmpwalk -v2c -c public aaa-ddd 1.3.6.1.4.1.9.9.109.1.1.1.1.8

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.8.1 = Gauge32: 7

SNMPv2-SMI::enterprises.9.9.109.1.1.1.1.8.3 = Gauge32: 68

Thanks !

Regards,
Umair

Hi,

It seems some traffic is being sent to the SP CPU, as it is high on interrupts - 46%:

     switch_6500#remote command switc sh process cpu sorte
     CPU utilization for five seconds: 61%/46%; one minute: 69%; five minutes: 68%
      PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
      281  11313742443353170351          0  8.00%  7.01%  7.05%   0 Spanning Tree
      253  1146778924  48986767      23410  2.15%  1.81%  1.78%   0 Vlan Statistics
      470  1427548824 109539080      13032  1.35%  4.65%  4.34%   0 NDE - IPV4
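Since NDE - IPV4 is also near the top, it may be worth checking the NetFlow Data Export state while we are at it (a side suggestion; both commands are from the 6500 MLS NetFlow feature set):

```
show mls nde               ! NDE status, export destination, and export packet counts
show mls netflow ip count  ! number of NetFlow entries currently maintained in hardware
```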

Do you have any issues with stability in the network or spanning tree?

Please send the "show tech" and "show spanning-tree summary" outputs.

Kind Regards,
Ivan Shirshin

**Please grade this post if you find it useful.

Kind Regards,
Ivan

Hi Ivan

Thanks for your input & consideration.

Here are the "show tech" and "show spanning-tree summary" outputs. Please note that the IP addresses and domain names have been replaced with arbitrary values.

Regards,

Umair

Hi,

Spanning tree is fine, but I do see some log statements in the ACL (such packets are sent to the CPU for accounting):

     access-list 2460 permit tcp yy.24.16.0 0.7.239.255 host xxx.225.53.82 range 137 138 log
     ...
     access-list 2460 permit udp yy.24.16.0 0.7.239.255 host xxx.225.53.82 eq netbios-ss log
     ...
     access-list 2460 permit udp yy.24.16.0 0.7.239.255 host xxx.225.53.82 eq 445 log
     access-list 2460 permit tcp yy.24.16.0 0.7.239.255 host xxx.225.53.83 range 137 138 log

Interrupts can be caused by those ACL logs, by some function constantly using CPU resources (usually due to bugs), or by traffic hitting the SP CPU.

Let's check the functions first by doing CPU profiling (not service impacting). To prepare the correct procedure, please provide me with these outputs:

! login to the SP
switch# remote login switch
switch-sp# show region
switch-sp# show mem stat
switch-sp# exit
switch#

Kind Regards,
Ivan Shirshin

**Please grade this post if you find it useful.

Kind Regards,
Ivan

Thanks for the help, Ivan. Here is the output of "show region" and "show mem stat".

Hi,

Please follow this procedure for the CPU profiling (to identify the functions responsible for the interrupts):

1. Set up the profile:

# profile 40101328 42247FFF 4
# profile task interrupt

2. Now run the following command and don't do anything on the router for about 5 minutes. You can ask everyone logged in to leave the router alone by using "send *". If it is not left alone, the profiling results could be corrupted by the CPU processing user commands.

# profile start

3. After waiting about 5 minutes, run:

# profile stop

4. Next, run the following 4 commands in sequence via Telnet. Note that these commands may generate a large amount of data. Do NOT attempt this via the console port: the console port is slow and does not obey flow control, so data may be lost.

# terminal length 0 // turn off the "more" page scrolling feature
# show profile terse
# show profile detail
# terminal length 40 // turn the "more" page scrolling feature back on

5. Finally, run the following to release the memory:

# clear profile
# unprofile 40101328 42247FFF 4

6. After that, please send me the output of the following:

# show processes cpu
# show memory statistics
# show region
# show alignment

Kind Regards,
Ivan Shirshin

**Please grade this post if you find it useful.

Kind Regards,
Ivan

Hi Ivan

Thanks for the input. I shall update you on Monday, as the weekend has already started here in the EU. I hope you can continue to help us with this next week. Thanks a lot and have a nice weekend!

Regards,

Umair

sbertsch (Level 1)

Experts,

I have an odd issue on a C4503 with a Sup IV running 12.2(50)SG7, where the CPU is apparently being driven by an ESP flow that is being forwarded by the CPU for no apparent reason.

I used CPU SPAN to capture the CPU tx and rx traffic. All other traffic appears normal with no smoking gun (e.g. ICMP traffic). The only differences between the rx and tx copies of the ESP packets are the TTL decrement and checksum updates.

See attached image of one of the ESP frames captured from the CPU SPAN.
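For anyone who wants to reproduce this kind of capture, CPU SPAN on the Catalyst 4500 can be set up roughly as below (a sketch: the session number and destination interface are illustrative, and availability of the "cpu" source keyword depends on the release):

```
monitor session 1 source cpu both               ! mirror traffic punted to / sent by the CPU
monitor session 1 destination interface Gi3/48  ! port where the sniffer is attached
```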

#show processes cpu
CPU utilization for five seconds: 99%/0%; one minute: 96%; five minutes: 96%
PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
  48  3887011340  80841077      48082 17.09%  8.78%  8.22%   0 Cat4k Mgmt HiPri
  49  42937403361140819070       3763 16.53% 24.29% 24.76%   0 Cat4k Mgmt LoPri
  97   9055618683276278949        276 50.71% 55.99% 55.30%   0 IP Input
 192         800        92       8695 13.17%  1.05%  0.21%   1 SSH Process

#show platform health
                     %CPU   %CPU    RunTimeMax   Priority  Average %CPU  Total
                     Target Actual Target Actual   Fg   Bg 5Sec Min Hour  CPU
K2CpuMan Review       30.00  25.76     30     25  100  500   33  32   20  78196:03
K2AccelPacketMan: Tx  10.00   3.68     20      1  100  500    2   2    2  26900:32
K2PortMan Review       3.00   2.79     15     11  100  500    2   2    1  19009:51
K2Fib Consistency Ch   1.00   8.34      5      3  100  500    9   2    1  19781:29
K2PacketBufMonitor-P   3.00   2.00     10      1  100  500    2   2    1  26238:29
%CPU Totals          214.80  47.58

#show platform cpu packet statistics
Packets Received by Packet Queue
Queue                  Total           5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
L3 Rx Low                  17404309562      2642      3271      2534       2008

Packets Dropped by Packet Queue
Queue                  Total           5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
L3 Rx Low                    319042488      1179      1226       734        409

#show platform hardware ip route summary
TCAM running in 144 bit mode. (16 routes per block)
5525 blocks used out of 8192 (67.44%)
87906 K2Fib TCAM entries used out of 131072 (67.06%)
(512 entries are fixed overhead)
294 K2FibAdjs used out of 32768 (0.89%)
87394 IrmFibEntries used out of 262144 (33.3333%)
5 IrmMfibEntries used out of 65536 (0.00%)
281 IrmFibAdjs used out of 49152 (0.57%)
K2FibAdj allocation failures: 0
K2FibEntry allocation failures: 0
K2FibRegion block reshuffles: 0
IrmFibAdj allocation failures: 0
Number of Entries using RpfFloodSet: 0
VRF Vlans using software forwarding due to resource exhaustion: 0
Consistency Checker failures:
  reported:  cam: 0 mask: 0 fte: 0
  suppressed:  cam: 0 mask: 0 fte: 0
