Catalyst 3850 Series Switch High CPU Usage Troubleshooting

jackson.ku
Level 3

Hi,

We have one Catalyst 3850 switch with high CPU utilization. After analyzing the "show proc cpu" output, it looks like the root cause of the high CPU utilization is the "FED" process.

 

I found the following document, which describes how to troubleshoot the issue:

http://www.cisco.com/c/en/us/support/docs/switches/catalyst-3850-series-switches/117594-technote-hicpu3850-00.html

 

There are many show, debug, and set commands used to enable detailed tracing during the troubleshooting steps. Does running them cause any performance impact on the switch?
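For reference, the commands I mean are mostly along these lines (quoted from memory of that document, so the exact syntax may differ slightly by release):

show processes cpu sorted | exclude 0.0
show processes cpu detailed process fed sorted | exclude 0.0
show platform punt client
set trace control fed-punject-detail buffer-size 32768
show mgmt-infra trace messages fed-punject-detail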

 

Best Regards,

 

Jackson Ku

 

18 Replies

Leo Laohoo
Hall of Fame

Jackson, 

 

Can you post the output of the command "sh proc cpu sorted | ex 0.00"?

Hi,

Here is the output:

switch#sh proc cpu sorted

Core 0: CPU utilization for five seconds: 7%; one minute: 8%;  five minutes: 8%

Core 1: CPU utilization for five seconds: 2%; one minute: 7%;  five minutes: 6%

Core 2: CPU utilization for five seconds: 4%; one minute: 11%;  five minutes: 8%

Core 3: CPU utilization for five seconds: 99%; one minute: 98%;  five minutes: 93%

PID    Runtime(ms) Invoked  uSecs  5Sec     1Min     5Min     TTY   Process

5672   2559796     11225434 155    26.17    25.78    25.74    1088  fed

10187  2009764     10763907 145    1.72     1.72     1.62     34816 iosd

 

 

 

switch#sh proc cpu  detailed process fed sorted | ex 0.0

Core 0: CPU utilization for five seconds: 7%; one minute: 11%; five minutes: 10%

 

Core 1: CPU utilization for five seconds: 6%; one minute: 11%; five minutes: 12%

 

Core 2: CPU utilization for five seconds: 1%; one minute: 26%; five minutes: 24%

 

Core 3: CPU utilization for five seconds: 98%; one minute: 65%; five minutes: 68%

PID    T C  TID    Runtime(ms) Invoked uSecs  5Sec      1Min     5Min    TTY   Process
                                              (%)       (%)      (%)
5672   L           3033716     1122951 155    25.76     25.69   25.75   1088  fed

5672   L 3  10690  333099      1177069 0      24.39     24.32   24.41   0     PunjectRx

5672   L 1  6114   2387317     2458453 0      0.49      0.49    0.49    0     fed-ots-nfl

5672   L 1  6137   3616334     1229897 0      0.39      0.16    0.14    0     IntrDrv

5672   L 0  9630   3461270     3918786 0      0.24      0.26    0.26    0     Xcvr

5672   L 2  10691  3440665     5439261 0      0.15      0.19    0.19    0     PunjectTx

5672   L 1  6111   1258766     3461592 0      0.10      0.22    0.22    0     fed-ots-main

 

 

 

switch#show platform punt client

 

  tag      buffer        jumbo    fallback     packets   received   failures
                                            alloc   free  bytes    conv  buf
 27       0/1024/2048     0/5       0/5        0     0          0     0     0
 65536    0/1024/1600     0/0       0/512  100872100 100872100 1883447828     0     0
 65537    0/ 512/1600     0/0       0/512  840586 840586  237664318     0     0
 65538    0/   5/5        0/0       0/5        0     0          0     0     0
 65539    1/2048/1600     0/16      0/512  35974765 35974764 3935497384     0     0
 65540    0/ 128/1600     0/8       0/0        0     0          0     0     0
 65541    0/ 128/1600     0/16      0/32   461315396 461315396 3398385267     0     0
 65542    0/ 768/1600     0/4       0/0    1758007 3591582  158315634     0     0
 65544    0/  96/1600     0/4       0/0        0     0          0     0     0
 65545    0/  96/1600     0/8       0/32       0     0          0     0     0
 65546    0/ 512/1600     0/32      0/512  318361209 318361209 4044676185     0     0
 65547    0/  96/1600     0/8       0/32       0     0          0     0     0
 65548    0/ 512/1600     0/32      0/256    277   277      17360     0     0
 65551    0/ 512/1600     0/0       0/256     23    23       1515     0     0
 65556    0/  16/1600     0/4       0/0        0     0          0     0     0
 65557    0/  16/1600     0/4       0/0        0     0          0     0     0
 65558    0/  16/1600     0/4       0/0        0     0          0     0     0
 65559    0/  16/1600     0/4       0/0        0     0          0     0     0
 65560    0/  16/1600     0/4       0/0        0     0          0     0     0
 65561    0/ 512/1600     0/0       0/128  529028955 593358770 2412522796     0     0
 65563    0/ 512/1600     0/16      0/256      0     0          0     0     0
 65564    0/ 512/1600     0/16      0/256      0     0          0     0     0
 65565    0/ 512/1600     0/16      0/256      0     0          0     0     0
 65566    0/ 512/1600     0/16      0/256      0     0          0     0     0
 65581    0/   1/1        0/0       0/0        0     0          0     0     0
 131071   0/  96/1600     0/4       0/0        0     0          0     0     0

fallback pool: 0/1500/1600

jumbo pool:    0/128/9300

Hi,

The attached file contains the output of "show mgmt-infra trace messages fed-punject-detail". Please help troubleshoot the issue.

Best Regards,

Jackson

Hi,

The attached file is a pcap I captured with the "monitor capture mycap control-plane in" command to capture the packets received by the CPU. Please help troubleshoot the issue.
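For reference, the capture was set up roughly as follows (the file name is just an example, and exact options may vary by release):

monitor capture mycap control-plane in match any file location flash:mycap.pcap
monitor capture mycap start
monitor capture mycap stop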

Best Regards,

Jackson Ku

Hi Jackson.ku

 

Were you able to solve your issue? It seems we are facing the same problem you described.

Greetings, Erich

Hi,

We raised a TAC case, and the TAC engineer told me it is known bug CSCuo98789. We upgraded to 3.3.5SE to resolve the issue.

Best Regards,

Jackson

Hi Jackson,

Thanks for your reply. The mentioned bug is not publicly available, but I found the following in the 3.3.x release notes:

CSCuo98789 (2): ARP broadcast for vlan which is not SVI punted to CPU incase of Layer 2

 

My colleagues will try to upgrade the switch as soon as possible. I am sure it will solve the issue.

Greetings, Erich

Hi everyone, can you tell me whether this image will solve our problem?

CAT3850/3650 UNIVERSAL
cat3k_caa-universalk9.SPA.03.07.02.E.152-3.E2.bin

 

And what does this message from "show mgmt-infra trace messages fed-punject-detail" mean?

[09/14/15 19:06:56.576 UTC 1 6044] PUNT PATH (fed_punject_rx_retrieve_packet:1014):%RX Error: dropping packet in queue=1 buffer-len=192 FD_CONVERSION_FAIL

Hi,

I'm facing high CPU utilization on our stacked switches; I've posted the output from the switches below.

Please advise how to reduce the CPU utilization and what needs to be done to bring it back to normal.

Thanks in advance

SW1#show processes cpu | ex 0.00
CPU utilization for five seconds: 32%/22%; one minute: 41%; five minutes: 46%
PID   Runtime(ms)   Invoked  uSecs   5Sec   1Min   5Min  TTY  Process
  9      36995430  42139498    877  1.11%  1.75%  2.59%    0  ARP Input
203       1810068   4562646    396  0.47%  0.21%  0.17%    0  DHCP Snooping
207      18016262  47497509    379  2.39%  1.92%  1.94%    0  IP Input
247        780446   2900357    269  0.15%  0.06%  0.05%    0  HULC DHCP Snoopi
257       1522104   4299712    354  0.15%  0.10%  0.11%    0  DHCPD Receive
287           179      8240     21  0.63%  0.19%  0.04%    1  Virtual Exec


SW1#show version
Cisco IOS Software, C3750 Software (C3750-IPBASEK9-M), Version 12.2(55)SE8, RELEASE SOFTWARE (fc2)

Switch  Ports  Model              SW Version   SW Image
------  -----  -----------------  -----------  ----------------
*    1     12  WS-C3750G-12S      12.2(55)SE8  C3750-IPBASEK9-M
     2     28  WS-C3750G-24TS-1U  12.2(55)SE8  C3750-IPBASEK9-M

SW1#show processes cpu history

[CPU history graphs:
 CPU% per second (last 60 seconds): roughly 30-39%
 CPU% per minute (last 60 minutes): maxima mostly 40-90% with occasional peaks near 100%; averages roughly 30-60%
 CPU% per hour (last 72 hours): maxima at or near 99% for most of the period; averages roughly 20-60%]

Is this normal or abnormal?

What percentage of CPU usage will really start to affect the switch's performance?

CPU utilization for five seconds: 32%/22%; one minute: 41%; five minutes: 46%

Thanks.  The outputs provided are very helpful. 

41% is normal and I wouldn't be concerned about this.  What is the "uptime" of all the switches?
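As a side note, in "32%/22%" the first number is total CPU and the second is the portion spent at interrupt level (hardware packet handling); the difference between them is process-level load such as ARP Input and IP Input. If you want to see what traffic is actually reaching the CPU on the 3750, something like the following should list the CPU receive-queue counters (verify it is supported on your release):

show controllers cpu-interface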

Hi Leo,

Thank you. FYI:

SW1# uptime is 51 weeks, 5 days, 9 hours, 58 minutes
System returned to ROM by power-on
System restarted at 19:55:11 UTC Sat May 9 2015
System image file is "flash:c3750-ipservicesk9-mz.122-55.SE3.bin"

Switch 02
---------
Switch Uptime : 51 weeks, 5 days, 10 hours, 2 minutes

Also, we recently configured a static route between these switches and another site's switch over fiber. It worked fine for a while, but I have since seen a tx load of 96/255 and an rx load of 26/255 on this switch, and at that time the switch hit 100 percent CPU utilization.

How much Tx and Rx load is permissible without affecting switch performance?

Thanks.

How much Tx & Rx load will be permissible without affecting switch performance?

Until the counters for output drops start incrementing.
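For example, something along these lines (the interface name is only an illustration) shows both the load and the drop counter:

show interfaces GigabitEthernet1/0/1 | include load|output drops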

Don't leave the switches up for more than a year without rebooting.  

Thank you.

So even a Tx load of 220/255 and an Rx load of 208/255 doesn't matter as long as there are no output drops, or no increase in output drops, on that interface, right?
