
High CPU Utilization on Catalyst 3850

melsayeh
Level 1

Hi all,

 

I have a stack of four Catalyst 3850-24XS switches working as a distribution switch, sitting between a Nexus 7K core switch and 34 C3850 access switches/stacks. I have been struggling with high CPU utilization on the distribution switch. I upgraded the firmware from 16.03.06 to 16.12.5b without a noticeable change in CPU levels.

 

Mainly, the processes that eat the CPU are SISF Switcher Th, Spanning Tree, and Crimson flush tr. Sometimes MATM RP Shim Pro and VMATM Callback spike sharply, driving the switch to 100% CPU and eventually causing a network outage for a considerable amount of time.

I reviewed the STP configuration on the entire network to make sure there isn't a misconfiguration somewhere.

 

Here is the show version output:

Cisco IOS XE Software, Version 16.12.05b
Cisco IOS Software [Gibraltar], Catalyst L3 Switch Software (CAT3K_CAA-UNIVERSALK9-M), Version 16.12.5b, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2021 by Cisco Systems, Inc.
Compiled Thu 25-Mar-21 13:09 by mcpre


Cisco IOS-XE software, Copyright (c) 2005-2021 by cisco Systems, Inc.
All rights reserved.  Certain components of Cisco IOS-XE software are
licensed under the GNU General Public License ("GPL") Version 2.0.  The
software code licensed under GPL Version 2.0 is free software that comes
with ABSOLUTELY NO WARRANTY.  You can redistribute and/or modify such
GPL code under the terms of GPL Version 2.0.  For more details, see the
documentation or "License Notice" file accompanying the IOS-XE software,
or the applicable URL provided on the flyer accompanying the IOS-XE
software.


ROM: IOS-XE ROMMON
BOOTLDR: CAT3K_CAA Boot Loader (CAT3K_CAA-HBOOT-M) Version 4.78, RELEASE SOFTWARE (P)

Switch uptime is 15 hours, 12 minutes
Uptime for this control processor is 15 hours, 15 minutes
System returned to ROM by Reload Command at 19:44:44 UTC Sun Nov 28 2021
System restarted at 19:50:28 UTC Sun Nov 28 2021
System image file is "flash:cat3k_caa-universalk9.16.12.05b.SPA.bin"
Last reload reason: Reload Command



This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.

A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html

If you require further assistance please contact us by sending email to
export@cisco.com.


Technology Package License Information:

------------------------------------------------------------------------------
Technology-package                                     Technology-package
Current                        Type                       Next reboot
------------------------------------------------------------------------------
ipservicesk9            Smart License                    ipservicesk9
None                    Subscription Smart License       None


Smart Licensing Status: UNREGISTERED/EVAL MODE

cisco WS-C3850-24XS (MIPS) processor (revision J0) with 794888K/6147K bytes of memory.
Processor board ID FCW2025F017
4 Virtual Ethernet interfaces
128 Ten Gigabit Ethernet interfaces
8 Forty Gigabit Ethernet interfaces
2048K bytes of non-volatile configuration memory.
4194304K bytes of physical memory.
255037K bytes of Crash Files at crashinfo:.
255037K bytes of Crash Files at crashinfo-2:.
255037K bytes of Crash Files at crashinfo-3:.
255037K bytes of Crash Files at crashinfo-4:.
3417161K bytes of Flash at flash:.
3417161K bytes of Flash at flash-2:.
3417161K bytes of Flash at flash-3:.
3417161K bytes of Flash at flash-4:.
0K bytes of WebUI ODM Files at webui:.

Base Ethernet MAC Address          : 00:56:2b:d9:18:00
Motherboard Assembly Number        : 73-16649-06
Motherboard Serial Number          : FOC20237ZEH
Model Revision Number              : J0
Motherboard Revision Number        : A0
Model Number                       : WS-C3850-24XS
System Serial Number               : FCW2025F017


Switch Ports Model              SW Version        SW Image              Mode
------ ----- -----              ----------        ----------            ----
*    1 34    WS-C3850-24XS      16.12.05b         CAT3K_CAA-UNIVERSALK9 BUNDLE
     2 34    WS-C3850-24XS      16.12.05b         CAT3K_CAA-UNIVERSALK9 BUNDLE
     3 34    WS-C3850-24XS      16.12.05b         CAT3K_CAA-UNIVERSALK9 BUNDLE
     4 34    WS-C3850-24XS      16.12.05b         CAT3K_CAA-UNIVERSALK9 BUNDLE


Switch 02
---------
Switch uptime                      : 15 hours, 15 minutes

Base Ethernet MAC Address          : 00:56:2b:fb:b3:80
Motherboard Assembly Number        : 73-16649-06
Motherboard Serial Number          : FOC20237ZF2
Model Revision Number              : J0
Motherboard Revision Number        : A0
Model Number                       : WS-C3850-24XS
System Serial Number               : FCW2025C0KA
Last reload reason                 : Reload Command

Switch 03
---------
Switch uptime                      : 15 hours, 15 minutes

Base Ethernet MAC Address          : 00:56:2b:d9:71:80
Motherboard Assembly Number        : 73-16649-06
Motherboard Serial Number          : FOC20237ZG0
Model Revision Number              : J0
Motherboard Revision Number        : A0
Model Number                       : WS-C3850-24XS
System Serial Number               : FCW2025C09R
Last reload reason                 : Reload Command

Switch 04
---------
Switch uptime                      : 15 hours, 15 minutes

Base Ethernet MAC Address          : 00:56:2b:d8:cf:00
Motherboard Assembly Number        : 73-16649-06
Motherboard Serial Number          : FOC20237ZNA
Model Revision Number              : J0
Motherboard Revision Number        : A0
Model Number                       : WS-C3850-24XS
System Serial Number               : FOC2024X19X
Last reload reason                 : Reload Command

Configuration register is 0x102

A snapshot of CPU utilization:

CPU utilization for five seconds: 94%/18%; one minute: 94%; five minutes: 90%
 PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
 355    22207257    31706826        700 25.91% 22.36% 21.51%   0 SISF Switcher Th
 100     6240314      300554      20762 20.47%  6.02%  6.38%   0 Crimson flush tr
 250     9225912    11647469        792  9.67% 10.67% 12.53%   0 Spanning Tree
 356     5010254     7654994        654  9.11%  5.43%  5.58%   0 SISF Main Thread
  52     3076582    10132512        303  8.39%  3.35%  2.93%   0 ARP Snoop
 126     3088462    29867557        103  3.03%  3.72%  3.61%   0 IOSXE-RP Punt Se
 324      798535    10202444         78  1.43%  2.52%  2.16%   0 DAI Packet Proce
 174      335234      539281        621  0.95%  4.32%  2.99%   0 MATM RP Shim Pro
  80      525196     1199488        437  0.87%  2.43%  2.09%   0 IOSD ipc task
 222      199145      223304        891  0.23%  0.23%  0.23%   0 CDP Protocol
 539      136525      385569        354  0.15%  0.16%  0.16%   0 LLDP Protocol
 305      287384     1185684        242  0.15%  3.31%  1.86%   0 IGMPSN
 398       60295     1230894         48  0.15%  0.06%  0.06%   0 MMA DB TIMER
  98       97747      555629        175  0.15%  0.58%  0.46%   0 cpf_process_tpQ
 431       60906     1230847         49  0.15%  0.05%  0.06%   0 MMA DP TIMER
 432       58013     2445117         23  0.15%  0.04%  0.05%   0 MMON MENG
  15       41144      402680        102  0.07%  0.02%  0.03%   0 DB Lock Manager
 538       30671      397301         77  0.07%  0.04%  0.02%   0 ONEP Network Ele
 149       15627       32888        475  0.07%  0.02%  0.00%   0 SFF8472
 204       60996     1230802         49  0.07%  0.04%  0.05%   0 VRRS Main thread
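
For anyone who wants to compare snapshots like the one above over time, here is a minimal Python sketch that parses the process rows and lists those above a chosen five-minute threshold. The function names, the 5% threshold, and the sample text (copied from the first rows of the table above) are only illustrative assumptions, not part of any Cisco tooling.

```python
import re

# Matches one process row of the snapshot pasted above:
# PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%\s+(\d+)\s+(.+?)\s*$"
)


def parse_cpu_rows(text):
    """Return one dict per process row; header and summary lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "pid": int(m.group(1)),
                "five_sec": float(m.group(5)),
                "one_min": float(m.group(6)),
                "five_min": float(m.group(7)),
                "process": m.group(9),
            })
    return rows


def top_by_five_min(rows, threshold=5.0):
    """Processes at or above the (arbitrary) threshold, busiest first."""
    hot = [r for r in rows if r["five_min"] >= threshold]
    return sorted(hot, key=lambda r: r["five_min"], reverse=True)


# Sample rows copied from the snapshot in this post.
SAMPLE = """\
 355    22207257    31706826        700 25.91% 22.36% 21.51%   0 SISF Switcher Th
 100     6240314      300554      20762 20.47%  6.02%  6.38%   0 Crimson flush tr
 250     9225912    11647469        792  9.67% 10.67% 12.53%   0 Spanning Tree
 356     5010254     7654994        654  9.11%  5.43%  5.58%   0 SISF Main Thread
  52     3076582    10132512        303  8.39%  3.35%  2.93%   0 ARP Snoop
"""

if __name__ == "__main__":
    for r in top_by_five_min(parse_cpu_rows(SAMPLE)):
        print(f'{r["process"]:<20} {r["five_min"]:5.2f}%')
```

Sorting on the five-minute column rather than the five-second one makes short bursts (such as Crimson flush tr here) less likely to dominate the list.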

Any help would be highly appreciated.

Accepted Solutions

melsayeh
Level 1

Thanks to everyone who tried to help.

 

I found a post here that solved the high CPU utilization problem: https://community.cisco.com/t5/cisco-bug-discussions/cscvk32439-ipv6-sisf-main-thread-consumes-high-cpu-dhcpv6-icmpv6/td-p/3778970

 

The root cause of the issue was DHCP snooping. Although I disabled it globally with the command "no ip dhcp snooping", that didn't really help until I also used the command "no ip dhcp snooping vlan 1-4094". CPU utilization then dropped significantly, from 85%+ to 25%. I hope this helps anyone who has a similar problem.
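
To check a saved configuration for leftover per-VLAN snooping entries like the ones described above, something along these lines may help. It is only a sketch: it assumes the saved config carries lines of the form "ip dhcp snooping vlan <list>", and both the function name and the sample config are made up for illustration.

```python
def snooping_vlans(config_text):
    """Return (global_enabled, sorted VLAN IDs) found in a saved config.

    Assumes per-VLAN entries look like 'ip dhcp snooping vlan 10,20-22'.
    """
    global_enabled = False
    vlans = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == "ip dhcp snooping":
            global_enabled = True
        elif line.startswith("ip dhcp snooping vlan "):
            # Fifth token is the VLAN list, e.g. "10,20-22".
            for part in line.split()[4].split(","):
                lo, _, hi = part.partition("-")
                vlans.update(range(int(lo), int(hi or lo) + 1))
    return global_enabled, sorted(vlans)


# Hypothetical config excerpt: snooping disabled globally,
# but per-VLAN entries are still present.
EXAMPLE_CONFIG = """\
ip dhcp snooping vlan 10,20-22
no ip dhcp snooping information option
!
"""

if __name__ == "__main__":
    enabled, vlans = snooping_vlans(EXAMPLE_CONFIG)
    print("global snooping:", enabled)
    print("per-VLAN entries:", vlans)
```

A result with the global flag off but a non-empty VLAN list would match the situation described in this post.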

      333222221111133333222222222233333111111111133333111111111133
      333555559999977777444441111144444999999999955555777779999933
  100
   90
   80
   70
   60
   50
   40              *****                         *****
   30 ********     *****          *****          *****
   20 **********************************************************
   10 **********************************************************
     0....5....1....1....2....2....3....3....4....4....5....5....6
               0    5    0    5    0    5    0    5    0    5    0
               CPU% per second (last 60 seconds)




      333333333433333333333333333433333453333433333333343333443444
      785867778366768434022445888087899016989099878989819887117018
  100
   90
   80
   70
   60
   50                                   *
   40 ***************        ***********************************
   30 *#*#****###***##**##****##*#******#****#***###*####***#***
   20 ##########################################################
   10 ##########################################################
     0....5....1....1....2....2....3....3....4....4....5....5....6
               0    5    0    5    0    5    0    5    0    5    0
               CPU% per minute (last 60 minutes)
              * = maximum CPU%   # = average CPU%



         1 11 1   1            1             1                        1
      599090090999099999999999909999999999999099999999999999999999999909999999
      499090090999099999999999906662233585347096569489547933123225676705869987
  100  ****************************    ***  ******* *** **       ***********
   90  *********************************************************************
   80  ************************###########################################*#
   70  *###*******************##############################################
   60  #####################################################################
   50 *#####################################################################
   40 *#####################################################################
   30 ######################################################################
   20 ######################################################################
   10 ######################################################################
     0....5....1....1....2....2....3....3....4....4....5....5....6....6....7..
               0    5    0    5    0    5    0    5    0    5    0    5    0
                   CPU% per hour (last 72 hours)
                  * = maximum CPU%   # = average CPU%

 


17 Replies

Hello, 

 

Which switch(es) in your network is/are the root switch(es) for your VLAN(s)?

 

There are numerous bugs with regard to SISF and Crimson flush; however, you are running the recommended version, which supposedly includes fixes for these bugs.

 

That said, check whether the bug below might apply:

 

Crash due to "Crimson flush transactions Process"
CSCvt76409

 

Symptom:
Crash due to Crimson flush transactions Process.

Conditions:
Seeing sisf mac update record error due to Not enough space.

Workaround:
- enable service internal
- device-tracking tdl-disable

Further Problem Description:
This happens only when device-tracking is enabled, which may be explicit (cli) or implicit (started by some other feature like lisp, ip dhcp snooping, dot1x, etc.)

Hello George,

 

Thanks for your response.

 

The root bridge for all the VLANs is the core switch (Nexus 7k).

We are using dot1x and DHCP snooping on the access layer but not on the distribution layer. If I applied the suggested workaround on the distribution switch, would it impact dot1x or DHCP snooping on the access layer?

 

Hello,

 

I have no idea if that will impact the distribution switch to be honest. You might want to test this after hours...

Leo Laohoo
Hall of Fame

@melsayeh wrote:
WS-C3850-24XS      16.12.05b         CAT3K_CAA-UNIVERSALK9 BUNDLE

Stack is on Bundle Mode.  Read Cisco 3850: IOS-XE/Firmware Upgrade.

Post the complete output to the command "sh platform software status control-processor brief".

Hi Leo,

 

Can bundle mode cause CPU issues?

Here is the output of "sh platform software status control-processor brief":

Load Average
 Slot  Status  1-Min  5-Min 15-Min
1-RP0 Healthy   2.54   2.62   2.52
2-RP0 Healthy   0.24   0.25   0.31
3-RP0 Healthy   0.09   0.28   0.30
4-RP0 Healthy   0.16   0.20   0.19

Memory (kB)
 Slot  Status    Total     Used (Pct)     Free (Pct) Committed (Pct)
1-RP0 Healthy  3965144  2625524 (66%)  1339620 (34%)   3569596 (90%)
2-RP0 Healthy  3965144  2488204 (63%)  1476940 (37%)   3495812 (88%)
3-RP0 Healthy  3965144  1857480 (47%)  2107664 (53%)   2378336 (60%)
4-RP0 Healthy  3965144  1854232 (47%)  2110912 (53%)   2374248 (60%)

CPU Utilization
 Slot  CPU   User System   Nice   Idle    IRQ   SIRQ IOwait
1-RP0    0  14.40   6.20   0.00  79.30   0.00   0.10   0.00
         1  17.71   5.10   0.00  77.17   0.00   0.00   0.00
         2  28.22  10.91   0.00  60.76   0.00   0.10   0.00
         3  15.26   5.68   0.00  78.94   0.00   0.09   0.00
         4  11.10   3.10   0.00  85.80   0.00   0.00   0.00
         5  56.60   5.70   0.00  37.70   0.00   0.00   0.00
2-RP0    0   3.40   0.70   0.00  95.90   0.00   0.00   0.00
         1   3.30   1.20   0.00  95.49   0.00   0.00   0.00
         2   1.70   0.00   0.00  98.29   0.00   0.00   0.00
         3   3.50   1.60   0.00  94.90   0.00   0.00   0.00
         4   2.99   0.59   0.00  96.40   0.00   0.00   0.00
         5   3.89   1.39   0.00  94.70   0.00   0.00   0.00
3-RP0    0   3.00   1.60   0.00  95.39   0.00   0.00   0.00
         1   1.90   0.10   0.00  97.99   0.00   0.00   0.00
         2   2.90   1.80   0.00  95.29   0.00   0.00   0.00
         3   1.20   0.30   0.00  98.49   0.00   0.00   0.00
         4   4.80   2.80   0.00  92.40   0.00   0.00   0.00
         5   1.60   1.50   0.00  96.89   0.00   0.00   0.00
4-RP0    0   1.99   0.29   0.00  97.70   0.00   0.00   0.00
         1   0.09   0.09   0.00  99.80   0.00   0.00   0.00
         2   0.49   0.39   0.00  99.10   0.00   0.00   0.00
         3   1.80   0.50   0.00  97.60   0.00   0.10   0.00
         4   4.60   1.90   0.00  93.50   0.00   0.00   0.00
         5   2.59   0.69   0.00  96.70   0.00   0.00   0.00

 


@melsayeh wrote:
2-RP0 Healthy  3965144  2488204 (63%)  1476940 (37%)   3495812 (88%)

That is high.  Ideally, non-master switch members should be operating at 45% (or less) memory utilization.  Anything higher than 50% is not good.
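
As a rough illustration of that rule of thumb, the following Python sketch parses the "Memory (kB)" table from the output quoted earlier and lists the slots above a limit. The thread does not say which percentage column the 50% guideline refers to, so the sketch assumes "Used" by default and lets you pick another column. All names are illustrative.

```python
import re

# One row of the "Memory (kB)" table:
# Slot Status Total Used (Pct) Free (Pct) Committed (Pct)
MEM_ROW = re.compile(
    r"^\s*(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+\((\d+)%\)\s+"
    r"(\d+)\s+\((\d+)%\)\s+(\d+)\s+\((\d+)%\)\s*$"
)


def parse_memory(text):
    """Return one dict per stack member found in the memory table."""
    out = []
    for line in text.splitlines():
        m = MEM_ROW.match(line)
        if m:
            out.append({
                "slot": m.group(1),
                "status": m.group(2),
                "used_pct": int(m.group(5)),
                "free_pct": int(m.group(7)),
                "committed_pct": int(m.group(9)),
            })
    return out


def over_limit(rows, limit_pct=50, column="used_pct"):
    """Slots whose chosen percentage column exceeds the limit."""
    return [r["slot"] for r in rows if r[column] > limit_pct]


# Rows copied from the output posted in this thread.
SAMPLE = """\
1-RP0 Healthy  3965144  2625524 (66%)  1339620 (34%)   3569596 (90%)
2-RP0 Healthy  3965144  2488204 (63%)  1476940 (37%)   3495812 (88%)
3-RP0 Healthy  3965144  1857480 (47%)  2107664 (53%)   2378336 (60%)
4-RP0 Healthy  3965144  1854232 (47%)  2110912 (53%)   2374248 (60%)
"""

if __name__ == "__main__":
    rows = parse_memory(SAMPLE)
    print("used > 50%:", over_limit(rows))
    print("committed > 50%:", over_limit(rows, column="committed_pct"))
```

Slot 1 is the active switch here (marked with * in the show version output), so only the other members fall under the guideline as stated.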

Post the complete output to the command "sh proc memory sort location switch 2 r0".  Just post the output from the first page only.


@melsayeh wrote:

Can bundle mode cause CPU issues?


No, it cannot.  However, convert to Install Mode, because the stack may need to be rebooted in the next 4 weeks.  If it is not rebooted, the memory leak will cause switch 2 to crash.

Hi Leo,

 

The command "sh proc memory sort location switch 2 r0" was not recognized.

 

However, here is the output of "sh proc memory sorted":

Processor Pool Total:  813826352 Used:  312854864 Free:  500971488
reserve P Pool Total:     102404 Used:         88 Free:     102316
 lsmpi_io Pool Total:    6295128 Used:    6294296 Free:        832

 PID TTY  Allocated      Freed    Holding    Getbufs    Retbufs Process
   0   0  290592664   57994880  204275608          0          0 *Init*
   4   0   25572152    1224672   22540744          0          0 RF Slave Main Th
  80   0  452286528   73508304   11363328      13668          0 IOSD ipc task
 355   0 12438335696  311122432    8836784          0   67346784 SISF Switcher Th
   0   0  487840880  482370424    5379712   17618099     809532 *Dead*
 305   0  144177520  128407752    5104416    6040608    4342812 IGMPSN
 469   0    4228096     196880    4088216     849828          0 EEM ED Syslog
 541   0    5484504    2866992    2590368          0      16884 LACP Protocol
 356   0 1325106448 13733605088    2516280    3516688          0 SISF Main Thread
  10   0  693515560  325412072    2377368  279462916  221759564 Pool Manager
   0   0          0          0    1904896          0          0 *MallocLite*
 486   0    1851192     180736    1700048       9448          0 EEM Server
 273   0    1657752     324968    1349800          0          0 XDR receive
 423   0    2214888    1115288    1101696          0          0 Crypto CA
   1   0   10921616    9885832    1094672          0          0 Chunk Manager
 332   0    1018728     124784     961872          0          0 CEF: IPv4 proces
 230   0     810920          0     879920          0          0 IP ARP Adjacency
  73   0    2492232     274144     712200       7236          0 Net Background
 174   0 1578403512 1565096304     702280    3728220          0 MATM RP Shim Pro
 413   0     462720        896     514824          0          0 EST Client
 303   0    3556936    5870688     482536      20100          0 IGMPSN L2MCM
  44   0     975840     496568     471992          0          0 Entity MIB API
 234   0     844320     360952     458888          0          0 mDNS
 317   0    1999000    3883088     444424          0          0 MLDSN L2MCM
 470   0     388144       5680     439464      72316          0 EEM ED Generic
  31   0    9144104      84736     431912          0          0 IPC Seat RX Cont
 142   0     408384     108664     428560          0          0 SAMsgThread
 435   0     398096        728     376856      17808          0 Crypto IKEv2
 100   0 9894701176 9894354624     325920          0          0 Crimson flush tr
 277   0     246344      34072     279336          0          0 CEF background p
 539   0  944201384  254691544     279080       5220      45024 LLDP Protocol
 274   0     232512        896     276616          0          0 IPC LC Message H
 260   0     153728          0     270856          0          0 st_pw_oam
 359   0       1824          0     262824          0          0 COPS
 323   0 3753840624 3687928664     245680   13511176          0 VMATM Callback
 396   0     192560          0     237560          0          0 mDNS snooping
 525   0     167984        448     236536          0          0 MRIB Process
 536   0     220608       1424     230936          0          0 ONEP Network Ele
 478   0    2399384    4054624     217080          0          0 PM Callback
 101   0      98736      86544     215736          0          0 DBAL EVENTS
 495   0     262928     107696     209432          0          0 Call Home proces
 222   0  680775792  276761976     207352          0      10452 CDP Protocol
 351   0     140416          0     185416          0          0 L2FIB Event Disp
 529   3    1958536    1842512     182920          0          0 SSH Process
  89   0     283544       1328     179536          0          0 REDUNDANCY FSM
 538   0     226432     106016     173120          0          0 RADIUS
 102   0     752688     253256     170904      33308          0 EM_SHIM_TASK
 228   0      98736          0     167736          0          0 IPAM Manager
 554   0      50600       1992     167600          0          0 LICENSE AGENT
 311   0      38096        448     154648          0          0 AN
 232   0      83656     217272     151408          0      27836 IP Input
 398   0     102792          0     147792          0          0 MMA DB TIMER
 288   2    1956288    1877200     146912          0          0 SSH Process
  37   0   34648208   34685848     145744          0      12936 ARP Input
 178   0     100608          0     145608          0          0 radius radsec cl
 206   0      26864          0     143864          0          0 PKI_SSL LSC Enro
 --More--

Hi Leo,

 

Was the information above helpful?

I provided the wrong command.  The correct one is:  sh proc memory platform sorted location switch 2 r0

Hi Leo,

 

Thanks for this, here is the first page of the command output: "sh proc memory platform sorted location switch 2 r0"

System memory: 3965144K total, 2477816K used, 1487328K free,
Lowest: 1480120K
   Pid    Text      Data   Stack   Dynamic       RSS              Name
----------------------------------------------------------------------
  7671  204098    600072     136       360    600072   linux_iosd-imag
 12588     247    336552     128     96192    336552    fed main event
  8550    1088    124924     132      2396    124924      platform_mgr
  8112     418    122384     128      2488    122384           sif_mgr
 26381    8794     78544     132      2372     78544           fman_rp
 20889     179     77432     128      5520     77432          sessmgrd
 20800    9875     77052     132     23612     77052     fman_fp_image
 25682     911     65980     132     10560     65980             smand
 26699     227     58468     132      2344     58468               dbm
 28469     108     42012     132      1460     42012              pubd
 21342     637     32256     128      2604     32256              repm
 24427       8     25904     136      7380     25904         python2.7
 27464     100     20308     128      1908     20308               psd
 27069      76     18664     128        76     18664         cli_agent
  7911     534     17112     128      1768     17112         stack_mgr
  9938     600     14308     476      2524     14308              hman
 10280      99     14068     128      1364     14068         bt_logger
  9041     202     13084     128      1824     13084              lman
 10518     248     12760     128      1652     12760             btman
 25981     306     12124     132      3204     12124               tms
 12835     248     11096     128      1480     11096             btman
  9244     147     10836     128      1284     10836            keyman
 13706      89     10340     128      2208     10340         tams_proc
  8794    1096     10008     404      7808     10008            ncd.sh
 10912    1096      9804     400      7624      9804   auto_upgrade_cl
  9458    1096      9760     404      7612      9760     issu_stack.sh
  4189     545      9600     132       132      9600          libvirtd
  6751    1096      9096     400      5276      9096   rollback_timer.
 15804    1096      8804     404      7480      8804     issu_stack.sh
 15796    1096      8748     404      7480      8748     issu_stack.sh
 21631     112      8704     132      1660      8704             plogd
 26909     170      8200     128      1124      8200               cmm
  8313     345      8048     132       268      8048           nif_mgr
 15229     120      7908     132      1104      7908    epc_ws_liaison
 11478    1096      7404     408      5264      7404       periodic.sh
  7010    1096      7256     404      3624      7256           psvp.sh
  4195    1096      6884     400      3072      6884       droputil.sh
  4485    1096      6832     400      3072      6832      reflector.sh
 13607     130      6812     136       920      6812         tamd_proc
     1    1450      6512     132      1616      6512           systemd
 13471      76      6080     136       672      6080   tam_svcs_ng3k_c
  7074    1096      5668     400      3452      5668            pvp.sh
  4534    1096      5492     396      1680      5492          iptbl.sh
 28575      82      5436     136       416      5436             pttcd
 11172    1096      5388     396      3204      5388            pvp.sh
 20484    1096      5056     400      2968      5056   brelay_console.
   467     252      4752     132       132      4752       dbus-daemon
 21750    1096      4508     404      2412      4508        btelnet.sh
 21024    1096      4500     400      2400      4500         brelay.sh
  4170     687      4412     132       132      4412          virtlogd
  4458      10      4240     132       268      4240             rotee
  6922      10      4216     132       268      4216             rotee
  4643      10      4216     132       268      4216             rotee
  4677      10      4156     132       268      4156             rotee
  4344      10      4132     132       268      4132             rotee

Long-story-short, avoid using 16.12.X.  PERIOD.

What version would you recommend?

Latest 16.6.X or 16.9.X.

I am here because of a lagging switch, seemingly due to high CPU/spanning tree. We use DHCP snooping and it seems to be affecting our Extreme (Aerohive) APs. The switch is currently on IOS 16.6.7. All the other 3650s are on 3.06.06E. I'm thinking of loading that build on this one to see if it resolves the problems.
