06-20-2023 01:23 AM
Hi guys,
We have a three-switch stack (Catalyst 9200) running IOS XE 17.6.5 with the CPU overloaded by the SISF Switcher Th process:
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
433 1401525420 72375221 19364 79.83% 81.16% 82.11% 0 SISF Switcher Th
As per bug CSCvk32439 I implemented an IPv6 filter on trunk ports, but the issue remained. Eventually I disabled DHCP snooping on all VLANs, but this can only be a short-term workaround (a sketch of both is shown below).
Could anyone please suggest a long-term remediation?
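For reference, the IPv6 trunk filter and the snooping workaround were along these lines; this is only a minimal sketch with placeholder ACL and interface names, not the actual configuration:
! IPv6 port ACL applied inbound on the trunk ports (per the CSCvk32439 workaround)
! (actual entries depend on what needs to be filtered)
ipv6 access-list BLOCK_IPV6
 deny ipv6 any any
!
interface TenGigabitEthernet1/1/1
 ipv6 traffic-filter BLOCK_IPV6 in
!
! Short-term workaround eventually applied: disable DHCP snooping
! (shown globally here; per-VLAN removal is also possible)
no ip dhcp snooping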
Regards, Vincent
06-20-2023 01:29 AM
Please post the complete output of the command "sh platform software status con brief".
06-20-2023 01:35 AM
Here it is (please note that as of now DHCP snooping is disabled, so the CPU is not overloaded):
chun-hdie-blssrg-dsw1#sh platform software status control-processor brief
Load Average
Slot Status 1-Min 5-Min 15-Min
1-RP0 Healthy 0.72 0.50 0.45
2-RP0 Healthy 0.25 0.24 0.19
3-RP0 Healthy 0.45 0.27 0.21
Memory (kB)
Slot Status Total Used (Pct) Free (Pct) Committed (Pct)
1-RP0 Healthy 4028724 990376 (25%) 3038348 (75%) 1772980 (44%)
2-RP0 Healthy 4028728 951664 (24%) 3077064 (76%) 1753108 (44%)
3-RP0 Healthy 4028728 790316 (20%) 3238412 (80%) 987476 (25%)
CPU Utilization
Slot CPU User System Nice Idle IRQ SIRQ IOwait
1-RP0 0 7.70 3.59 0.00 87.97 0.61 0.10 0.00
1 5.87 3.81 0.00 89.58 0.51 0.20 0.00
2 4.76 3.10 0.00 91.40 0.51 0.20 0.00
3 4.67 3.63 0.00 91.06 0.51 0.10 0.00
2-RP0 0 2.27 2.89 0.00 94.11 0.61 0.10 0.00
1 2.90 1.65 0.00 94.81 0.51 0.10 0.00
2 2.80 2.90 0.00 93.66 0.51 0.10 0.00
3 2.15 2.35 0.00 94.87 0.51 0.10 0.00
3-RP0 0 2.22 1.92 0.00 95.23 0.50 0.10 0.00
1 1.22 2.24 0.00 96.02 0.40 0.10 0.00
2 2.65 2.55 0.00 94.38 0.30 0.10 0.00
3 2.22 1.92 0.00 95.55 0.30 0.00 0.00
06-20-2023 01:50 AM
Nothing wrong with the control-plane.
Please post the "1st page" of the output "sh proc cpu platform sort location switch act r0".
06-20-2023 01:57 AM
Here is the result of the command :
chun-hdie-blssrg-dsw1#show processes cpu platform sorted location switch active r0
CPU utilization for five seconds: 9%, one minute: 10%, five minutes: 9%
Core 0: CPU utilization for five seconds: 10%, one minute: 10%, five minutes: 10%
Core 1: CPU utilization for five seconds: 8%, one minute: 10%, five minutes: 9%
Core 2: CPU utilization for five seconds: 11%, one minute: 10%, five minutes: 9%
Core 3: CPU utilization for five seconds: 10%, one minute: 9%, five minutes: 9%
Pid PPid 5Sec 1Min 5Min Status Size Name
--------------------------------------------------------------------------------
5786 5771 17% 16% 16% S 99408 fed main event
4610 4356 13% 12% 12% S 220328 linux_iosd-imag
36 2 7% 6% 6% S 0 ksmd
17361 17313 1% 1% 1% S 33204 fman_fp_image
4789 4779 1% 1% 1% S 16408 sif_mgr
29151 29132 0% 0% 0% S 30368 python3
29132 5148 0% 0% 0% S 2572 pman
27761 27754 0% 0% 0% S 14008 cli_agent
27754 3592 0% 0% 0% S 2584 pman
27643 27636 0% 0% 0% S 3892 cmm
27636 3592 0% 0% 0% S 2584 pman
27522 27509 0% 0% 0% S 26288 dbm
27509 3592 0% 0% 0% S 2588 pman
27281 27268 0% 0% 0% S 29928 fman_rp
27268 3592 0% 0% 0% S 2580 pman
26862 26851 0% 0% 0% S 6516 tms
26851 3592 0% 0% 0% S 2580 pman
26579 26570 0% 0% 0% S 29496 smand
26570 3592 0% 0% 0% S 2584 pman
26210 26175 0% 0% 0% S 10392 psd
26175 3592 0% 0% 0% S 2588 pman
25732 9220 0% 0% 0% S 424 sleep
25332 12031 0% 0% 0% S 424 sleep
25287 25275 0% 0% 0% S 664 sntp
25275 1 0% 0% 0% S 1220 stack_sntp.sh
24649 1 0% 0% 0% S 1824 rotee
24478 24474 0% 0% 0% S 2444 iosd_console_at
24474 24320 0% 0% 0% S 1632 bexec.sh
24320 24319 0% 0% 0% S 1524 runin_exec_proc
24319 24090 0% 0% 0% S 2464 in.telnetd
24281 12260 0% 0% 0% S 424 sleep
24090 8550 0% 0% 0% S 1528 runin_exec_proc
23586 1 0% 0% 0% S 1820 rotee
23375 23348 0% 0% 0% S 2444 iosd_console_at
06-20-2023 02:22 AM
Nothing wrong with the Data Plane, either.
Is the SISF Switcher process still high?
06-20-2023 02:25 AM
Hi Leo,
As mentioned, since I disabled DHCP snooping there is no more CPU overload. However, this is a workaround that cannot remain in place over the long term.
Regards, Vincent
06-20-2023 04:05 AM
Upgrade to 17.9.3, re-enable DHCP snooping, and see if it behaves better.
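If it helps, re-enabling snooping after the upgrade would look roughly like this; the VLAN list and uplink interface below are placeholders:
ip dhcp snooping
ip dhcp snooping vlan 10,20
! trust the uplink toward the DHCP server
interface TenGigabitEthernet1/1/1
 ip dhcp snooping trust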
06-20-2023 04:20 AM
Is there a lot of traffic being punted to the CPU?
Check this guide to see which frames are punted to your CPU.
Be careful with the debug start/stop: you already have high CPU, and running the debug can make things worse.
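A sketch of the punt inspection meant here, using standard Catalyst 9000 commands; keep the capture window short given the already high CPU:
! Which punt causes/queues are busy
show platform software fed switch active punt cause summary
show platform software fed switch active punt cpuq rates
! Short punt packet capture: start, wait a few seconds, then stop and review
debug platform software fed switch active punt packet-capture start
debug platform software fed switch active punt packet-capture stop
show platform software fed switch active punt packet-capture brief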
06-20-2023 06:01 AM
Hi guys,
Thanks for the suggestions. Since there is already a CPU overload, I will first try to upgrade to release 17.9.3 and then see how to move forward.
Vincent
06-20-2023 11:33 PM
Hi,
In the end I upgraded to 17.9.3, but the behavior remains the same. I was able to run the commands suggested by Leo on another device where DHCP snooping is not yet disabled. Here are the results.
chun-hdie-sr54-dsw1#show platform software status control-processor brief
Load Average
Slot Status 1-Min 5-Min 15-Min
1-RP0 Healthy 3.00 2.22 2.00
2-RP0 Healthy 0.26 0.24 0.21
Memory (kB)
Slot Status Total Used (Pct) Free (Pct) Committed (Pct)
1-RP0 Healthy 4014172 943136 (23%) 3071036 (77%) 1780016 (44%)
2-RP0 Healthy 4014172 904236 (23%) 3109936 (77%) 1766228 (44%)
CPU Utilization
Slot CPU User System Nice Idle IRQ SIRQ IOwait
1-RP0 0 28.18 5.67 0.00 64.56 1.26 0.31 0.00
1 33.08 6.05 0.00 59.60 0.93 0.31 0.00
2 39.02 4.57 0.00 55.35 0.83 0.20 0.00
3 39.24 5.01 0.00 54.59 0.83 0.31 0.00
2-RP0 0 1.96 2.58 0.00 94.82 0.51 0.10 0.00
1 2.15 2.87 0.00 94.35 0.51 0.10 0.00
2 2.60 2.60 0.00 94.27 0.41 0.10 0.00
3 2.69 3.21 0.00 93.66 0.41 0.00 0.00
chun-hdie-sr54-dsw1#show processes cpu platform sorted location switch active r0
CPU utilization for five seconds: 42%, one minute: 42%, five minutes: 42%
Core 0: CPU utilization for five seconds: 38%, one minute: 42%, five minutes: 42%
Core 1: CPU utilization for five seconds: 48%, one minute: 42%, five minutes: 43%
Core 2: CPU utilization for five seconds: 29%, one minute: 41%, five minutes: 42%
Core 3: CPU utilization for five seconds: 52%, one minute: 43%, five minutes: 41%
Pid PPid 5Sec 1Min 5Min Status Size Name
--------------------------------------------------------------------------------
3894 3835 106% 105% 104% R 222112 linux_iosd-imag
5172 5147 35% 35% 35% S 97868 fed main event
35 2 7% 7% 7% S 0 ksmd
5505 5470 3% 3% 3% S 8428 btman
728 2 2% 2% 1% S 0 lsmpi-xmit
18420 18412 1% 2% 2% S 15064 repm
16073 16040 1% 1% 1% S 33316 fman_fp_image
7796 1 1% 1% 1% S 5452 chasync.sh
7518 7509 1% 2% 2% S 13848 btman
4195 4180 1% 1% 1% S 16688 sif_mgr
729 2 1% 1% 1% S 0 lsmpi-rx
31911 2 0% 0% 0% I 0 kworker/1:0-pm
31136 2 0% 0% 0% I 0 kworker/u8:3-kverity
31124 8521 0% 0% 0% S 428 sleep
30639 11209 0% 0% 0% S 424 sleep
30146 2 0% 0% 0% I 0 kworker/u8:0-kverity
29438 11480 0% 0% 0% S 424 sleep
29198 2 0% 0% 0% I 0 kworker/0:0H-mmc_com
28930 2 0% 0% 0% I 0 kworker/3:1H
28662 2 0% 0% 0% I 0 kworker/2:0H
28535 2 0% 0% 0% I 0 kworker/3:0-cgroup_d
24251 2 0% 0% 0% S 0 SarIosdMond
23602 23589 0% 0% 0% S 30036 python3
23589 4440 0% 0% 0% S 2772 pman
23362 23334 0% 0% 0% S 15436 cli_agent
23334 3127 0% 0% 0% S 2780 pman
23191 23186 0% 0% 0% S 4004 cmm
23186 3127 0% 0% 0% S 2776 pman
23076 23070 0% 0% 0% S 26724 dbm
23070 3127 0% 0% 0% S 2780 pman
22838 22828 0% 0% 0% S 27880 fman_rp
22828 3127 0% 0% 0% S 2776 pman
22483 22464 0% 0% 0% S 6572 tms
22464 3127 0% 0% 0% S 2780 pman
I'll try the debug later, during a low-activity timeframe.
Regards, Vincent
06-21-2023 12:03 AM
@mitard wrote:
3894 3835 106% 105% 104% R 222112 linux_iosd-imag
Is there SNMP monitoring going on? And how many pollers?
Is DNAC polling this stack?
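One way to gauge how heavily SNMP is being polled; the second command may not be available on every release:
! SNMP input/output packet counters
show snmp
! Recently requested OIDs, if supported
show snmp stats oid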
06-24-2023 06:41 AM
Cat9k#show platform software fed switch active punt packet-capture brief
As I mentioned before, there may be a server or host scanning and flooding packets across your whole network.
First, share the output of the command above.
Second, check "show interface" and the input traffic on each interface; identify which ports' input (unicast, broadcast, multicast) counters are increasing rapidly (a quick way to do this is sketched below).
Lastly, try shutting ports one by one and monitor the CPU percentage.
That's it.
35% for fed is too high.
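A quick way to spot a port that is flooding, assuming the scanning-host theory above; clear the counters first so the deltas are easy to read:
clear counters
! wait a minute or two, then look for ports whose input unicast/multicast/broadcast counts grow fastest
show interfaces counters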
06-21-2023 12:06 AM
Yes, we have two SNMP monitoring systems polling the device (we are in the middle of a monitoring migration, so both the legacy and the new system poll it), but we do not have DNA Center in our infrastructure.
06-21-2023 12:37 AM
Temporarily stop SNMP and watch if the CPU cycles drop.
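For the test itself, the SNMP agent can be disabled with a single command; note this is disruptive to monitoring and, depending on the release, you may need to re-apply or verify your snmp-server configuration when turning it back on:
! Disable the SNMP agent while observing the CPU
no snmp-server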