SG350GX - high CPU load during snmp polls

Stan1212 · ‎02-17-2021

Hello, Cisco community.

I have been operating two stacked SG350GX switches for half a year.

I find it very disappointing that during each SNMP poll from Prometheus, CPU load spikes to 90-95%, making the switch almost unmanageable.

Web sessions literally stop, and I can clearly see degraded performance on the SSH console.

This is the regular CPU usage with no SNMP activity enabled (12%, 8%, 12%).

And this is what happens when I re-enable my Prometheus job (93%, 58%, 48%).

Software version is v2.5.5.47 / RTESLA2.5.5_930_364_286.

Are there any ways to improve performance?

Thanks in advance!

balaji.bandi · ‎02-17-2021

It looks like other users in the community reported the same SNMP issue some time back. Do you really need to monitor all the ports over SNMP? If not, limit polling to the ports you actually need and check whether that resolves the issue.

BB

***** Rate All Helpful Responses *****

How to Ask The Cisco Community for Help

Stan1212 · ‎02-17-2021

Yes, I'll indeed try to make some adjustments to my poller, but I have another question: does CPU load affect my network performance as well?

Thanks!

balaji.bandi · ‎02-17-2021

Yes, it will. If there is no CPU headroom, how can the device process any other requests rather than crash?

BB

Stan1212 · ‎02-17-2021

Well, I sort of hoped that this particular CPU, completely incapable of performing the simplest of SNMP operations, is not used for switching purposes.

I've just adjusted my snmp-exporter to collect only interface counters and no other data, and it made no difference. The scrape still takes 20-28 s and the CPU still spikes.

Stan1212 · ‎02-17-2021

Well, I find it confusing and ridiculous.

# HELP snmp_scrape_duration_seconds Total SNMP time scrape took (walk and processing).
# TYPE snmp_scrape_duration_seconds gauge
snmp_scrape_duration_seconds 32.984104957
# HELP snmp_scrape_pdus_returned PDUs returned from walk.
# TYPE snmp_scrape_pdus_returned gauge
snmp_scrape_pdus_returned 9361
# HELP snmp_scrape_walk_duration_seconds Time SNMP walk/bulkwalk took.
# TYPE snmp_scrape_walk_duration_seconds gauge
snmp_scrape_walk_duration_seconds 32.968419379
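To put the numbers above in perspective, here is a minimal Python sketch that parses the pasted exporter output and derives the walk throughput. The helper `parse_metrics` is a hypothetical name, not part of snmp_exporter, and it only handles simple unlabelled samples like the ones in this post.

```python
def parse_metrics(text):
    """Parse unlabelled samples from Prometheus text exposition output."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        name, value = line.split()[:2]
        samples[name] = float(value)
    return samples


# Values copied from the scrape output posted above.
SCRAPE_OUTPUT = """\
# TYPE snmp_scrape_duration_seconds gauge
snmp_scrape_duration_seconds 32.984104957
# TYPE snmp_scrape_pdus_returned gauge
snmp_scrape_pdus_returned 9361
# TYPE snmp_scrape_walk_duration_seconds gauge
snmp_scrape_walk_duration_seconds 32.968419379
"""

metrics = parse_metrics(SCRAPE_OUTPUT)
walk_seconds = metrics["snmp_scrape_walk_duration_seconds"]
pdus = metrics["snmp_scrape_pdus_returned"]

pdus_per_second = pdus / walk_seconds       # throughput of the walk
ms_per_pdu = 1000 * walk_seconds / pdus     # average time per returned PDU
walk_share = walk_seconds / metrics["snmp_scrape_duration_seconds"]

print(f"{pdus_per_second:.0f} PDUs/s, {ms_per_pdu:.2f} ms per PDU, "
      f"walk is {walk_share:.2%} of the scrape")
```

Almost the entire scrape time is spent in the walk itself rather than in exporter-side processing, so the time is going into the SNMP exchange with the switch, not into the exporter.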

Stan1212 · ‎02-24-2021

Sorry for bumping my post, but I would really like to see more suggestions.

balaji.bandi · ‎02-24-2021

What is the outcome if you disable SNMP?

BB

Stan1212 · ‎02-24-2021

Well, as we expected: average CPU utilization falls to 5-10%.

balaji.bandi · ‎02-24-2021

This looks like a bug to me. Open a TAC case with the SMB team; they may offer a solution, or log it as a bug and work with you on a fix for a new release.

BB

Martin Aleksandrov · ‎02-24-2021

@Stan1212

We do not have such a bug logged in our database. Please contact the Cisco STAC centre and raise a support ticket. Contact details are as follows:

https://www.cisco.com/c/en/us/support/web/tsd-cisco-small-business-support-center-contacts.html

Regards,

Martin

oassupport · ‎08-01-2023

Case: 695938654

Ours is related to CBS350 stacks with a large VLAN count (~160) and SNMP.
Stacks with a high VLAN count and SNMP disabled are back at under 10% CPU.
Stacks with a low VLAN count (~16) and SNMP enabled are under 10% CPU.

StanislavAndreevitch · ‎08-21-2023

Hello, Cisco community. Since I was unable to use the account I used to post the first message of this thread, I had to register another.

After oassupport's reply, I checked my stack again, only to discover it was at 100% CPU utilization. I tried disabling SNMP, but to no avail. That stack still serves as an aggregation-level switch and its configs are rarely changed. I can even add that there have been no significant config changes since 02-17-2021.

So last Saturday we applied the latest firmware (2.5.9.16) and the problem seems to be gone. I even adjusted the SNMP poller to poll the stack every 15 seconds instead of every 60.

core-stack#sh cpu ut
CPU utilization service is on.

CPU utilization
---------------
five seconds: 11%; one minute: 32%; five minutes: 31%
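For anyone weighing a shorter polling interval, a rough Python sketch of the arithmetic follows. It uses the scrape duration posted on 02-17-2021 and assumes the 60 s interval mentioned above was in use at that time; the scrape duration after the firmware upgrade was not posted, so it is not modelled here. The function names are made up for illustration.

```python
def polling_duty_cycle(scrape_seconds, interval_seconds):
    """Fraction of each polling interval spent inside an SNMP scrape."""
    return scrape_seconds / interval_seconds


def fits_interval(scrape_seconds, interval_seconds):
    """A scrape has to finish before the next one is due."""
    return scrape_seconds < interval_seconds


# snmp_scrape_duration_seconds from the 02-17-2021 post (old firmware).
OLD_SCRAPE_SECONDS = 32.984104957

for interval in (60, 15):
    duty = polling_duty_cycle(OLD_SCRAPE_SECONDS, interval)
    ok = fits_interval(OLD_SCRAPE_SECONDS, interval)
    print(f"interval {interval:>2}s: duty cycle {duty:.0%}, fits: {ok}")
```

With the old scrape time, the switch was busy answering the walk for more than half of every 60 s interval, and a 15 s interval would not have been possible at all. Prometheus also requires `scrape_timeout` to be no larger than `scrape_interval`, so a 15 s interval only works if a full scrape completes within 15 s.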