
SG350GX - high CPU load during snmp polls

Stan1212
Level 1

Hello, Cisco community. 

I have been running 2 stacked SG350GXs for half a year.

I find it very disappointing that during each SNMP poll from Prometheus, CPU load spikes to 90-95%, making the switch almost unmanageable.

Web sessions literally stop, and I can clearly see degraded performance on the SSH console.

This is the regular CPU usage with no SNMP activity enabled (12%, 8%, 12%): photo_2021-02-17_14-28-10.jpg

And this is what happens when I re-enable my Prometheus job (93%, 58%, 48%): photo_2021-02-17_14-30-00.jpg

 

Software version is v2.5.5.47 / RTESLA2.5.5_930_364_286.

Is there some way to tune the performance?

Thanks in advance!

 

12 Replies

balaji.bandi
Hall of Fame

It looks like other users in the community reported the same SNMP issue some time ago. Do you really need to monitor all the ports using SNMP? If not, adjust the polling so that only the required ports are monitored, and check whether that resolves the issue.

BB

***** Rate All Helpful Responses *****

How to Ask The Cisco Community for Help

Yes, I'll indeed try to make some adjustments to my poller, but I have another question: does the CPU load affect my network performance as well?

Thanks!

 

Yes, it will. If there is no CPU headroom, how can the device process any other requests rather than crash?

 

BB


Well, I sort of hoped that this particular CPU, completely incapable of performing the simplest of SNMP operations, is not used for switching purposes.

I've just adjusted my snmp-exporter to collect only interface counters and no other data, and it made no difference. The scrape is still 20-28 s long and the CPU is still spiking.

 

 

Well, I find it confusing and ridiculous.

# HELP snmp_scrape_duration_seconds Total SNMP time scrape took (walk and processing).
# TYPE snmp_scrape_duration_seconds gauge
snmp_scrape_duration_seconds 32.984104957
# HELP snmp_scrape_pdus_returned PDUs returned from walk.
# TYPE snmp_scrape_pdus_returned gauge
snmp_scrape_pdus_returned 9361
# HELP snmp_scrape_walk_duration_seconds Time SNMP walk/bulkwalk took.
# TYPE snmp_scrape_walk_duration_seconds gauge
snmp_scrape_walk_duration_seconds 32.968419379
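To put the scrape figures above in perspective, here is a minimal Python sketch that parses the pasted exporter output and works out how fast the switch answered. The function name `parse_metrics` is made up for illustration, and the parser only handles simple un-labelled samples like the three shown here; it is not a full Prometheus exposition-format parser.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse un-labelled samples from Prometheus text exposition output.

    Hypothetical helper: comment lines (# HELP / # TYPE) are skipped and
    every remaining line is assumed to be "<name> <value>".
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.split()[:2]
        samples[name] = float(value)
    return samples


# The scrape statistics exactly as posted in the thread.
SCRAPE_OUTPUT = """\
# HELP snmp_scrape_duration_seconds Total SNMP time scrape took (walk and processing).
# TYPE snmp_scrape_duration_seconds gauge
snmp_scrape_duration_seconds 32.984104957
# HELP snmp_scrape_pdus_returned PDUs returned from walk.
# TYPE snmp_scrape_pdus_returned gauge
snmp_scrape_pdus_returned 9361
# HELP snmp_scrape_walk_duration_seconds Time SNMP walk/bulkwalk took.
# TYPE snmp_scrape_walk_duration_seconds gauge
snmp_scrape_walk_duration_seconds 32.968419379
"""

metrics = parse_metrics(SCRAPE_OUTPUT)
pdus = metrics["snmp_scrape_pdus_returned"]
walk_seconds = metrics["snmp_scrape_walk_duration_seconds"]

# Throughput of the walk: how many PDUs per second the switch returned,
# and the average time spent per PDU.
pdus_per_second = pdus / walk_seconds
ms_per_pdu = 1000 * walk_seconds / pdus

print(f"{pdus_per_second:.1f} PDUs/s, {ms_per_pdu:.2f} ms per PDU")
```

Almost all of the scrape time is the walk itself rather than processing on the exporter side, so the time is being spent waiting on the switch.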

Stan1212
Level 1

Sorry for bumping my post, but I would really like to see more suggestions.

 

What is the outcome if you disable SNMP?

BB


Stan1212
Level 1

Well, as we expected, average CPU utilization falls to 5-10%.

This looks like a bug to me. Open a TAC case with the SMB team; they may offer a solution, or log it as a bug and work with you on a fix for a new release.

 

BB


@Stan1212 

 

We do not have such a bug logged in our database. Please contact the Cisco STAC centre and raise a support ticket. Contact details are as follows:

 

https://www.cisco.com/c/en/us/support/web/tsd-cisco-small-business-support-center-contacts.html

 

Regards,

Martin

 

 

Case: 695938654

Ours is related to CBS350 stacks with a large VLAN count (~160) and SNMP.
Stacks with a high VLAN count and SNMP disabled are back under 10% CPU.
Stacks with a low VLAN count (~16) and SNMP enabled are under 10% CPU.

Hello, Cisco community. Since I was unable to use the account I used to send the first message of this thread, I had to register another.

After oassupport's reply, I checked my stack again, only to discover it was at 100% CPU utilization. I tried disabling SNMP, but to no avail. That stack still serves as an aggregation-level switch and its configs are rarely changed. I can even add that there have been no significant changes to the configs since 02-17-2021.

So last Saturday we applied the latest firmware (2.5.9.16) and the problem seems to be gone. I even adjusted the SNMP poller to poll the stack every 15 s instead of every 60 s.

 

core-stack#sh cpu ut         
CPU utilization service is on.

CPU utilization
---------------
five seconds: 11%; one minute: 32%; five minutes: 31%
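For anyone who wants to track these numbers over time from captured console output, a small Python sketch like the following could pull the three averages out of the text above. The helper name `parse_cpu_utilization` is hypothetical, and the regular expression assumes the output looks exactly like the lines pasted here; other firmware versions may format it differently.

```python
import re

# Matches entries such as "five seconds: 11%" in the pasted output.
# Assumption: the three window labels appear exactly as shown in this thread.
CPU_PATTERN = re.compile(r"(five seconds|one minute|five minutes):\s*(\d+)%")


def parse_cpu_utilization(output: str) -> dict[str, int]:
    """Return the CPU utilization percentage for each averaging window."""
    return {window: int(pct) for window, pct in CPU_PATTERN.findall(output)}


# Console output as posted after the firmware upgrade.
SAMPLE = """\
core-stack#sh cpu ut
CPU utilization service is on.

CPU utilization
---------------
five seconds: 11%; one minute: 32%; five minutes: 31%
"""

utilization = parse_cpu_utilization(SAMPLE)
print(utilization)
```

Comparing the "five minutes" value before and after a firmware or poller change gives a steadier picture than the five-second figure, which swings with each poll.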