SNMP-Engine causing High CPU Utilization

Adrian Caba Gutierrez · ‎05-09-2013

Hello,

I have many Cisco 2901 routers and one 2951 router running IOS version 15. For some days now, these routers have shown high CPU every 10 minutes, caused by the SNMP ENGINE process. I have three SNMP servers in my network. How can I fix this issue?

Thanks for the help.

Rolf Fischer · ‎05-10-2013

Hi Adrian,

With CoPP (control plane policing) you could rate-limit the amount of SNMP traffic, but this will most likely increase the response times seen by the NMS.

Apart from that, maybe you could optimize the polling behavior of your SNMP servers.

Maybe these documents are helpful to identify the cause: https://supportforums.cisco.com/docs/DOC-15956 http://www.cisco.com/en/US/tech/tk648/tk362/technologies_tech_note09186a00800948e6.shtml

Hope that helps

Rolf

Vinod Arya · ‎05-10-2013

Most of the time these observed spikes are genuine. As you have three NMS servers, you may want to check whether they are collecting data from the devices that takes so long to process that it takes a toll on the CPU.

You may want to manually check the incoming SNMP packet count with:

show snmp | inc input

Collecting this while the CPU spikes would help determine how many packets are hitting the device.
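The counter from show snmp | inc input is easier to judge as a rate, so one option is to capture it twice and divide the difference by the time between captures. The following is only a minimal sketch: the sample lines and the 60-second interval are made-up values, and it assumes the output contains a line of the form "<number> SNMP packets input".

```python
import re


def snmp_input_count(show_snmp_output: str) -> int:
    """Return the 'SNMP packets input' counter found in `show snmp` text."""
    match = re.search(r"(\d+)\s+SNMP packets input", show_snmp_output)
    if match is None:
        raise ValueError("no 'SNMP packets input' line found")
    return int(match.group(1))


def input_rate(before: str, after: str, interval_seconds: float) -> float:
    """Packets per second between two captures taken interval_seconds apart."""
    delta = snmp_input_count(after) - snmp_input_count(before)
    return delta / interval_seconds


# Hypothetical captures taken 60 seconds apart (not real device output).
sample_before = "    120000 SNMP packets input"
sample_after = "    126000 SNMP packets input"

print(input_rate(sample_before, sample_after, 60))  # 100.0
```

Comparing the rate during a spike with the rate at a quiet moment shows whether the spikes line up with a burst of polling.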

If the command show snmp stats oid works, you can share that output as well. And if all seems well, share the following details in addition to the above while the CPU is high:

> show proc cpu sort | exc 0.00

> show stacks <PID> (*You can get the SNMP ENGINE PID from the show proc cpu command, whose first column is the PID.)

Example:

NMS-WL-6500#sh proc cpu | i SNMP

223 1575116 6003718 262 0.00% 0.51% 0.62% 0 SNMP ENGINE

With this you can run :

NMS-WL-6500#show stacks 223

Process 223: SNMP ENGINE

Stack segment 0x52056884 - 0x52059764

FP: 0x52059688, RA: 0x416C9FE0

FP: 0x520596B8, RA: 0x414D1234

With this we can check which specific OID is stuck or driving the CPU to the maximum.
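If the PID has to be collected from several routers, the lookup can be scripted. This is a rough sketch that parses the sample show proc cpu line from the example above, assuming the usual column order (PID, runtime in ms, invoked, uSecs, 5-second, 1-minute and 5-minute CPU, TTY, process name). The function name is hypothetical.

```python
def parse_proc_cpu_line(line: str) -> dict:
    """Split one `show proc cpu` process row into named fields."""
    fields = line.split()
    return {
        "pid": int(fields[0]),
        "runtime_ms": int(fields[1]),
        "invoked": int(fields[2]),
        "usecs": int(fields[3]),
        "cpu_5sec": float(fields[4].rstrip("%")),
        "cpu_1min": float(fields[5].rstrip("%")),
        "cpu_5min": float(fields[6].rstrip("%")),
        "tty": int(fields[7]),
        # The process name may contain spaces, e.g. "SNMP ENGINE".
        "process": " ".join(fields[8:]),
    }


# The row quoted in the example above.
line = "223 1575116 6003718 262 0.00% 0.51% 0.62% 0 SNMP ENGINE"
entry = parse_proc_cpu_line(line)

# Build the follow-up command for this PID.
print(f"show stacks {entry['pid']}")  # show stacks 223
```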

-Thanks Vinod **Rating Encourages contributors, and its really free. **

Adrian Caba Gutierrez · ‎05-10-2013

Hi,

I attach the outputs of commands when the CPU goes to 99%.

Thanks a lot for the help.

Vinod Arya · ‎05-11-2013

Thanks for the details. It is very strange: judging from show snmp stats oid, the polling of the device doesn't look very aggressive.

Maybe some SNMP process is stuck or hung, which would explain why the CPU is not constantly high. I would need the show version output from the device to check further.

Also, if something has hung in the SNMP process, it may be temporary, so I would like you to try restarting the SNMP ENGINE process.

This usually has no impact on the device, except that the SNMP ENGINE stops for a few seconds and does not respond to any SNMP packets until it is restarted.

This is a very simple procedure and doesn't even affect the SNMP config. This is how to do it:

To Stop

# no snmp-server

To Start

# Re-enter any snmp-server command that previously existed in the config to make the SNMP ENGINE start again.

Example:

Please do this and see if the SNMP ENGINE CPU usage subsides afterwards.

-Thanks Vinod

Adrian Caba Gutierrez · ‎05-13-2013

Hello,

I restarted the SNMP ENGINE process in the way that you described, but the problem persists. I attach the output of the show version command.

Thanks a lot for the help.

Vinod Arya · ‎05-14-2013

Hi Adrian,

It seems that large route and/or ARP tables are being polled by the NMS station. The network management station queries routers for their entire route table to learn about other networks. It uses this information to find other routers and query them about their
knowledge of networks around them. In this fashion, the management station can learn the
topology of the entire network.

The router stores the route table in a hashed format, more conducive to quick route
searches. However, SNMP responses for the route are required to be returned in
lexicographical order per RFC1213. Therefore, for each SNMP request the router receives,
the hash table must be sorted lexicographically before an SNMP response PDU can be built.
The larger the route table, the more CPU intensive the sort.
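To illustrate why the sort hurts, here is a simplified model of the behavior described above. It is not how IOS is implemented: the dictionary stands in for the hashed route table, the three routes are invented, and the function names are hypothetical. Each get-next request re-sorts all OIDs numerically, component by component, so a full walk repeats the sort once per row.

```python
def oid_key(oid: str) -> tuple:
    """Turn a dotted OID into a tuple so comparison is lexicographic by component."""
    return tuple(int(part) for part in oid.split("."))


def get_next(table: dict, requested: str):
    """Return the first OID after `requested`; sorts the whole table on every call."""
    ordered = sorted(table, key=oid_key)  # the per-request cost
    wanted = oid_key(requested)
    for oid in ordered:
        if oid_key(oid) > wanted:
            return oid
    return None


def walk(table: dict, start: str) -> list:
    """Repeat get_next until the table is exhausted, like an NMS walking a column."""
    result = []
    current = get_next(table, start)
    while current is not None:
        result.append(current)
        current = get_next(table, current)
    return result


COLUMN = "1.3.6.1.2.1.4.21.1.1"  # ipRouteDest column from RFC 1213
ROUTE_TABLE = {  # invented routes, stored in no particular order
    COLUMN + ".10.10.0.0": "10.10.0.0",
    COLUMN + ".9.0.0.0": "9.0.0.0",
    COLUMN + ".10.2.0.0": "10.2.0.0",
}

for oid in walk(ROUTE_TABLE, COLUMN):
    print(ROUTE_TABLE[oid])
# 9.0.0.0
# 10.2.0.0
# 10.10.0.0
```

With three routes the cost is negligible, but in this model a walk over N rows performs N sorts of N entries, which is the kind of growth that shows up as CPU spikes on large tables.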

SNMP is a low priority process as far as the CPU scheduler is concerned, so another
process requiring CPU resources takes priority. Therefore, while CPU spikes occur in this
scenario, they should not affect performance.

Would it be possible for you to apply the procedure for blocking polls of the routing and ARP
tables described in the following link, and then monitor the device?

http://www.cisco.com/en/US/tech/tk648/tk362/technologies_tech_note09186a00800948e6.shtml#large_route

Please check once.

-Thanks Vinod

Adrian Caba Gutierrez · ‎05-14-2013

Hello Vinod,

I already configured the SNMP views as the document recommends, but the problem persists, which is very strange. This is the SNMP configuration on my routers:

snmp-server view cutdown iso included

snmp-server view cutdown at excluded

snmp-server view cutdown internet.6.3.15 excluded

snmp-server view cutdown internet.6.3.16 excluded

snmp-server view cutdown internet.6.3.18 excluded

snmp-server view cutdown ip.21 excluded

snmp-server view cutdown ip.22 excluded

snmp-server community comunity1 view cutdown RO 98

snmp-server community comunity2 view cutdown RO 98

snmp-server community comunityRW view cutdown RW 98

snmp-server ifindex persist

snmp-server location OC

snmp-server contact Administrator

snmp-server enable traps snmp authentication linkdown linkup coldstart warmstart

snmp-server enable traps tty

snmp-server enable traps eigrp

snmp-server enable traps envmon

snmp-server enable traps flash insertion removal

snmp-server enable traps config-copy

snmp-server enable traps config

snmp-server enable traps cpu threshold

snmp-server enable traps syslog

snmp-server enable traps voice

snmp-server host 172.20.1.65 comunity2

Thanks for the help.

Adrian Caba Gutierrez · ‎06-12-2013

Hello,

The problem was solved. The solution was to remove the read-write SNMP community. None of the servers was using this SNMP community.

Thanks.

Vinod Arya · ‎06-13-2013

Glad to know the issue is fixed. It does sound strange that the RW string was causing the issue.

-Thanks Vinod