High amount of SNMP traps!

Friends,
We have SNMP traps enabled on our networking devices, roughly 250 in total (routers + switches).
On all WAN (internet-facing) routers, we have more or less the traps below enabled:
 
#sh run | i traps
snmp-server enable traps snmp authentication linkdown linkup coldstart warmstart
snmp-server enable traps ospf state-change
snmp-server enable traps ospf errors
snmp-server enable traps ospf retransmit
snmp-server enable traps ospf lsa
snmp-server enable traps ospf cisco-specific state-change nssa-trans-change
snmp-server enable traps ospf cisco-specific state-change shamlink interface
snmp-server enable traps ospf cisco-specific state-change shamlink neighbor
snmp-server enable traps ospf cisco-specific errors
snmp-server enable traps ospf cisco-specific retransmit
snmp-server enable traps ospf cisco-specific lsa
snmp-server enable traps aaa_server
snmp-server enable traps bgp 
snmp-server enable traps config
snmp-server enable traps frame-relay multilink bundle-mismatch
snmp-server enable traps frame-relay
snmp-server enable traps frame-relay subif
#

 
 
The trap destination counters on this router currently show:
    Logging to 10.207.68.152.162, 4/10, 495603 sent, 462 dropped.
    Logging to 10.36.134.109.162, 4/10, 495480 sent, 585 dropped.
    Logging to 10.36.134.109.162, 4/10, 495603 sent, 462 dropped.
    Logging to 10.36.160.131.162, 4/10, 495480 sent, 585 dropped.
#

.109 is our collector host and .131 is the syslog server.

 
The thing is, we have a new monitoring system in place, and .109 is a new server running OpenNMS, which is application based. Their team has raised concerns and asked whether we can tune the SNMP traps we send across.
I recorded the count of SNMP traps sent over a 24-hour period. On average, the total sent in 24 hours was almost 80k, which works out to roughly 3,300 traps per hour, or about one per second. (Seems like a lot, right?)
 
To analyze the nature of the traps being sent, I enabled SNMP debugging; the attached log is for your reference. I ran the debug for about a minute and could see that most of the traps sent are BGP related (unless I am missing others).
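For anyone wanting to reproduce this, the debugging was along these lines (a rough sketch only; exact syntax and output can vary by IOS release, and debug snmp packets can be very chatty on a busy box):
 
#terminal monitor
#debug snmp packets
(let it run for about a minute, then turn it off)
#undebug all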
The trap currently enabled for BGP is:
snmp-server enable traps bgp
 
So I tried changing the trap settings and enabled BGP state-change traps only, as below:
snmp-server enable traps bgp state-changes
 
Then I allowed traps to be sent for another 24 hours or so, but the count didn't decrease; it was still almost 80-90k!
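For completeness, a quick way to confirm the change actually took effect (a sketch, using the same filter style as above):
 
#sh run | i traps bgp
snmp-server enable traps bgp state-changes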
BGP peers 'alive' (established) on this router:
=================================================================
#sh ip bgp vpnv4 vrf Backbone summary  | ex Admin|Idle|Active
BGP activity 803908/175101 prefixes, 2321370/1082581 paths, scan interval 60 secs
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.206.96.100   4        65480   24538   24540  6651009    0    0 1w0d            1
10.206.96.101   4        65440   12391   12386  6651009    0    0 1w0d            3
10.206.96.102   4        65430   24527   24553  6651009    0    0 1w0d            1
10.206.96.103   4        65435   12381   12392  6651009    0    0 1w0d            3
10.206.96.107   4        65011   24544   24553  6651009    0    0 1w0d            1
10.206.96.108   4        65460    9997   10001  6651009    0    0 6d07h           1
10.206.96.109   4        65465   24532   24570  6651009    0    0 1w0d            1
10.206.96.110   4        65455   12390   12403  6651009    0    0 1w0d            1
10.206.96.112   4        64929   12397   12410  6651009    0    0 1w0d            9
10.206.96.113   4        64939    4149    4150  6651009    0    0 2d14h           1
10.206.96.134   4        64909   12390   12408  6651009    0    0 1w0d            2
10.206.96.136   4        65130   12372   12394  6651009    0    0 1w0d            2
10.206.96.137   4        65120   24537   24548  6651009    0    0 1w0d           36
10.206.96.138   4        65140   12382   12407  6651009    0    0 1w0d            2
10.206.96.147   4        65360   24531   24538  6651009    0    0 1w0d            5
10.206.96.148   4        65495    8282    8284  6651009    0    0 5d05h           2
10.206.96.150   4        65421   12382   12393  6651009    0    0 1w0d            1
10.206.96.155   4        65101   12507   12397  6651009    0    0 1w0d            1
10.206.96.161   4        65201   12386   12399  6651009    0    0 1w0d            1
10.206.96.162   4        65423   24527   24539  6651009    0    0 1w0d            1
10.206.96.164   4        65202   12381   12394  6651009    0    0 1w0d            2
#sh ip bgp vpnv4 vrf Internet summary  | ex Admin|Idle|Active
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.96.97.162    4        22883   12391   12382  6651047    0    0 1w0d            6
10.96.97.164    4        22883   12387   12379  6651047    0    0 1w0d            8
10.206.97.91    4        22883 1296955 2179161  6651047    0    0 1w0d       610952
63.246.214.249  4         7029 2126503   24537  6651047    0    0 1w0d       627583
#
===========================================================================
 
I understand that this router is our core WAN router at the data centre and has a lot of BGP peerings, but does that justify the number of traps being sent?
Initially, I had the config as below:
snmp-server host 10.36.134.109 vrf Backbone  public
snmp-server host 10.36.134.109  vrf Internet public
Then I removed the vrf part, and now it is as below:
snmp-server host 10.36.134.109 public
This has reduced the trap sent count a lot, but it has also increased the drop count considerably. Is that right, though? I mean, we have BGP peerings in both the Backbone and Internet VRFs, so shouldn't the vrf keyword be present in the snmp-server host lines?
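To summarise the two variants as I understand them (a sketch; my reading of what the vrf keyword does here may well be off):
 
! before: one host entry per VRF, so traps to .109 are sent via each VRF's routing table
snmp-server host 10.36.134.109 vrf Backbone public
snmp-server host 10.36.134.109 vrf Internet public
! after: a single host entry, reached via the global routing table
snmp-server host 10.36.134.109 public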
Any suggestions would be much appreciated, friends.
Regards
Karthi

1 Reply

alcidio.tembe1
Level 1

Hi there

I have a problem on my router: CPU usage is high, around 98%. When I type show processes cpu sorted, the result is:

CPU utilization for five seconds: 99%/34%; one minute: 97%; five minutes: 73%
 PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
 344   460720672   531663231        866 27.35% 30.31% 23.13%   0 SNMP ENGINE
 121   141441724   693951610        203 26.23% 24.05% 13.82%   0 IP Input
 342    99210108   986973292        100  8.79%  9.52%  7.08%   0 IP SNMP
 343    14466816   531298217         27  0.87%  1.05%  0.87%   0 PDU DISPATCHER
  59          96           6      16000  0.47%  0.03%  0.00% 644 SSH Process
 337      162632     6374283         25  0.15%  0.08%  0.08%   0 FNF Cache Ager P
  85      565420     6383879         88  0.15%  0.14%  0.15%   0 Per-Second Jobs
 104      537828    25481393         21  0.15%  0.15%  0.15%   0 Netclock Backgro
 240         700        1001        699  0.15%  0.02%  0.02% 646 SSH Process
 128      182344   786481501          0  0.07%  0.12%  0.10%   0 Ethernet Msec Ti
 187      263204  1526775889          0  0.07%  0.10%  0.19%   0 HQF Output Shape
 316      229700     6230328         36  0.07%  0.04%  0.05%   0 CFT Timer Proces
  49        2440     6372246          0  0.07%  0.00%  0.00%   0 GraphIt
 125       90304    98487972          0  0.07%  0.03%  0.02%   0 VRRS Main thread
 228     3898560    32333917        120  0.07%  0.04%  0.04%   0 ADJ resolve proc

Could you please help me solve this issue?

Best Regards

ART
