High amount of SNMP traps!

Friends,
We have SNMP traps enabled on our networking devices, roughly 250 in total (routers + switches).
On all WAN (internet-facing) routers, we have more or less the traps below enabled:
 
#sh run | i traps
snmp-server enable traps snmp authentication linkdown linkup coldstart warmstart
snmp-server enable traps ospf state-change
snmp-server enable traps ospf errors
snmp-server enable traps ospf retransmit
snmp-server enable traps ospf lsa
snmp-server enable traps ospf cisco-specific state-change nssa-trans-change
snmp-server enable traps ospf cisco-specific state-change shamlink interface
snmp-server enable traps ospf cisco-specific state-change shamlink neighbor
snmp-server enable traps ospf cisco-specific errors
snmp-server enable traps ospf cisco-specific retransmit
snmp-server enable traps ospf cisco-specific lsa
snmp-server enable traps aaa_server
snmp-server enable traps bgp 
snmp-server enable traps config
snmp-server enable traps frame-relay multilink bundle-mismatch
snmp-server enable traps frame-relay
snmp-server enable traps frame-relay subif
#

 
 
The trap destination counters on this router currently show:
    Logging to 10.207.68.152.162, 4/10, 495603 sent, 462 dropped.
    Logging to 10.36.134.109.162, 4/10, 495480 sent, 585 dropped.
    Logging to 10.36.134.109.162, 4/10, 495603 sent, 462 dropped.
    Logging to 10.36.160.131.162, 4/10, 495480 sent, 585 dropped.
#

.109 is our collector host and .131 is the syslog server.

 
The thing is, we have a new monitoring system in place, and .109 is a new server running OpenNMS, which is application based. Their team has raised concerns and asked whether we can tune the SNMP traps we send across.
I recorded the count of SNMP traps sent over a 24-hour period. On average, the total sent in 24 hours was almost 80k, which works out to roughly 3,300 traps per hour, or about one per second. (Seems like a lot, right?)
 
To analyze the nature of the traps being sent, I enabled SNMP debugging; the attached log is for your reference. I ran the debug for about a minute and could see that most of the traps sent are BGP related (unless I am missing others).
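For anyone wanting to reproduce this, the debugging was along these lines (a rough sketch only; exact syntax and output can vary by IOS release, and debug snmp packets can be very chatty on a busy box):
 
#terminal monitor
#debug snmp packets
(let it run for about a minute, then turn it off)
#undebug all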
The trap currently enabled for BGP is:
snmp-server enable traps bgp
 
So I tried changing the trap settings and enabled BGP state-change traps only, as below:
snmp-server enable traps bgp state-changes
 
Then I allowed traps to be sent for another 24 hours or so, but the count didn't decrease; it was still almost 80-90k!
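For completeness, a quick way to confirm the change actually took effect (a sketch, using the same filter style as above):
 
#sh run | i traps bgp
snmp-server enable traps bgp state-changes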
BGP peers 'alive' (established) on this router:
=================================================================
#sh ip bgp vpnv4 vrf Backbone summary  | ex Admin|Idle|Active
BGP activity 803908/175101 prefixes, 2321370/1082581 paths, scan interval 60 secs
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.206.96.100   4        65480   24538   24540  6651009    0    0 1w0d            1
10.206.96.101   4        65440   12391   12386  6651009    0    0 1w0d            3
10.206.96.102   4        65430   24527   24553  6651009    0    0 1w0d            1
10.206.96.103   4        65435   12381   12392  6651009    0    0 1w0d            3
10.206.96.107   4        65011   24544   24553  6651009    0    0 1w0d            1
10.206.96.108   4        65460    9997   10001  6651009    0    0 6d07h           1
10.206.96.109   4        65465   24532   24570  6651009    0    0 1w0d            1
10.206.96.110   4        65455   12390   12403  6651009    0    0 1w0d            1
10.206.96.112   4        64929   12397   12410  6651009    0    0 1w0d            9
10.206.96.113   4        64939    4149    4150  6651009    0    0 2d14h           1
10.206.96.134   4        64909   12390   12408  6651009    0    0 1w0d            2
10.206.96.136   4        65130   12372   12394  6651009    0    0 1w0d            2
10.206.96.137   4        65120   24537   24548  6651009    0    0 1w0d           36
10.206.96.138   4        65140   12382   12407  6651009    0    0 1w0d            2
10.206.96.147   4        65360   24531   24538  6651009    0    0 1w0d            5
10.206.96.148   4        65495    8282    8284  6651009    0    0 5d05h           2
10.206.96.150   4        65421   12382   12393  6651009    0    0 1w0d            1
10.206.96.155   4        65101   12507   12397  6651009    0    0 1w0d            1
10.206.96.161   4        65201   12386   12399  6651009    0    0 1w0d            1
10.206.96.162   4        65423   24527   24539  6651009    0    0 1w0d            1
10.206.96.164   4        65202   12381   12394  6651009    0    0 1w0d            2
#sh ip bgp vpnv4 vrf Internet summary  | ex Admin|Idle|Active
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.96.97.162    4        22883   12391   12382  6651047    0    0 1w0d            6
10.96.97.164    4        22883   12387   12379  6651047    0    0 1w0d            8
10.206.97.91    4        22883 1296955 2179161  6651047    0    0 1w0d       610952
63.246.214.249  4         7029 2126503   24537  6651047    0    0 1w0d       627583
#
===========================================================================
 
I understand that this router is our core WAN router at the data centre and has a lot of BGP peerings, but does that justify the number of traps being sent?
Initially, I had the config as below:
snmp-server host 10.36.134.109 vrf Backbone  public
snmp-server host 10.36.134.109  vrf Internet public
Then I removed the vrf part, and now it is as below:
snmp-server host 10.36.134.109 public
This has reduced the trap sent count a lot, but it has also increased the drop count considerably. Is that right, though? I mean, we have BGP peerings in both the Backbone and Internet VRFs, so shouldn't the vrf keyword be present in the snmp-server host lines?
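To summarise the two variants as I understand them (a sketch; my reading of what the vrf keyword does here may well be off):
 
! before: one host entry per VRF, so traps to .109 are sent via each VRF's routing table
snmp-server host 10.36.134.109 vrf Backbone public
snmp-server host 10.36.134.109 vrf Internet public
! after: a single host entry, reached via the global routing table
snmp-server host 10.36.134.109 public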
Any suggestions would be much appreciated, friends.
Regards
Karthi

1 Reply

alcidio.tembe1
Level 1

Hi there

I have a problem on my router: CPU usage is high, around 98%. When I type show processes cpu sorted, the result is:

CPU utilization for five seconds: 99%/34%; one minute: 97%; five minutes: 73%
 PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
 344   460720672   531663231        866 27.35% 30.31% 23.13%   0 SNMP ENGINE
 121   141441724   693951610        203 26.23% 24.05% 13.82%   0 IP Input
 342    99210108   986973292        100  8.79%  9.52%  7.08%   0 IP SNMP
 343    14466816   531298217         27  0.87%  1.05%  0.87%   0 PDU DISPATCHER
  59          96           6      16000  0.47%  0.03%  0.00% 644 SSH Process
 337      162632     6374283         25  0.15%  0.08%  0.08%   0 FNF Cache Ager P
  85      565420     6383879         88  0.15%  0.14%  0.15%   0 Per-Second Jobs
 104      537828    25481393         21  0.15%  0.15%  0.15%   0 Netclock Backgro
 240         700        1001        699  0.15%  0.02%  0.02% 646 SSH Process
 128      182344   786481501          0  0.07%  0.12%  0.10%   0 Ethernet Msec Ti
 187      263204  1526775889          0  0.07%  0.10%  0.19%   0 HQF Output Shape
 316      229700     6230328         36  0.07%  0.04%  0.05%   0 CFT Timer Proces
  49        2440     6372246          0  0.07%  0.00%  0.00%   0 GraphIt
 125       90304    98487972          0  0.07%  0.03%  0.02%   0 VRRS Main thread
 228     3898560    32333917        120  0.07%  0.04%  0.04%   0 ADJ resolve proc

Could you please help me solve this issue?

Best Regards

ART
