06-06-2015 07:03 AM
Hello,
I've written two applets (see below). The first is triggered when a BGP route is removed from the table; its action is to send an email to the helpdesk. The second applet sends a second email when a route has been added to the BGP table (the assumption is that the circuit which went down and caused the removal of the route is now back up). The client routes fall within the 172.29.0.0/16 space, with each client getting a /24 subnet within that space.
I would like to introduce a delay of 2 minutes before sending the email that a route was lost. After 2 minutes, if the route was not added back into the table, then send the email stating the route was removed.
If the route is added back within 2 minutes of being removed from the BGP table, no email should be sent (just a log message on the router stating that the route was added).
Anyone have an idea on how to accomplish this?
06-12-2015 05:19 PM
06-12-2015 09:26 PM
Thank you for responding Joe and thank you for providing the lab. As you can see from my above applets, I report on any 172.29.0.0/16 route removed from BGP. The clients in the MPLS cloud each comprise a 172.29.x.0/24. Some clients have multiple 172.29.x.0/24 subnets advertised into the cloud. Below is the log result that my applet generates when such a client advertising two subnets goes down:
Jun 12 05:30:55 EDT: %HA_EM-6-LOG: LOST_CLIENT_ROUTE: Client network 172.29.98.0 was removed from BGP table
Jun 12 05:30:55 EDT: %HA_EM-6-LOG: LOST_CLIENT_ROUTE: Client network 172.29.198.0 was removed from BGP table
Since deploying the applets, we've received multiple false alerts because the subnets lost are received again within one minute.
Now I will add a nested applet with a 2-minute countdown timer, as you suggest in the attached lab. This is where I believe I run into the issue. After two minutes, I need to check every 172.29.x.0 route that was reported as lost. I demonstrated above a case where two routes were lost because a single client went down, but I have also had instances where, due to a carrier update, I lost 5 to 10 different 172.29.x.0 routes only to have them added back within a minute.
Your lab demonstrates matching the condition of a single interface. In my scenario, if 5 routes were lost, how do I track each of the 5 individually, check after 2 minutes whether each one is back in the BGP table, and, if not, take an action (email the helpdesk)?
PS. If you remember to answer, have you ever chaired an EEM session at CiscoLive? I attended one in San Diego (possibly Orlando) but don't remember who the moderator was.
06-13-2015 01:00 PM
After posting the above, Joe, I started testing down this route:
event manager applet LOST_CLIENT_ROUTE
 event routing network 172.29.0.0/16 type remove protocol BGP le 24 maxrun 120
 action 1.0 syslog msg "Client network $_routing_network was removed from BGP table"
 action 1.1 wait 120
 action 1.2 cli command "enable"
 action 1.3 cli command "show ip bgp $_routing_network"
A debug shows that it executes successfully and that the response from "sh ip bgp..." is "Network not in table"
Jun 13 15:34:27.502: %HA_EM-6-LOG: LOST_CLIENT_ROUTE : DEBUG(cli_lib) : : OUT : % Network not in table
My thought now is some type of IF/ELSE process. If a match on "Network not in table", then action email, else no action. If within the two minutes the route was added back into the BGP table, then cli output of "sh ip bgp 172.29.x.0" would list the network and no action would be taken because we did not match "Network not in table". I am educating myself as I go, so I still need to investigate the IF/ELSE possibility.
As I continue down this path, my same concern exists: what happens if multiple 172.29.x.0 routes are lost? Does EEM somehow record each 172.29.x.0 in multiple $_routing_network variables, or will it only record the last 172.29.x.0? I am hoping that the loss of each route is somehow a separate running EEM process, i.e., 5 lost routes = 5 EEM processes, and that within each process, when the cli command sh ip bgp $_routing_network is issued, each process will put its specific 172.29.x.0 route into the sh ip bgp x.x.x.x command.
I will continue working down this path but ask for your thoughts.
Jim
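On the multiple-route question: each routing event triggers its own run of the applet, and $_routing_network holds the network for that run only. How many runs can be active at the same time depends on the EEM scheduler's applet threads. A hedged way to check this on the router while the applet is sitting in its wait, assuming these show commands are available in your IOS release:

```text
! Lists the policy instances that are running right now;
! with several routes lost, expect one LOST_CLIENT_ROUTE entry per route
show event manager policy active
! Shows how many scheduler threads exist and how many are busy
show event manager scheduler thread
```

If fewer entries appear than routes were lost, the remaining events are likely queued behind the thread limit, which would push their 2-minute check later than intended.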
06-14-2015 08:59 AM
Another update Joe.
I have updated the Lost Client Route applet per below. I added a 120 second wait and a maxrun of 150 seconds. I then added a match on "Network not in table". Lastly, I added an IF match then email, ELSE nothing. Testing was successful both on the loss of a single 172.29.x.0/24 route along with the loss of multiple 172.29.x.0/24 subnets.
The last hurdle is the second applet, preventing the emailing of an alert if a route was added to the BGP table. Currently, the second applet does not know whether the route that was added was lost within the last two minutes. My desire is the following logic:
Identify route added to BGP table
IF that route was removed from BGP table within the last two minutes, do nothing, ELSE email
Of course, the logic could be reversed: IF that route was removed from the BGP table at least two minutes ago, email, ELSE nothing. I originally thought of a nested event, but the maxrun of 150 seconds gets in the way. If a circuit failure takes 8 hours to repair, the nested added-route alert would not fire because the parent event expires after 150 seconds. My next thought was to somehow match on the log message that I generate in the parent event: match the route in that log message, and if the message is more than 2 minutes older than the current time, then email.
Your input is appreciated.
event manager applet LOST_CLIENT_ROUTE
 event routing protocol bgp type remove network 172.29.0.0/16 le 24 maxrun 150
 action 1.0 syslog msg "Client network $_routing_network was removed from BGP table"
 action 1.1 wait 120
 action 1.2 cli command "enable"
 action 1.3 cli command "sh ip bgp $_routing_network"
 action 1.4 set result "none"
 action 1.5 regexp "Network not in table" $_cli_result result
 action 1.6 if $_regexp_result eq 1
 action 1.7 info type routername
 action 1.8 mail server ...
 action 2.0 end
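The original requirement also asked for a log message (and no email) when the route returns within the two minutes. A minimal sketch of how the tail of the applet above could carry an ELSE branch for that, assuming the same action numbering; the syslog wording is hypothetical and the mail action stays elided as in the post:

```text
 action 1.6 if $_regexp_result eq 1
 ! Route is still missing after the wait: alert the helpdesk
 action 1.7  info type routername
 action 1.8  mail server ...
 action 1.9 else
 ! Route came back during the wait: log only, no email
 action 2.0  syslog msg "Client network $_routing_network returned to BGP table within 2 minutes"
 action 2.1 end
```

These lines would replace actions 1.6 through 2.0 of the applet; EEM orders actions by label, so 1.9 sorts before 2.0.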
06-14-2015 10:55 AM
I never recommend the long waits. This is why I suggested nesting. It's much cleaner to do it that way as you're not tying up device resources for the period of the wait. I still think nesting will work here.
I had thought you'd do (in pseudo-code):
event routing protocol bgp type remove network 172.29.0.0/16 le 24
 cli command "config t"
 cli command "event manager applet route-timer"
 cli command "event timer countdown time 120"
 cli command "action 1.0 mail server ..."

event routing protocol bgp type add network 172.29.0.0/16 le 24
 cli command "config t"
 cli command "no event manager applet route-timer"
In this manner, if the timer is allowed to reach 0, then you get the email. Else, if the route is added back within two minutes, the timer is killed, and the email is not sent.
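The pseudo-code above, written out as two applets. This is a minimal sketch under several assumptions, not a tested configuration: the applet name ADDED_CLIENT_ROUTE is hypothetical, the per-route timer name route-timer-$_routing_network is an addition meant to handle several routes being lost at once, applet names are assumed to accept a dotted network such as 172.29.98.0, and removing a timer applet that does not exist is assumed to be harmless. The mail action stays elided as in the post.

```text
event manager applet LOST_CLIENT_ROUTE
 event routing network 172.29.0.0/16 type remove protocol bgp le 24
 action 1.0 syslog msg "Client network $_routing_network was removed from BGP table"
 action 1.1 cli command "enable"
 action 1.2 cli command "configure terminal"
 ! Assumption: one timer applet per lost route, named after the network
 action 1.3 cli command "event manager applet route-timer-$_routing_network"
 action 1.4 cli command "event timer countdown time 120"
 action 1.5 cli command "action 1.0 mail server ..."
 action 1.6 cli command "end"
!
event manager applet ADDED_CLIENT_ROUTE
 event routing network 172.29.0.0/16 type add protocol bgp le 24
 action 1.0 syslog msg "Client network $_routing_network was added to BGP table"
 action 1.1 cli command "enable"
 action 1.2 cli command "configure terminal"
 ! Kills the pending timer for this route, if any, so no email is sent
 action 1.3 cli command "no event manager applet route-timer-$_routing_network"
 action 1.4 cli command "end"
```

Neither applet waits, so no maxrun increase is needed and no policy instance is held open for two minutes. As written, the sketch only suppresses the lost-route email; it does not decide whether to send a route-restored email after a long outage.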
And, yes, from about 2005 until last year, I was delivering EEM talks at CiscoLive!. This year I handed it off to another speaker.