EEM Applet - Delay sending of email alert on route removal

Jim McCormick · 06-06-2015

Hello,

I've written two applets (see below). The first is triggered when a BGP route is removed from the table, and its action is to send an email to the helpdesk. The second applet sends another email when a route is added to the BGP table (on the assumption that the circuit whose failure caused the route removal is back up). The client routes fall within the 172.29.0.0/16 space, with each client getting a /24 subnet within that space.

I would like to introduce a delay of 2 minutes before sending the email that a route was lost. If, after 2 minutes, the route has not been added back into the table, then send the email stating that the route was removed.

If the route is added back within the first 2 minutes after being removed from the BGP table, no email should be sent (just a log message on the router stating that the route was added).

Anyone have an idea on how to accomplish this?

event manager applet LOST_CLIENT_ROUTE
 event routing network 172.29.0.0/16 type remove protocol BGP le 24
 action 1.0 syslog msg "Client network $_routing_network was removed from BGP table"
 action 2.0 info type routername
 action 3.0 mail server "$_email_server" to "$_email_to" from "$_info_routername" source-interface lo0 subject "EEM ALERT - Lost Client Network $_routing_network on Verizon Headend" body "Please contact the Network on-call."
exit

event manager applet ADDED_CLIENT_ROUTE
 event routing network 172.29.0.0/16 type add protocol BGP le 24
 action 1.0 syslog msg "Client network $_routing_network was added to BGP table"
 action 2.0 info type routername
 action 3.0 mail server "$_email_server" to "$_email_to" from "$_info_routername" source-interface lo0 subject "EEM ALERT - Added Client Network $_routing_network on Verizon Headend"
exit
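Both applets reference $_email_server and $_email_to. These are not built-in EEM variables, so they are presumably user-defined EEM environment variables. The following is a minimal sketch of defining them. The values 192.0.2.25 and helpdesk@example.com are hypothetical placeholders only:

```
! Hypothetical placeholders: substitute your own SMTP relay and helpdesk mailbox.
event manager environment _email_server 192.0.2.25
event manager environment _email_to helpdesk@example.com
```

$_info_routername needs no definition, because the "info type routername" action sets it each time the applet runs.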

Joe Clarke · 06-12-2015

Have a look at the attached lab task I created. It makes use of a concept called nesting where one policy creates another. Your use case is exactly tailored for this type of thing.

Jim McCormick · 06-12-2015

Thank you for responding, Joe, and thank you for providing the lab. As you can see from my applets above, I report on any 172.29.0.0/16 route removed from BGP. Each client in the MPLS cloud has a 172.29.x.0/24 subnet, and some clients advertise multiple 172.29.x.0/24 subnets into the cloud. Below is the log output my applet generates when a client advertising two subnets goes down:

Jun 12 05:30:55 EDT: %HA_EM-6-LOG: LOST_CLIENT_ROUTE: Client network 172.29.98.0 was removed from BGP table
Jun 12 05:30:55 EDT: %HA_EM-6-LOG: LOST_CLIENT_ROUTE: Client network 172.29.198.0 was removed from BGP table

Since deploying the applets, we've received multiple false alerts because the lost subnets are relearned within one minute.

Now I will add a nested applet with a 2-minute countdown timer, as your attached lab suggests. This is where I believe I run into trouble. After two minutes, I need to check every 172.29.x.0 route that was reported as lost. Above, two routes were lost because a single client went down, but I have also seen cases where a carrier update caused the loss of 5 to 10 different 172.29.x.0 routes, only for them to be added back within a minute.

Your lab demonstrates matching the condition of a single interface. In my scenario, if 5 routes are lost, how do I track each of the 5 individually, check after 2 minutes whether each one is back in the BGP table, and, if it is not, take an action (email the helpdesk)?

P.S. If you remember to answer: have you ever chaired an EEM session at CiscoLive? I attended one in San Diego (possibly Orlando) but don't remember who the moderator was.

Jim McCormick · 06-13-2015

After posting the above, Joe, I started testing this approach:

event manager applet LOST_CLIENT_ROUTE
 event routing network 172.29.0.0/16 type remove protocol BGP le 24 maxrun 120
 action 1.0 syslog msg "Client network $_routing_network was removed from BGP table"
 action 1.1 wait 120
 action 1.2 cli command "enable"
 action 1.3 cli command "show ip bgp $_routing_network"

A debug shows that the applet executes successfully and that the response to "sh ip bgp ..." is "Network not in table":

Jun 13 15:34:27.502: %HA_EM-6-LOG: LOST_CLIENT_ROUTE : DEBUG(cli_lib) : : OUT : % Network not in table

My thought now is some type of IF/ELSE process: if the output matches "Network not in table", send the email; otherwise, take no action. If the route was added back into the BGP table within the two minutes, the output of "sh ip bgp 172.29.x.0" would list the network, and no action would be taken because "Network not in table" would not match. I am educating myself as I go, so I still need to investigate the IF/ELSE possibility.
As I continue down this path, my original concern remains: what happens if multiple 172.29.x.0 routes are lost? Does EEM record each 172.29.x.0 in its own $_routing_network variable, or only the last one? I am hoping that each route loss runs as a separate EEM process (i.e., 5 lost routes = 5 EEM processes), and that each process substitutes its own 172.29.x.0 route into the "sh ip bgp $_routing_network" command.
I will keep working down this path, but I'd welcome your thoughts.

Jim

Jim McCormick · 06-14-2015

Another update Joe.

I have updated the Lost Client Route applet as shown below. I added a 120-second wait and a maxrun of 150 seconds, then a regexp match on "Network not in table", and lastly an IF block: on a match, send the email; otherwise, do nothing. Testing was successful both for the loss of a single 172.29.x.0/24 route and for the loss of multiple 172.29.x.0/24 subnets.

The last hurdle is the second applet: preventing the "route added" email when the route comes back within two minutes of being removed. Currently, the second applet does not know whether the added route was lost within the last two minutes. The logic I want is:
Identify route added to BGP table
IF that route was removed from BGP table within the last two minutes, do nothing, ELSE email

Of course, the logic could be reversed: IF that route was removed from the BGP table at least two minutes ago, email, ELSE do nothing. I originally thought of a nested event, but the maxrun of 150 seconds gets in the way. If a circuit failure takes 8 hours to repair, the nested added-route alert would not fire, because the parent event expires after 150 seconds. My next thought was to match on the syslog message that the parent applet generates: find the route in that log message, and if the message is more than 2 minutes older than the current time, send the email.
Your input is appreciated.

event manager applet LOST_CLIENT_ROUTE
 event routing network 172.29.0.0/16 type remove protocol BGP le 24 maxrun 150
 action 1.0 syslog msg "Client network $_routing_network was removed from BGP table"
 action 1.1 wait 120
 action 1.2 cli command "enable"
 action 1.3 cli command "sh ip bgp $_routing_network"
 action 1.4 set result "none"
 action 1.5 regexp "Network not in table" $_cli_result result
 action 1.6 if $_regexp_result eq 1
 action 1.7 info type routername
 action 1.8 mail server ...
 action 2.0 end
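One way to give the second applet the memory it lacks is sketched here. It is an untested sketch built on stated assumptions. Once the lost-route email has actually been sent, LOST_CLIENT_ROUTE configures an empty marker applet named after the route. ADDED_CLIENT_ROUTE then always logs, but emails only if that marker exists, and it removes the marker afterward.

The following details are hypothetical: the CLIENT_DOWN_ prefix, the extra action labels 1.81 to 1.84, the maxrun of 60, and the email body text. The sketch also assumes that the IOS release accepts a dotted prefix such as 172.29.98.0 inside an applet name. EEM orders action labels as strings, so 1.81 to 1.84 run after 1.8 and before 2.0, which means they run inside the if block of the applet above.

```
! Extra actions for LOST_CLIENT_ROUTE (inside its if block, after action 1.8),
! so the marker is created only when the lost-route email was sent.
event manager applet LOST_CLIENT_ROUTE
 action 1.81 cli command "config t"
 action 1.82 cli command "event manager applet CLIENT_DOWN_$_routing_network"
 action 1.83 cli command "event none"
 action 1.84 cli command "end"
!
! Reworked ADDED_CLIENT_ROUTE: always log, email only if the marker exists.
event manager applet ADDED_CLIENT_ROUTE
 event routing network 172.29.0.0/16 type add protocol BGP le 24 maxrun 60
 action 1.0 syslog msg "Client network $_routing_network was added to BGP table"
 action 1.1 cli command "enable"
 action 1.2 cli command "show running-config | include applet CLIENT_DOWN_$_routing_network"
 ! Match "manager applet" so the command text itself can never produce a match
 action 1.3 regexp "manager applet CLIENT_DOWN_" "$_cli_result"
 action 1.4 if $_regexp_result eq 1
 action 1.5 info type routername
 action 1.6 mail server "$_email_server" to "$_email_to" from "$_info_routername" subject "EEM ALERT - Added Client Network $_routing_network on Verizon Headend" body "Client network restored."
 ! Remove the marker so the next outage starts clean
 action 1.7 cli command "config t"
 action 1.8 cli command "no event manager applet CLIENT_DOWN_$_routing_network"
 action 1.9 cli command "end"
 action 2.0 end
```

Both the include filter and the regexp treat the dots in the prefix as wildcards. For example, a marker for 172.29.100.0 would also satisfy the check for 172.29.1.0. Treat this as a starting point rather than an exact per-prefix match.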

Joe Clarke · 06-14-2015

I never recommend the long waits. This is why I suggested nesting. It's much cleaner to do it that way as you're not tying up device resources for the period of the wait. I still think nesting will work here.

I had thought you'd do (in pseudo-code):

event routing protocol bgp type remove network 172.29.0.0/16 le 24
cli command "config t"
cli command "event manager applet route-timer"
cli command "event timer countdown time 120"
cli command "action 1.0 mail server ..."

event routing protocol bgp type add network 172.29.0.0/16 le 24
cli command "config t"
cli command "no event manager applet route-timer"

In this manner, if the timer is allowed to reach 0, then you get the email. Else, if the route is added back within two minutes, the timer is killed, and the email is not sent.
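The following untested sketch shows how that nesting could extend to the multiple-route case raised earlier. The child applet's name embeds $_routing_network, so every lost prefix gets its own countdown timer, and the add-side applet cancels only the timer for the prefix that came back. The child reuses the _email_server and _email_to environment variables from the original applets.

The ROUTE_TIMER_ prefix is a hypothetical name. The sketch also assumes that the IOS release accepts two things: a dotted prefix inside an applet name, and backslash-escaped quotes inside an applet string.

```
event manager applet LOST_CLIENT_ROUTE
 event routing network 172.29.0.0/16 type remove protocol BGP le 24
 action 1.0 syslog msg "Client network $_routing_network was removed from BGP table"
 action 1.1 info type routername
 action 1.2 cli command "enable"
 action 1.3 cli command "config t"
 ! One child applet per lost prefix, e.g. ROUTE_TIMER_172.29.98.0
 action 1.4 cli command "event manager applet ROUTE_TIMER_$_routing_network"
 action 1.5 cli command "event timer countdown time 120"
 ! The parent expands these variables, so the child stores literal values
 action 1.6 cli command "action 1.0 mail server $_email_server to $_email_to from $_info_routername subject \"EEM ALERT - Lost Client Network $_routing_network on Verizon Headend\" body \"Please contact the Network on-call.\""
 action 1.7 cli command "end"
!
event manager applet ADDED_CLIENT_ROUTE
 event routing network 172.29.0.0/16 type add protocol BGP le 24
 action 1.0 syslog msg "Client network $_routing_network was added to BGP table"
 action 1.1 cli command "enable"
 action 1.2 cli command "config t"
 ! Removing the child before its countdown expires cancels that prefix's email
 action 1.3 cli command "no event manager applet ROUTE_TIMER_$_routing_network"
 action 1.4 cli command "end"
```

If a route stays down, its child fires once and then remains configured until ADDED_CLIENT_ROUTE removes it on recovery. In this form, the add side only logs, so a "route restored" email would still need separate logic.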

And, yes, from about 2005 until last year, I was delivering EEM talks at CiscoLive! This year I handed that off to another speaker.