Static Route IP SLA Tracking with Delayed RIB Installation

Hi all


I have the following task. We need to track a branch's local ISP and, if it's unreachable, fail over to the regional ISP, which is available via MPLS/BGP. The task would be quite simple if not for one requirement that I am not sure how to meet: if the local ISP fails and stays down for 5 minutes, fail over to the regional ISP, but do not fail back SOONER than 15 minutes, to avoid flapping.


So, generally speaking, failover / failback is not so hard to achieve with IP SLA and tracking, but the tracking feature only supports delay timers of 0 to 180 seconds, which is way below what I need.


Has anyone faced the same task and how was it resolved?


Here's a basic config

ip sla 99
icmp-echo source-interface Loopback180
tag Local ISP Tracking
frequency 30
ip sla schedule 99 start-time now life forever
track 99 ip sla 99 reachability
delay up 180 down 180
ip route <local-ISP-gateway> permanent
ip route <local-ISP-gateway> track 99

Some will say that the current ping target is not perfect and that it's better to use an IP address from ISP1's range, but ignore that. At this moment in time we haven't decided what to ping in the outside world to track the liveness of the service. Defo not the ISP's gateway, but something more external. In any case, the static permanent route to the ping target ensures the probe will ALWAYS go via ISP1. The target is not an important resource from a corporate user perspective, so that will do the trick.


The "delay up 180 down 180" line is my biggest concern at the moment. A 3-minute failover and failback is not what my requirements say.


This is Cat3850 running 3.7.4E



I have copied the IP SLA and tracking configuration from your post, but I am not using any pre-configured default routes; I just have a static route for the tracked destination pointing towards the primary link. I am using an EEM script to inject and withdraw default routes based on the tracking status (up or down).


If the track is up, the EEM script installs the primary default route; if it is down, the script removes the primary default route and installs the backup default route. Failover to the backup and failback to the primary link are both set to 5 minutes (300 seconds). I tested this configuration in GNS3 and it worked fine. Note that if you try debugging the event manager in GNS3, it may crash IOS.


Be patient with the failover and failback because 5 minutes may feel like 10 minutes when you are actually waiting to see if it works.


I hope it helps you. Below is the configuration for eem:


event manager environment q "
event manager applet track-down
 event track 99 state down
 action 001 cli command "enable"
 action 002 cli command "config t"
 action 003 cli command "event manager applet track-timer"
 action 004 cli command "event timer countdown time 360"
 action 005 cli command "action 1.0 cli command enable"
 action 006 cli command "action 2.0 cli command $q config t$q"
 action 007 cli command "action 3.0 cli command $q no ip route$q"
 action 008 cli command "action 4.0 cli command $q ip route$q"
 action 009 cli command "action 5.0 cli command $q no event manager applet track-timer$q"
 action 010 cli command "action 6.0 cli command end"



event manager environment s "
event manager applet track-up
 event track 99 state up
 action 001 cli command "enable"
 action 002 cli command "config t"
 action 003 cli command "event manager applet track-timer"
 action 004 cli command "event timer countdown time 360"
 action 005 cli command "action 1.0 cli command enable"
 action 006 cli command "action 2.0 cli command $s config t$s"
 action 007 cli command "action 3.0 cli command $s no ip route$s"
 action 008 cli command "action 4.0 cli command $s ip route$s"
 action 009 cli command "action 5.0 cli command $s no event manager applet track-timer$s"
 action 010 cli command "action 6.0 cli command end"



Thanks for this. Just what I needed (in fact, I was thinking about EEM as an alternative solution, but I am not great with it). I will try it in a lab and then mark it as an acceptable solution if it does it the way I need :)


Appreciate your time

You're very welcome. I hope it works out for you.



First of all, thanks for those two applets; they helped me look in the right direction. I did a few tests in my LIVE lab (i.e. real kit) and found the following: Cat9k and Cat3850s do not support EOT (Enhanced Object Tracking) events, so your applets cannot be used in my environment. I didn't give up though, and after a number of different versions I ended up with the following applet, which works GREAT and does exactly what I need.


! Permanent static route to the probe target forces the IP SLA probe to always use
! my main ISP no matter what - this destination will be changed in live environment
ip route Po1 permanent
! This is the default/preferable gateway into the Internet, but it requires HA
ip route Po1
ip sla 99
 icmp-echo source-interface Loopback180
 frequency 30
ip sla schedule 99 life forever start-time now
ip sla reaction-configuration 99 react timeout threshold-type consecutive 10 action-type triggerOnly
ip sla enable reaction-alerts
! I use this TRACK object as my global BOOLEAN variable (see below) 
track 99 stub-object
event manager environment qt "
event manager applet track-route authorization bypass
 event tag 1.0 ipsla operation-id 99 reaction-type timeout

 action 001 cli command "enable"
 action 002 cli command "config t"

 ! This block is executed when IP SLA triggers timeout event
 action 010 if $_ipsla_condition eq "Occurred"
  action 011 track read 99
  action 012 if $_track_state eq down
   action 013 cli command "no ip route Po1"
  action 015 else
   action 016 cli command "no event manager applet track-route-up"
  action 017 end
  action 018 track set 99 state down

 ! This block is executed when IP SLA triggers Up/Recovery event
 action 020 else
  action 021 track set 99 state up
  action 022 cli command "event manager applet track-route-up authorization bypass"
  action 023 cli command " event timer countdown time 60"
  action 024 cli command " action 1.0 track set 99 state down"
  action 025 cli command " action 1.1 cli command $qt enable $qt"
  action 026 cli command " action 1.2 cli command $qt config t $qt"
  action 028 cli command " action 1.3 cli command $qt ip route Po1 $qt"
  action 029 cli command " action 1.4 cli command $qt no event manager applet track-route-up $qt"
 action 040 end

So, here's the explanation of how it works


The moment the probe target becomes unreachable, IP SLA begins counting towards its reaction trigger, which is configured to raise an alert when 10 consecutive echoes time out. For my lab I have configured a 30s frequency for ICMP echoes, so theoretically this can trigger the alarm within 300s (5m).
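To make the arithmetic explicit, here is a rough Python sketch of the detection window. It is only a model, not anything that runs on the switch: the function name `detection_window` is made up for illustration, and it assumes probes go out at exact intervals and ignores the per-probe timeout.

```python
from typing import Tuple


def detection_window(frequency_s: int, consecutive: int) -> Tuple[int, int]:
    """Return (earliest, latest) seconds between the outage starting and the
    sending of the probe that completes the run of consecutive timeouts.

    Assumptions: probes are sent exactly every `frequency_s` seconds and the
    time each probe takes to time out is ignored.
    """
    if frequency_s <= 0 or consecutive <= 0:
        raise ValueError("frequency_s and consecutive must be positive")
    # Outage starts just before a probe: the first failed probe goes out at
    # once and the remaining (consecutive - 1) follow at frequency_s intervals.
    earliest = (consecutive - 1) * frequency_s
    # Outage starts just after a probe: a full interval passes before the
    # first failed probe is even sent.
    latest = consecutive * frequency_s
    return earliest, latest


print(detection_window(30, 10))  # (270, 300)
```

With the lab values (30s frequency, 10 consecutive timeouts) the model gives a window of 270 to 300 seconds, which matches the "within 300s" estimate.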


This can of course be changed to 'xOfy' to detect flapping conditions. Anyway, if the link has been down for 300s as per IP SLA, the applet is executed. It checks the value of my global BOOLEAN variable, for which I use a STUB tracking object that is created with a default value of DOWN (I use it as false). I treat this value as "if it's TRUE, then the link-UP applet has been dynamically created in CLI and is waiting to execute when its timer expires"... so when IP SLA is triggered and this variable is FALSE, it means I am safe to instantly DELETE the static route, which the applet does, and then it stops.


Once IP SLA sees the first successful ECHO, it triggers another alert and my master applet is executed, but this time it runs through the second block (the top-level else condition), which simply sets my global BOOLEAN variable to TRUE (telling future instances of the applet that I am in the process of recovery) and creates a child applet which is executed after a 600s delay (10m).


If nothing changes from the IP SLA perspective within 600s, the child applet is executed to reinstall my static route; it also resets the BOOLEAN to false again because I am not in recovery mode anymore.


If, however, the link is flapping... i.e. IP SLA triggers another timeout alarm/event... then the master applet is executed and the first IF condition is matched ('Occurred'), but here an interesting thing happens. I check if my TRACK object (my BOOLEAN) is true. If it is, it means I am already in recovery mode... but the link has flapped, so I have to stop this recovery by deleting my child applet :)
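For anyone who finds the flow easier to follow as code, below is a small Python model of the logic described above. It is a sketch for reasoning about the behaviour, not EEM and not something to paste into a switch. The class and method names (`FailoverModel`, `on_timeout`, `on_recovery`, `tick`) are hypothetical. The hold-down is a parameter because the explanation talks about 600s while the applet as posted uses a countdown of 60, so pick whichever matches your config.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverModel:
    holddown_s: int                           # countdown of the child applet
    route_installed: bool = True              # primary static route present?
    recovering: bool = False                  # plays the role of stub track 99
    holddown_deadline: Optional[int] = None   # when the child applet would fire

    def on_timeout(self) -> None:
        """IP SLA reaction fired with condition 'Occurred'."""
        if self.recovering:
            # Link flapped during recovery: delete the pending child applet.
            self.holddown_deadline = None
        else:
            # Not in recovery: safe to remove the primary route at once.
            self.route_installed = False
        self.recovering = False

    def on_recovery(self, now: int) -> None:
        """IP SLA reaction cleared: start the hold-down before failback."""
        self.recovering = True
        self.holddown_deadline = now + self.holddown_s

    def tick(self, now: int) -> None:
        """Child applet: fires once the hold-down has fully elapsed."""
        if self.holddown_deadline is not None and now >= self.holddown_deadline:
            self.recovering = False
            self.route_installed = True
            self.holddown_deadline = None


if __name__ == "__main__":
    m = FailoverModel(holddown_s=600)
    m.on_timeout()        # outage confirmed: route removed
    m.on_recovery(100)    # first good echo: failback scheduled for t=700
    m.on_timeout()        # flap: scheduled failback cancelled
    m.on_recovery(500)    # good again: failback rescheduled for t=1100
    m.tick(700)           # nothing happens, old timer was cancelled
    print(m.route_installed)   # False
    m.tick(1100)
    print(m.route_installed)   # True
```

The point the model shows is that every flap restarts the hold-down from scratch, so the primary route only comes back after one uninterrupted quiet period.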


I've done a few cycles of tests and it works perfectly fine, though I might still find a few more things before it goes live. I am quite satisfied with myself :D


Thanks again!

Good to know that you came up with an applet that works for your environment. Thanks for sharing the script and explaining how the script works.