EEM script to Reload router 1 Time if Dial Fails

Gerard Roy
Level 2

We have a script that tracks objects and, if access to those objects fails, reloads the router one time (see below). We now want to apply the same approach for another customer's dial backup: if the router fails to connect after round-robining through 4 toll-free 800 numbers, reload the router.

event manager applet vpn_tunnel_rebooter

event none

action 1.0 cli command "enable"

action 2.0 cli command "config t"

action 3.0 cli command "no event manager applet vpn_tunnel_unreachable"

action 4.0 cli command "end"

action 5.0 cli command "write mem"

action 6.0 reload

!

event manager applet vpn_tunnel_up

event track 456 state up

action 001 cli command "enable"

action 002 cli command "config t"

action 003 cli command "event manager applet vpn_tunnel_unreachable"

action 004 cli command "event track 456 state down"

action 005 cli command "action 1.0 policy vpn_tunnel_rebooter"

action 006 cli command "end"

Any ideas on how we might accomplish this? I have very little EEM experience :)

1) Object tracking currently tracks 3 objects. If access to all three is down, we go into a 180-second delay down timer. If they remain down, we switch over to dial backup.

2) We round robin between 4 dial numbers, and if access to all 4 fails, we want to reload the router "ONE" time.

3) When it comes back up and the objects are still unavailable, we attempt to dial the 4 numbers again; if that also fails to connect, "DO NOT" reload this time.
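For readers following along, the tracking arrangement in item 1 might look roughly like the sketch below. It assumes track 456 (the object the applets above watch) is a boolean OR list, so it only goes down once all three members are down. The member object numbers 1 to 3 are hypothetical, and the member objects themselves are not shown.

```
! Sketch only: member objects 1-3 are hypothetical and must already exist.
! A boolean OR list is up while any member is up, so it goes down
! only when all three members are down.
track 456 list boolean or
object 1
object 2
object 3
! Wait 180 seconds before reporting the list as down.
delay down 180
```

With this layout, the "event track 456 state down" trigger would fire only after all three objects have stayed unreachable for the full delay.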


Works great. Thanks, and please post it for all the world to use. This is a huge issue with 1811s at the very least.

Highest Regards,

Jerry Roy

Hi Joe,

Looks like the script continues to reload the router. I am trying to nail down the exact scenario in which it does this. I currently have the broadband disconnected and the modem line unplugged so I could test the one-time reboot. I saw it reboot one time and assumed all was well, but after a short period I heard it reboot continuously after each cycle through the 4 numbers. What can I look at? I notice that after the reboot the reachability of the tracked objects starts running back down through the 180-second delay down timer (broadband and phone line are disconnected); then it goes through the 4 dial backup numbers again and finally reboots again. It continues this over and over. (See attached.)

Message was edited by: Gerard Roy - added current config from router

Try this version of the Tcl policy.

Actually, this new version, while it has a bug fix, will not help what you are seeing.  What you are seeing is not a problem with the EEM code.  Your tracked objects are flapping, and EEM is doing what it's told.  When the router reboots, tracked object 456 comes up (01:07:38 UTC).  That installs the Tcl policy which watches for the tracked object to go down.  Then, at 01:10:38 UTC, the tracked object 456 goes down.  This causes the Tcl script to watch the dial state, then trigger another reload.  The process repeats at 01:13:38 UTC.  I think this is being caused by your delay down.

Nope, it continues to reload. What can I send you that might help debug?

I see it now. For some reason it says the objects are up even when nothing is plugged into the damn port. Just lame.

Let me try a newer version of code to see if this is still an issue.

Joe,

What version of code did you develop on? I have upgraded it and now the problem seems to be gone. Can you modify the Tcl script to clear the line before it cycles through the 4 dial backup numbers? It seems to me it would make more sense to clear it at that point.

I developed with the assumption that EEM 2.1 was being used, but as I said, the problem is most likely related to your "delay down".  A tracked object can only have one of two states (either up or down).  When the router reloads, the object is most likely up pending the delay down timer of 180 seconds (which explains why the EEM policies were firing on three minute boundaries).

I didn't see any object tracking changes in later 12.4T code, but I could have missed a bug fix.  If the default object state is now up (or you're not getting the Down->Up transition change now), then that's good.

The current Tcl script will clear the line before reloading the device.  It will clear the line, then see if the backup is still down.  If so, then the reload will happen.  If the backup comes up successfully, the policy will remove itself.

I found the bug.  It's CSCsr27735.  The fix for this made it so there is now a "default state" you can specify for tracked objects watching IP SLA operations.  The default state is DOWN (which is what you want in this case).  This change was made in 12.4(22)T.

If both the modem cable and the broadband cable have been unplugged and the router is rebooted via the script, do you think object tracking should show up right after the reboot? Well, it does. The delay down has no business showing up after a reboot UNTIL the object can actually be reached. The older version, c181x-adventerprisek9-mz.124-15.T7.bin, shows it up after the reboot, and I was not able to see for how long before it finally figured out it was really down, so the script rebooted it again. I have installed c181x-advipservicesk9-mz.124-22.T3.bin and it looks to be resolved. The delay down is required to be sure the link is not flapping. Thanks again.

Yes, this is what I have summarized above (see all my recent posts on this thread).  As I said, the bug which changed the behavior to having IP SLA tracked objects be DOWN by default was merged into 12.4(22)T.  You should be good to go on that version of code.  If you need this to work on older code, a hack would be required in the policy which watches for the SYS-5-RESTART syslog.

Hi Joe,

We have 850 locations that I am going to have to update code on if I can't get this to work with the existing code. How tough would it be to do the hack you mentioned?

Thanks,

Jerry

I believe something like this would work:

event manager environment quote "

event manager applet remove-dial-backup-watch

event syslog pattern "SYS-5-RESTART"

action 1.0 cli command "enable"

action 2.0 cli command "config t"

action 3.0 cli command "no event manager policy tm_check_dial_backup.tcl"

action 4.0 cli command "event manager applet remove-watch-timed"

action 5.0 cli command "event timer countdown time 185"

action 5.1 cli command "action 1.0 cli command $quote enable$quote"

action 5.2 cli command "action 2.0 cli command $quote config t$quote"

action 5.3 cli command "action 3.0 cli command $quote no event manager policy tm_check_dial_backup.tcl$quote"

action 5.4 cli command "action 4.0 cli command $quote no event manager applet remove-watch-timed$quote"

action 5.5 cli command "action 5.0 cli command $quote end$quote"

action 6.0 cli command "end"
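To make the nested quoting easier to follow: assuming $quote expands to a literal double quote, actions 4.0 through 5.5 above should leave a second applet in the running config along these lines after each restart. This is a sketch of the expected result, not captured output, and the exact whitespace inside the quoted strings may differ.

```
! Expected result of actions 4.0-5.5 (sketch, not captured output).
event manager applet remove-watch-timed
event timer countdown time 185
action 1.0 cli command "enable"
action 2.0 cli command "config t"
action 3.0 cli command "no event manager policy tm_check_dial_backup.tcl"
action 4.0 cli command "no event manager applet remove-watch-timed"
action 5.0 cli command "end"
```

When the countdown expires, this applet unregisters the Tcl policy once more and then deletes itself, so nothing is left behind until the next SYS-5-RESTART.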

Hi Joe,

Still rebooting (see attached). I noticed that when I rolled back, it also errored on the ip sla statements. I had to re-add them as rtr statements. Thank you.

Increase the 185 second timer to 210.  That should provide enough time.
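In terms of the remove-dial-backup-watch applet posted earlier, that should only mean changing action 5.0. The line below is that action with the new value, everything else staying as posted:

```
! Only the countdown value changes, from 185 to 210 seconds.
action 5.0 cli command "event timer countdown time 210"
```

The idea is that the countdown needs to outlast the 180-second delay down by a comfortable margin, so that the cleanup runs after the tracked object has settled into its real state.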
