
Using Embedded Event Manager (EEM) in IOS-XR for the ASR9000 to simulate ECMP "min-links"


Introduction

This document provides a sample configuration for using EEM with the IOS-XR releases for the ASR9000. Though EEM is also found on other XR platforms, this example is specifically tested and demonstrated for the ASR9000.

Core Issue

With LAG (Link Aggregation, which IOS calls EtherChannel) you can use the "min-links" feature. It allows the system to shut down the whole bundle when too few members are available, and therefore too little aggregate bandwidth, to support the traffic.

There are cases where you cannot use or don't want to use LAG, but rather use ECMP, Equal Cost Multipath, to support more aggregated bandwidth.

In that case, when a certain number of paths disappear, there is no way to automatically shut the other paths to trigger failover to a redundant ECMP path.

This document will show how to use EEM to detect an ECMP member going down and automatically shut the other members of that path.

Resolution

a)      Preparation

b)      XR configuration

c)      Setting up interfaces to Monitor

d)     How to determine if the script was invoked

e)      Sample Operation

a)    Preparation:

1)      Set up the interfaces that are to be shut when this script kicks in. Edit the minlinks_ecmp.tcl file with a text editor such as Notepad. At the top of the script there is a section that provides these variables. Enter the interface names as you would type them in configuration mode.

#Interfaces of the ECMP bundle that need to be shut

set _if_1 "interface g0/0/0/29"

set _if_2 "interface g0/0/0/28"

set _if_3 "interface g0/0/0/27"

set _if_4 "interface g0/0/0/26"

2)      Copy the saved and modified Tcl file to disk0:

RP/0/RSP0/CPU0:Viking-Top#copy tftp://3.0.0.1/minlinks_ecmp.tcl disk0:
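
The minlinks_ecmp.tcl script itself is not reproduced in this document. As a rough sketch only, the way such a policy could turn the interface variables from step 1 into a configuration sequence might look like the following. The proc name build_shut_commands is hypothetical and not taken from the real script; the sequence (enter config mode, shut each interface, commit, end) is an assumption based on the commit and "Administratively Down" messages in the sample operation log.

```tcl
# Interfaces of the ECMP bundle that need to be shut (same format as the script header)
set _if_1 "interface g0/0/0/29"
set _if_2 "interface g0/0/0/28"
set _if_3 "interface g0/0/0/27"
set _if_4 "interface g0/0/0/26"

# Hypothetical helper: build the CLI lines that admin-shut every listed interface.
proc build_shut_commands {interfaces} {
    set cmds [list "configure"]
    foreach intf $interfaces {
        # Each entry already contains the "interface ..." keyword
        lappend cmds $intf "shutdown"
    }
    # IOS XR only applies the change once it is committed
    lappend cmds "commit" "end"
    return $cmds
}

set cmds [build_shut_commands [list $_if_1 $_if_2 $_if_3 $_if_4]]
foreach c $cmds { puts $c }
```

Inside an EEM policy each of these lines would be sent with cli_exec instead of being printed.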

b)    XR configuration:

(exec cfg)

event manager environment _syslog_pattern .*(GigabitEthernet)(0/0/0/0|0/0/0/20).*(changed state to Down)

event manager directory user policy disk0:

event manager policy minlinks_ecmp.tcl username eem type user

aaa authorization eventmanager default local

(admin cfg)

username eem

group root-system

group cisco-support

!

c)     Setting up the interfaces to monitor

The syslog pattern is currently set up to monitor Gigabit Ethernet interfaces 0/0/0/0 and 0/0/0/20.

If you’d like to monitor TenG interfaces TenGigE0/3/0/0 and TenGigE0/7/0/1 then change the line to:

event manager environment _syslog_pattern .*(TenGigE)(0/3/0/0|0/7/0/1).*(changed state to Down)

Note that everything is case and space sensitive.
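
Because a single wrong character stops the policy from triggering, it can help to check a pattern offline before configuring it. The sketch below runs in a plain tclsh and assumes that the EEM syslog pattern behaves like a Tcl regular expression; the test message is copied from the sample operation log in this document.

```tcl
# Pattern exactly as configured in the environment variable
set pattern {.*(GigabitEthernet)(0/0/0/0|0/0/0/20).*(changed state to Down)}

# Message taken from the sample operation log
set hit {LC/0/0/CPU0:Jan 25 10:07:31.742 : ifmgr[171]: %PKT_INFRA-LINK-3-UPDOWN : Interface GigabitEthernet0/0/0/0, changed state to Down}

# Same message with a lowercase "down", to show the case sensitivity
set miss [string map {"to Down" "to down"} $hit]

puts [regexp -- $pattern $hit]   ;# 1: the pattern matches
puts [regexp -- $pattern $miss]  ;# 0: lowercase "down" does not match
```

Note that the alternatives are matched as prefixes: an entry such as 0/0/0/2 would also match 0/0/0/20 through 0/0/0/29, so including the trailing comma seen in the log line is one way to tighten the match.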

d)    How to determine if the script was invoked

A SYSLOG message will be printed every time the script is invoked!

RP/0/RSP0/CPU0:Jan 25 09:57:41.248 : minlinks_ecmp.tcl[65814]: ECMP DISTRIBTION SHUTDOWN ON THIS ROUTER!

e)     Sample operation :

We receive a down event for an interface that we are monitoring:

LC/0/0/CPU0:Jan 25 10:07:31.742 : ifmgr[171]: %PKT_INFRA-LINK-3-UPDOWN : Interface GigabitEthernet0/0/0/0, changed state to Down

LC/0/0/CPU0:Jan 25 10:07:31.742 : ifmgr[171]: %PKT_INFRA-LINEPROTO-5-UPDOWN : Line protocol on Interface GigabitEthernet0/0/0/0, changed state to Down

RP/0/RSP0/CPU0:Jan 25 10:07:31.758 : ospf[314]: %ROUTING-OSPF-5-ADJCHG : Process CORE, Nbr 172.16.1.1 on GigabitEthernet0/0/0/0 in area 0 from FULL to DOWN, Neighbor Down: interface down or detached,vrf default vrfid 0x60000000

RP/0/RSP0/CPU0:Viking-Top#

!EEM is kicking in to admin shut the predefined interfaces:

RP/0/RSP0/CPU0:Viking-Top#

LC/0/0/CPU0:Jan 25 10:07:37.931 : ifmgr[171]: %PKT_INFRA-LINK-5-CHANGED : Interface GigabitEthernet0/0/0/26, changed state to Administratively Down

LC/0/0/CPU0:Jan 25 10:07:37.931 : ifmgr[171]: %PKT_INFRA-LINK-5-CHANGED : Interface GigabitEthernet0/0/0/27, changed state to Administratively Down

LC/0/0/CPU0:Jan 25 10:07:37.931 : ifmgr[171]: %PKT_INFRA-LINK-5-CHANGED : Interface GigabitEthernet0/0/0/28, changed state to Administratively Down

LC/0/0/CPU0:Jan 25 10:07:37.932 : ifmgr[171]: %PKT_INFRA-LINK-5-CHANGED : Interface GigabitEthernet0/0/0/29, changed state to Administratively Down

RP/0/RSP0/CPU0:Jan 25 10:07:38.238 : config[65815]: %MGBL-CONFIG-6-DB_COMMIT : Configuration committed by user 'eem'. Use 'show configuration commit changes 1000000656' to view the changes.

** Note that the config change is committed by the username set in the configuration

RP/0/RSP0/CPU0:Jan 25 10:07:38.739 : config[65815]: %MGBL-SYS-5-CONFIG_I : Configured from console by console on vty100 (0.0.0.0)

** The script's syslog message is printed

RP/0/RSP0/CPU0:Jan 25 10:07:39.007 : minlinks_ecmp.tcl[65812]: ECMP DISTRIBTION SHUTDOWN ON THIS ROUTER!

RP/0/RSP0/CPU0:Viking-Top#

Related Information

XR EEM standard documentation.

Additional note: XR 4.0 and XR4.01 have a bug, to be fixed in 4.1, that prevents proper invocation of Tcl upon syslog events. Please consult TAC for the DDTS and resolution prior to 4.1.

Xander Thuijs, CCIE #6775

Sr Tech Lead ASR9000

Comments
Community Member

Xander, many thanks for your quick response!! :)

The version running on the production device is 5.3.2, and the script doesn't work there.

I have built a lab with VirtualBox and GNS3 and I'm running 5.3.3 there, and the script works fine.

In regards to the time spent by the script, I can add a 'puts' at the beginning, but we can see that the log line triggering the script starts at 02:52:22

LC/0/0/CPU0:Mar 21 02:52:22 CET: ifmgr[209]: %PKT_INFRA-LINEPROTO-5-UPDOWN : Line protocol on Interface TenGigE0/0/0/5, changed state to Down

and the script gets to the commit step in 6 seconds: 02:52:28

RP/0/RSP0/CPU0:Mar 21 02:52:28 CET: syslog_dev[92]: noscan PID-71455058: To commit

So what I think is that, since the commit is not allowed, the device might throw an error and the script never gets out of there (I have a 'puts' before and after the 'end' command), hence it is closed after 90 seconds.

TCL script here. The main idea is: if the interface on one side goes down, the script should shut down the other side to produce a full switchover to the other device.

::cisco::eem::event_register_syslog occurs 1 pattern $_syslog_pattern maxrun 90
#------------------------------------------------------------------
# EEM policy to monitor for a specified syslog message.
#------------------------------------------------------------------
### The following EEM environment variable is used:
###
### Example: _syslog_pattern             %PKT_INFRA-LINEPROTO-5-UPDOWN : Line protocol on Interface TenGigE0/0/0/[56]
###
### The following variables are set in the script itself:
### _config_if1            first interface to monitor
### _config_if2            second interface to monitor

set _config_if1 "TenGigE0/0/0/5"
set _config_if2 "TenGigE0/0/0/6"
set _if1 ""
set _if2 ""

namespace import ::cisco::eem::*
namespace import ::cisco::lib::*
# 1. query the information of latest triggered eem event

array set arr_einfo [event_reqinfo]
if {$_cerrno != 0} {
    set result [format "component=%s; subsys err=%s; posix err=%s;\n%s" \
      $_cerr_sub_num $_cerr_sub_err $_cerr_posix_err $_cerr_str]
    error $result
}
set msg $arr_einfo(msg)

# 2. execute the user-defined config commands
if [catch {cli_open} result] {
    error $result $errorInfo
} else {
    array set cli1 $result    
}
 
# 3. capture interface status
# Wait 2 seconds so the interface state can settle before it is read
after 2000
if { [catch {cli_exec $cli1(fd) "show interface $_config_if1 | include line protocol"} result] } {
    puts "ERROR: Failed to execute 'show interface $_config_if1': '$result'"
    exit 1
}
foreach line [split $result "\n"] {
    if {[regexp "^$_config_if1 is up" $line]} {
        set _if1 "up"
    }
    if {[regexp "^$_config_if1 is down" $line]} {
        set _if1 "down"
    }
    if {[regexp "^$_config_if1 is administratively down" $line]} {
        set _if1 "admin-down"    
    }
}

if { [catch {cli_exec $cli1(fd) "show interface $_config_if2 | include line protocol"} result] } {
    puts "ERROR: Failed to execute 'show interface $_config_if2': '$result'"
    exit 1
}
foreach line [split $result "\n"] {
    if {[regexp "^$_config_if2 is up" $line]} {
        set _if2 "up"
    }
    if {[regexp "^$_config_if2 is down" $line]} {
        set _if2 "down"
    }
    if {[regexp "^$_config_if2 is administratively down" $line]} {
        set _if2 "admin-down"    
    }
}

# Define interface and action to configure
#
if {[regexp "^up" $_if1] && [regexp "^down" $_if2]} {
    set _interface "interface $_config_if1"
    set _action "shutdown"
}
if {[regexp "^up" $_if1] && [regexp "^admin-down" $_if2]} {
    set _interface "interface $_config_if2"
    set _action "no shutdown"
}
if {[regexp "^up" $_if2] && [regexp "^down" $_if1]} {
    set _interface "interface $_config_if2"
    set _action "shutdown"
}
if {[regexp "^up" $_if2] && [regexp "^admin-down" $_if1]} {
    set _interface "interface $_config_if1"
    set _action "no shutdown"
}

if {[info exists _interface] && [info exists _action]} {
    puts "if_updown_IBM: if1:$_if1 if2:$_if2 ACTION: $_interface -> $_action"
    # new configuration to implement
    if [catch {cli_exec $cli1(fd) "config t"} result] {
        error $result $errorInfo
    }
    if [catch {cli_exec $cli1(fd) "$_interface"} result] {
        error $result $errorInfo
    }
    if [catch {cli_exec $cli1(fd) "$_action"} result] {
        error $result $errorInfo
    }
    #Commit the changes
    puts "To commit"
    if [catch {cli_exec $cli1(fd) "commit"} result] {
        error $result $errorInfo
    }
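    #Exit config mode
    puts "commited, to end"
    if [catch {cli_exec $cli1(fd) "end"} result] {
        error $result $errorInfo
    }
    # Log only after "end" has returned; after "error" this line was unreachable
    puts "End"
} else {
    puts "if_updown_IBM: if1:$_if1 if2:$_if2 ACTION: No action"
}

# Close the CLI session on both paths so the vty is always released
if [catch {cli_close $cli1(fd) $cli1(tty_id)} result] {
    error $result $errorInfo
}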
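
The script above parses the state of each interface with two near-identical loops. As a sketch only, the same classification could be factored into one helper that can also be tested outside the router. The proc name parse_if_state is hypothetical, and the sample output lines are an assumption about the format, based on the regular expressions the script already uses.

```tcl
# Hypothetical helper: classify one interface from the output of
# "show interface <name> | include line protocol".
proc parse_if_state {ifname output} {
    set state ""
    foreach line [split $output "\n"] {
        # Drop a stray carriage return or padding left by the CLI
        set line [string trim $line]
        if {[string match "$ifname is administratively down*" $line]} {
            set state "admin-down"
        } elseif {[string match "$ifname is down*" $line]} {
            set state "down"
        } elseif {[string match "$ifname is up*" $line]} {
            set state "up"
        }
    }
    return $state
}

# Assumed sample lines, for illustration only
puts [parse_if_state "TenGigE0/0/0/5" "TenGigE0/0/0/5 is up, line protocol is up \r\n"]
puts [parse_if_state "TenGigE0/0/0/6" "TenGigE0/0/0/6 is administratively down, line protocol is administratively down"]
```

An empty return value means that no recognizable state line was found, which matches the initial value of _if1 and _if2 in the script.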


Cisco Employee

aha that is good info, thanks for that marcos. anatomically your script looks fine. it might be good for the debug to print the config strings used with a puts so we can replay that. I think that the commit may fail and hang there indefinitely.

On the other hand, I see some EEM related fixes in 533 also that we may benefit from.

it may possibly be easier to simplify this script:

when we see ten 5 or 6 go down, we just force a switchover manually if this is mclag?

or basically put both 5 and 6 in admin shut down, regardless of whom of the 2 went line down.

as I didn't see the puts "commited, to end" I assume that the commit was failing/tripping somewhere maybe.

one other thing to note is that puts has had its issues and its output is at times not visible, so I resorted to using action_syslog also for debug statements ...

cheers

xander
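
The suggestion above to use action_syslog for debug statements can be sketched as a small wrapper. The proc name debug_log is hypothetical; the sketch assumes the policy has already run namespace import ::cisco::eem::* so that action_syslog is visible, and it falls back to puts when run in a plain tclsh.

```tcl
# Hypothetical wrapper: prefer action_syslog under EEM, fall back to puts elsewhere.
proc debug_log {text} {
    if {[llength [info commands action_syslog]] > 0} {
        # EEM library call: write the text to syslog at priority "info"
        action_syslog priority info msg $text
    } else {
        puts $text
    }
    return $text
}

# Example: log the config strings before they are sent, so the run can be replayed
set _interface "interface TenGigE0/0/0/5"
set _action "shutdown"
debug_log "config strings: '$_interface' / '$_action'"
debug_log "To commit"
```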

Community Member

Xander, I have attached the sh event manager trace all for the times described in the logs. I can see some ''Embedded Event Manager' detected the 'fatal' condition 'could not find key' (err:-1345057280)' and I'm not sure what it is about.

The idea of the script is to detect when either of the two interfaces goes down and then admin shut the other one; when it detects that the first interface has come up again, it does an admin no shut on the second one to restore the full link.

I do think the commit fails due to a user authentication issue: even though it should be local, it seems to be looking for AAA authentication somewhere.

Is there any command to bypass user authentication, at least to rule out a user issue?

Cisco Employee

it is possible that the error is related to the user authentication.

one thing to try is to use the eem user and make the configuration change with that. the failed key is at times related to that, although from the trace you had posted I could not relate them. I'd like to see an eem/pd/error with note: 

aaa_get_user_taskmap() failed for policy/user

this is possibly in the show tech event manager if you can pull that also after a failed run.

or if you can, move all auth to local and put the eem user in cisco-support to eliminate all of that auth stuff and we can shave it back up from there?

cheers!

xander

Community Member

This post looks like my issue:

https://supportforums.cisco.com/discussion/10624381/eem-tcl-script#670005

Is there any way to check if the BUG also applies to IOS XR 5.3.2 ?

Cisco Employee

these are very eem engine specific to ios 12.2

there could be an issue running into the aaa stuff, but rather confirm via the show tech.

x

Beginner

Hi Xander,

Is it possible with EEM to track an IP instead of event logs ?

I need to track a particular server and then, in case of 3 failed pings in a row, I would like to shut the L3 interface for this server on my router.

I use ASR9000 in 5.1 or 5.3 and 65xx or 76xx in IOS 12.2

Let me know if it's possible.

Many thanks in advance!

Samuel

Beginner

PS: I should probably use IP SLA sessions?

Cisco Employee

hi samuel,

eem triggers on snmp or syslog. so you can trigger on an ip if you do object tracking that spits a syslog as a result, which we can spawn the eem run for

xander

Beginner

Thank you Xander,

As usual, top service :-)

Beginner

Hi Xander,
We have an ASR9010 with 4 logical interfaces and 1 physical interface. We want to configure EEM so that if more than 2 logical interfaces go down, it shuts down the physical one. We want a variable that is easy to track, say "n", starting at n=0. If 1 logical interface goes down, n increases by 1 (and decreases by 1 if 1 logical interface comes up). If n>=2, it automatically shuts the physical interface, and it does a no shut if n<2. Can we do that with EEM on the ASR 9010? If yes, please give me some help. Thank you very much.
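
One way to approach the counter described above, as a sketch only, is to avoid keeping n between runs and instead recompute it on every trigger from the current state of the logical interfaces (for example from show interface output, as the script earlier in this thread does). The proc name decide_physical_action and the interface labels are hypothetical. The sketch also assumes that the state of the logical interfaces does not itself depend on the physical interface being shut, otherwise n could never drop again.

```tcl
# Hypothetical helper: count the logical interfaces that are not up and
# pick the action for the physical interface (n >= threshold means shut).
proc decide_physical_action {states {threshold 2}} {
    set n 0
    foreach {ifname state} $states {
        if {$state ne "up"} { incr n }
    }
    if {$n >= $threshold} {
        return [list $n "shutdown"]
    }
    return [list $n "no shutdown"]
}

# Placeholder names and states, for illustration only
lassign [decide_physical_action {logical1 up logical2 down logical3 down logical4 up}] n action
puts "n=$n action=$action"
```

The returned action string could then be sent under the physical interface with cli_exec, followed by a commit.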

Hi

I don't have TACACS on my ASR9K, and I want to write a Tcl script so that every command that the defined users enter in privileged or config mode is saved to a file or emailed to me. How can I do this, and with which Tcl script? EEM on my ASR9K is very different from the 7600 series: I don't have the "event manager applet" command, and I don't see any way to import a Tcl script into EEM.

 

Enthusiast
Is it possible to verify the state of a route by specifying the next hop and, if the next hop changes, take actions such as bringing down an interface? Would you have an example to show me? Thank you, regards.
Beginner

I did this w boolean tracking objects on bfd watched static routes to a remote loopback coupled w multihop bgp+bfd.  It's sub-second to bring down the NNI for us and i have it on 7600s and asr9k.  

Beginner

Hi Xthuijs,

I am looking for a solution based on EEM scripts rather than an advertise-map in BGP.

My goal is to trigger an EEM script when the interface connected to ISP1 goes down.

 

For example, if loopback0 goes down, EEM configures a BGP command (neighbor description).

My questions:

1. If I use IP SLA, how can I apply it in the EEM script?

track ISP1-prefix
type route reachability
route ipv4 59.100.140.148/30

 

2. Can I use syslog instead of IP SLA? I tried it but it didn't work.

 

ASR9K>>

event manager environment _syslog_pattern
event manager directory user policy "bootflash:/"
event manager policy test-tcl.tcl type user

 

SEC-M2-IRT1001-C1#sh event manager policy registered
No. Class Type Event Type Trap Time Registered Name
1 script user syslog Off Fri Jan 18 12:55:34 2019 test-tcl.tcl
occurs 1 pattern {.*(Loopback)(0).*(changed state to down)}

 

test-tcl.tcl >>

::cisco::eem::event_register_syslog occurs 1 pattern $_syslog_pattern maxrun 90

#
# This EEM tcl policy was generated by the EEM applet conversion
# utility at http://www.marcuscom.com/convert_applet/
# using the following applet:
#
# event manager applet loopback0-down
#
# action 6.0 cli command "enable"
#
# action 6.1 cli command "config t"
#
# action 6.2 cli command "router bgp 45472"
#
# action 6.3 cli command "neighbor 49.255.228.69 description TEST-script"
#
# end
#

namespace import ::cisco::eem::*
namespace import ::cisco::lib::*

array set arr_einfo [event_reqinfo]


if [catch {cli_open} result] {
error $result $errorInfo
} else {
array set cli1 $result
}

if [catch {cli_exec $cli1(fd) "enable"} _cli_result] {
error $_cli_result $errorInfo
}

if [catch {cli_exec $cli1(fd) "config t"} _cli_result] {
error $_cli_result $errorInfo
}

if [catch {cli_exec $cli1(fd) "router bgp 45472"} _cli_result] {
error $_cli_result $errorInfo
}

if [catch {cli_exec $cli1(fd) "neighbor 49.255.228.69 description TEST-script"} _cli_result] {
error $_cli_result $errorInfo
}


# Close open cli before exit.
catch {cli_close $cli1(fd) $cli1(tty_id)} result
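
The applet above was written for IOS, and the converted policy sends its commands unchanged. As a sketch only, the same change might be sequenced on IOS XR as follows, assuming XR's two-stage configuration (an explicit commit, as in the script earlier in this thread), a neighbor sub-mode under router bgp, and no enable step. The proc name xr_bgp_description_cmds is hypothetical. Two further points from this document may also be relevant: the XR log lines read "changed state to Down" with a capital D while the registered pattern uses a lowercase "down", and the policy in the main article is registered with a username.

```tcl
# Hypothetical helper: CLI lines for setting a BGP neighbor description on IOS XR.
proc xr_bgp_description_cmds {asn neighbor description} {
    return [list \
        "configure" \
        "router bgp $asn" \
        "neighbor $neighbor" \
        "description $description" \
        "commit" \
        "end"]
}

set cmds [xr_bgp_description_cmds 45472 49.255.228.69 "TEST-script"]

# Under EEM the lines are sent with cli_exec; outside EEM they are only printed.
if {[llength [info commands cli_open]] > 0} {
    array set cli1 [cli_open]
    foreach c $cmds { cli_exec $cli1(fd) $c }
    cli_close $cli1(fd) $cli1(tty_id)
} else {
    foreach c $cmds { puts $c }
}
```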
