Solved: How to failover based on line errors

cofee · ‎04-20-2017

We have two WAN circuits configured as active and standby. At the edge we run HSRP, which tracks the primary circuit: if that circuit fails, HSRP decrements the priority and the secondary circuit takes over. The problem is that whenever the primary circuit has trouble, it is line and CRC errors, and these do not affect the line protocol. The primary circuit stays up physically but does not pass VPN traffic (BGP peering is not affected). All of our users are remote and connect to the network through the VPN, so when there are too many line/CRC errors they cannot connect over the affected circuit. I am looking for suggestions on how to fail over based on line/CRC errors, because at the moment we have to shut the circuit down manually so that the secondary circuit can take over.
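
For reference, the kind of setup described here might look roughly like the sketch below. The interface names, addresses, group number, and priority values are assumptions for illustration only, not the actual configuration.

```
! Track the line protocol of the primary WAN interface (Gi0/0 is assumed).
track 1 interface GigabitEthernet0/0 line-protocol
!
interface GigabitEthernet0/1
 ip address 10.0.0.2 255.255.255.0
 standby 1 ip 10.0.0.1
 standby 1 priority 110
 standby 1 preempt
 ! Priority drops by 20 only when the line protocol of Gi0/0 goes down,
 ! so CRC errors on a link that stays up/up never trigger a failover.
 standby 1 track 1 decrement 20
```

The standby router would also need `standby 1 preempt` so that it takes over once its priority becomes the higher one.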

Any help would be appreciated.

Julio E. Moisa · ‎04-20-2017

Hi

If the device logs error messages for this condition, you could use EEM (Embedded Event Manager) to run a script that applies configuration automatically, for example disabling the primary interface to avoid flapping.

This is an example:

event manager applet SCRIPT
event tag cua syslog pattern "is down"
event tag pri syslog pattern "100"
event tag qui syslog pattern "holding time expired"
event tag sec syslog pattern "Neighbor 1.1.1.1"
event tag ter syslog pattern " GigabitEthernet0/0"
trigger occurs 1 period 10
correlate event pri and event sec and event ter and event cua and event qui
action 1.0 cli command "enable"
action 2.0 cli command "conf t"
action 3.0 cli command "interface GigabitEthernet0/0"
action 4.0 cli command "shutdown"
action 5.0 cli command "exit"
action 6.0 cli command "end"

This example could be useful for BGP:

event manager applet SCRIPT-02
event syslog pattern "%BGP_SESSION-5-ADJCHANGE: neighbor 1.1.1.1 IPv4 Unicast topology"
action 1 cli command "enable"
action 10 cli command "conf t"
action 11 cli command "router bgp 100"
action 12 cli command "neighbor 1.1.1.1 shutdown"
action 13 cli command "end"
action 14 cli command "wr memory"

Basically, you need to match the error messages you have seen on your devices and add the configuration to run once the script is triggered by those messages.

* The commands may differ by device model.

You could also use IP SLA with EEM to verify reachability to a specific IP address; if the probe fails, a script is executed.

track 10 ip sla 10 reachability
delay down 10

ip sla 10
icmp-echo 8.8.8.8 source-ip 1.1.1.2
frequency 5

ip sla schedule 10 life forever start-time now
ip sla enable reaction-alerts

event manager applet FAILOVER-INTERNET
! Adjust both patterns to match the exact syslog text on your platform.
event tag prim syslog occurs 1 pattern "%TRACKING-5-STATE: 10 rtr 10 state Up->Down"
event tag sec syslog occurs 1 pattern "%TRACKING-5-STATE: 10 rtr 10 reachability Up->Down"
trigger
correlate event prim or event sec
action 1.0 cli command "enable"
action 2.0 cli command "configure terminal"
! Remove the current default route.
action 3.0 cli command "no ip route 0.0.0.0 0.0.0.0 1.1.1.1 track 10"
! Create a new default route pointing to the other next hop.
action 4.0 cli command "ip route 0.0.0.0 0.0.0.0 2.2.2.2"
action 5.0 cli command "end"
action 6.0 cli command "write memory"
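
A possible alternative that needs no applet is a tracked static route with a floating backup. This is only a sketch reusing track 10 and the next hops from the example above; the administrative distance of 250 and the host route for the probe target are assumptions.

```
! Primary default route, installed only while track object 10 is up.
ip route 0.0.0.0 0.0.0.0 1.1.1.1 track 10
! Floating backup route; 250 is an arbitrary distance higher than the primary.
ip route 0.0.0.0 0.0.0.0 2.2.2.2 250
! Assumed addition: pin the probe target to the primary next hop so the
! IP SLA probe keeps testing the primary path even after a failover.
ip route 8.8.8.8 255.255.255.255 1.1.1.1
```

With this approach the primary route returns by itself when the probe succeeds again, so no second script is needed for recovery.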

Hope it is useful

:-)

>> Mark as helpful or answered if the reply resolved your question; this helps other community members with future queries. <<

cofee · ‎04-20-2017

Thanks for your reply. Can I associate EEM with HSRP?

Julio E. Moisa · ‎04-20-2017

Yes. Basically, you need to create an IP SLA probe that pings a specific IP address, sourced from the primary IP address related to HSRP. Because the interface still looks up, HSRP will never change state on its own, so you need something that verifies communication through that link, and IP SLA can do that.

Your script could be:

event manager applet FAILOVER-HSRP
event tag prim syslog occurs 1 pattern "%TRACKING-5-STATE: 10 rtr 10 state Up->Down"
event tag sec syslog occurs 1 pattern "%TRACKING-5-STATE: 10 rtr 10 reachability Up->Down"
trigger
correlate event prim or event sec
action 1.0 cli command "enable"
action 2.0 cli command "configure terminal"
! Replace X with the VLAN interface that carries the HSRP group.
action 3.0 cli command "interface vlan X"
action 4.0 cli command "shutdown"
action 5.0 cli command "end"
action 6.0 cli command "write memory"
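
HSRP can also follow the track object directly, which may be simpler than shutting the interface from an applet. The sketch below assumes track 10 from the IP SLA example earlier in the thread; the interface name, group number, virtual IP, and priority values are placeholders.

```
interface Vlan10
 standby 1 ip 10.0.0.1
 standby 1 priority 110
 standby 1 preempt
 ! Lower the priority by 20 while track object 10 (the IP SLA probe) is down.
 standby 1 track 10 decrement 20
```

Because the interface stays up, the probe keeps running and the priority is restored when track 10 comes back up. The peer router needs a priority between 90 and 110 and `standby 1 preempt` for the failover to happen.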

Try to lab it

:-)

cofee · ‎04-20-2017

Correct me if I am wrong: as I understand it, an EEM applet can monitor a router interface for CRC or input errors and, once a configured threshold is reached, take an action such as shutting down the interface, at which point HSRP will kick in and fail over to the secondary circuit. I assume I will then have to go back and manually check that the affected circuit is no longer receiving CRC errors before bringing it back up, because I believe an EEM applet that brings the circuit up once things are back to normal would complicate the overall configuration.
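
For illustration, a minimal sketch of such an applet using the EEM interface event detector might look like the following. The applet name, interface, threshold of 100, and 60-second poll interval are assumptions, and keyword names vary by IOS release (older releases use `entry-val-is-increment true` instead of `entry-type increment`), so it should be checked against the platform's command reference and tested in a lab.

```
event manager applet CRC-FAILOVER
 ! Poll every 60 seconds; fire once the CRC counter on Gi0/0 has grown by
 ! at least 100 since the first sample or since the last time it fired.
 event interface name GigabitEthernet0/0 parameter input_errors_crc entry-op ge entry-val 100 entry-type increment poll-interval 60
 action 1.0 syslog msg "CRC threshold exceeded on Gi0/0, shutting interface"
 action 2.0 cli command "enable"
 action 3.0 cli command "configure terminal"
 action 4.0 cli command "interface GigabitEthernet0/0"
 action 5.0 cli command "shutdown"
 action 6.0 cli command "end"
```

Once the interface is shut, its error counters stop changing, so this sketch leaves bringing the circuit back up as a manual step.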

Please let me know your thoughts.