06-06-2018 11:50 AM - edited 03-05-2019 10:33 AM
Hi all,
Currently the majority of our larger sites are outfitted with two 2921 ISRs running HSRP, and a Layer 3 core switch with a static 10.0.0.0/8 route pointing at the HSRP VIP of the routers (and a default route to the ASA). These routers each have a single MPLS connection. This was all set up by our MSP, and at the time we were still running legacy Avaya core switches in most places. Because of this, our MSP preferred not to do any dynamic routing on the switch side.
Pretty soon I'll be deploying all new switches to one of these sites. The core is going to be a stack of 3850-24XS switches with IP Services. My goal, and what I'm currently attempting to set up in the lab, is to eliminate the use of HSRP for failover between the MPLS links. I've heard and read elsewhere that failover can be done much more gracefully by simply using routing protocols.
As it exists currently, each MPLS router is learning routes via BGP. In the lab, I configured two 3925s that we had on hand, each with a DMVPN tunnel (closest thing I could get to MPLS in a lab setup), and redistributed the routes learned from BGP into an EIGRP process. This EIGRP process is also running on the core 3850 switch. All routes are being learned properly and I can ping across the WAN from the core as would be expected.
The issue I'm running into is when testing failover. In order to simulate a loss of service, I'm removing the access VLAN on the switchport that labrouter1 is connected to upstream (a guest internet port). After literal minutes, the BGP session finally times out, and traffic moves to Router2/DMVPNtunnel2. Everything appears to work okay when actually shutting the port, but it's of course very rare that a sudden outage would cause a link-down between the PE MPLS router and our MPLS router. From what I can tell, I probably need to have our MSP lower the BGP hold time so that when the MPLS circuit is lost the table will clear faster and force traffic over the secondary MPLS, but I'm not entirely sure.
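For reference, a minimal sketch of the two pieces described above: BGP routes redistributed into EIGRP, and shorter BGP timers toward the provider. The AS numbers, the peer address 192.0.2.1, the EIGRP process number and the seed metric are all placeholders for illustration, not values from the real configs.

```
! Placeholder values: local AS 65001, provider AS 65000, peer 192.0.2.1
router bgp 65001
 neighbor 192.0.2.1 remote-as 65000
 ! Keepalive 10 s, hold time 30 s (IOS defaults are 60 s and 180 s)
 neighbor 192.0.2.1 timers 10 30
!
! Placeholder EIGRP process number
router eigrp 130
 ! Seed metric: bandwidth (kbps), delay (tens of usec), reliability, load, MTU
 redistribute bgp 65001 metric 100000 100 255 1 1500
```

The hold time is negotiated to the lower of the two peers' values, and new timers only take effect after the session is reset, so this would still need to be coordinated with whoever runs the PE side.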
This was all spawned when one of my older, better-seasoned coworkers suggested that it would be a much cleaner setup to run it this way rather than relying on HSRP and SLAs. Me, I'm just a novice with a few years of experience with this so I may be completely out in left field, but if anyone happens to have any advice for me with this build, please let me know. At the very least, my more attainable goal is to implement EIGRP so that I don't have to put static routes everywhere every time I add a network to a site. Even if I have to maintain HSRP for the failover, I at least want to clean this up a bit from its current state. Let me know if I should post sanitized configs for a better visual.
Sorry if this seems like an elementary problem lol
Thanks!
06-06-2018 01:04 PM
Hello,
I am not sure I understand what you are asking, but the BGP fast external fallover feature terminates the external BGP session immediately when the link to a directly connected peer goes down, without waiting for the hold timer to expire. It is enabled by default, so it only needs to be configured if it has been turned off. You might want to have a look at that feature...
R1(config-if)#router bgp 65001
R1(config-router)#bgp fast-external-fallover
06-06-2018 01:12 PM
06-06-2018 01:19 PM
Hello,
Can you post the configuration of the device in question, the one with the primary and failover links? Usually an IP SLA in conjunction with an EEM script works best.
06-06-2018 01:40 PM
I've attached the config of each test router. Keep in mind that because I'm testing with DMVPN, the MPLS interface is technically Gi0/0. If I shut this interface, the tunnel drops and BGP clears after a short period as expected. But if the link stays up and something happens downstream, BGP does not automatically roll over to router 2. Also, just for clarification, the neighbor in EIGRP 130 (10.13.0.5) is my Layer 3 core.
06-06-2018 02:30 PM
Hello,
Here is what I am thinking: from both your routers, you need to ping an IP address further downstream that becomes unreachable when the downstream failure occurs. This is the IP address being tracked. If the IP address is not reachable, the EEM script will automatically shut the interface (GigabitEthernet0/0 in your case). I hope that makes sense. Below are the scripts for each router (IP address 1.1.1.1 is obviously not the one you need to configure; you need one that goes down when there is a downstream failure):
TEST3925-RTR2
ip sla 1
 icmp-echo 1.1.1.1 source-interface Port-channel20.120
 ! Keep threshold <= timeout <= frequency (in ms), set in this order
 threshold 1000
 timeout 1000
 frequency 1
!
ip sla schedule 1 life forever start-time now
!
! Track object that the applets below react to
track 1 ip sla 1 reachability
!
event manager applet SHUT_INT
 event track 1 state down
 action 1.0 cli command "enable"
 action 2.0 cli command "conf t"
 action 3.0 cli command "int GigabitEthernet0/0"
 action 4.0 cli command "shut"
 action 5.0 cli command "end"
 action 6.0 cli command "clear ip nat translation *"
!
! Needs its own name, otherwise it overwrites SHUT_INT
event manager applet NO_SHUT_INT
 event track 1 state up
 action 1.0 cli command "enable"
 action 2.0 cli command "conf t"
 action 3.0 cli command "int GigabitEthernet0/0"
 action 4.0 cli command "no shut"
 action 5.0 cli command "end"
 action 6.0 cli command "clear ip nat translation *"
TEST3925-RTR1
ip sla 1
 icmp-echo 1.1.1.1 source-interface Port-channel10.120
 ! Keep threshold <= timeout <= frequency (in ms), set in this order
 threshold 1000
 timeout 1000
 frequency 1
!
ip sla schedule 1 life forever start-time now
!
! Track object that the applets below react to
track 1 ip sla 1 reachability
!
event manager applet SHUT_INT
 event track 1 state down
 action 1.0 cli command "enable"
 action 2.0 cli command "conf t"
 action 3.0 cli command "int GigabitEthernet0/0"
 action 4.0 cli command "shut"
 action 5.0 cli command "end"
 action 6.0 cli command "clear ip nat translation *"
!
! Needs its own name, otherwise it overwrites SHUT_INT
event manager applet NO_SHUT_INT
 event track 1 state up
 action 1.0 cli command "enable"
 action 2.0 cli command "conf t"
 action 3.0 cli command "int GigabitEthernet0/0"
 action 4.0 cli command "no shut"
 action 5.0 cli command "end"
 action 6.0 cli command "clear ip nat translation *"
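To check that the probe and the applets behave as intended during a test failover, something like the following could be run on each router. This assumes SLA operation 1 is tied to a track object numbered 1; adjust the numbers to whatever is actually configured.

```
! Probe results: return code, latest RTT, success and failure counts
show ip sla statistics 1
! Current state of the track object and when it last changed
show track 1
! Applets registered with EEM and the events they are waiting on
show event manager policy registered
! Recent EEM events, to confirm an applet actually fired
show event manager history events
```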
06-06-2018 02:38 PM
Hello
If I am understanding this correctly, you are asking about the resiliency of a spoke MPLS site and how to fail over between the two ISR site routers after an indirect outage within the "MPLS cloud"?
res
Paul
06-07-2018 06:04 AM
06-07-2018 06:47 AM
Hello,
In order to provide more specific help with your configs, please post a schematic drawing of your setup, including IP addresses...
06-07-2018 08:53 AM - edited 06-07-2018 08:56 AM
Hello
Okay - so let's say you are receiving default routes from each ISP eBGP peer, with preference given to your primary router's path.
Based on these eBGP defaults, you can redistribute an IGP default (say OSPF) into your L3 core with different metric types (type 1 on the primary, type 2 on the secondary).
As such, if a failure occurs within the primary MPLS, the primary default disappears and the secondary eBGP default becomes active, so its related OSPF type 2 default route is installed on your L3 core. You would have resiliency even if your primary eBGP peer is still up.
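A minimal sketch of how that could look, assuming OSPF process 1 on both routers and a metric of 10 (both values are placeholders). Without the "always" keyword, each router only originates the default while it has a default route of its own in the routing table, which here would be the one learned from eBGP.

```
! Primary MPLS router: external type 1 default, preferred by the core
router ospf 1
 default-information originate metric 10 metric-type 1

! Secondary MPLS router: external type 2 default, used only when the type 1 default is gone
router ospf 1
 default-information originate metric 10 metric-type 2
```

OSPF prefers an external type 1 route over a type 2 route to the same prefix regardless of the metric, so the core should only fall back to the secondary once the primary stops originating its default.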
res
paul