10-08-2015 08:31 AM - edited 03-05-2019 02:29 AM
Hi,
I need your help working out how to improve routing convergence when one of the headquarters' WAN links fails, and what the best (minimum) convergence time is that could be achieved.
I have attached an image that depicts the current network infrastructure. We welcome any recommendation for a better design that reduces the convergence time, because we are allowed to change the infrastructure if needed.
The main requirement is that the HQLAN and the BRLAN must be connected securely and with high availability (SLA 99.95%). In addition, some applications on the HQLAN tolerate only a very short downtime (about 15 seconds) before they close their connections. So the network must converge very quickly to reduce the loss of connectivity between the HQLAN and the BRLAN.
In the attached image, the headquarters is connected to two Layer 3 BGP VPNs from different ISPs over GigabitEthernet interfaces. The ISPs' CPEs are connected to the same subnet and have eBGP sessions with our routers (see the attached image). The HQ routers run EIGRP with the HQ firewall (Cisco ASA with failover).
We have two branch office connectivity models: one with two routers, and one with a single router connected to both ISPs.
Traffic between the HQ and the branches goes over an IPSec tunnel; the peering is between the ASA interface and the routers' WAN interfaces. The crypto configuration on the ASA therefore has multiple peer entries.
All protocol timers are left at their defaults, and all interfaces are Ethernet (Fast or Gigabit).
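To put the 99.95% figure in perspective, here is a rough sketch of the downtime budget it implies. It assumes the SLA is measured over a 365-day year or a 30-day month; the function name is only illustrative, and the real measurement window depends on the contract.

```python
# Downtime budget implied by an availability target.
# Assumption: availability is measured over the whole period with no exclusions.

MINUTES_PER_YEAR = 365 * 24 * 60   # non-leap year
MINUTES_PER_MONTH = 30 * 24 * 60   # 30-day month


def allowed_downtime_minutes(availability_percent, period_minutes):
    """Return the number of minutes of downtime the target still permits."""
    return period_minutes * (1 - availability_percent / 100)


yearly = allowed_downtime_minutes(99.95, MINUTES_PER_YEAR)
monthly = allowed_downtime_minutes(99.95, MINUTES_PER_MONTH)

print(f"Per year:  {yearly:.1f} minutes")
print(f"Per month: {monthly:.1f} minutes")
```

That works out to roughly 262.8 minutes per year, or 21.6 minutes per 30-day month, so the 15-second application tolerance is a much tighter constraint than the SLA itself.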
Thanks
10-08-2015 09:36 AM
Hello.
Unfortunately, I didn't understand how you route over the tunnel (statically or dynamically).
As I understand it, you rely on the SPs' BGP convergence time, which may surprise you with poor results unless you have an SLA for it.
If you are looking for really fast convergence, I would say it's better to run an IGP over tunnels that are each pinned to one transport (provider).
That is, CE1 should be connected to SP1 only and run its tunnel over SP1, and CE2 should be connected to SP2 only and run its tunnel over SP2. With the IGP tuned, you may reach convergence in about 3 seconds.
Another option is to run PfRv3. In that case you would be able to track path quality and reroute traffic for sensitive applications during brownouts (within 2-3 seconds).
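For the tuned-IGP option above, a quick way to see why default timers are a problem is to compare each protocol's hold time with the 15-second tolerance. This is only a sketch: when a failure does not bring the physical link down, the outage lasts roughly up to the hold time. The 1-second reconvergence allowance and the tuned 1 s / 3 s EIGRP values are assumptions for illustration, not measurements.

```python
# Worst-case outage when a failure is detected only by hold-timer expiry.

BUDGET_S = 15.0      # application tolerance stated in the original post
RECONVERGE_S = 1.0   # assumed allowance for route recomputation (hypothetical)

# (hello/keepalive seconds, hold seconds)
TIMERS = {
    "BGP (Cisco IOS defaults)": (60, 180),
    "EIGRP (defaults on Ethernet)": (5, 15),
    "EIGRP (tuned, assumed 1 s / 3 s)": (1, 3),
}


def worst_case_outage_s(hold_s, reconverge_s=RECONVERGE_S):
    """Neighbor is declared down after hold_s, then routes are recomputed."""
    return hold_s + reconverge_s


def fits_budget(hold_s, budget_s=BUDGET_S):
    """True if the worst-case outage stays below the application tolerance."""
    return worst_case_outage_s(hold_s) < budget_s


for name, (hello_s, hold_s) in TIMERS.items():
    verdict = "fits" if fits_budget(hold_s) else "does not fit"
    print(f"{name}: hello {hello_s} s, hold {hold_s} s -> "
          f"up to {worst_case_outage_s(hold_s):.0f} s, {verdict}")
```

Under these assumptions neither set of defaults fits inside 15 seconds, while a tuned IGP over the tunnels does.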
10-08-2015 09:59 AM
Hi Vasilii,
First of all, thank you for your reply; I'll investigate both options.
There are no routing protocols running over the tunnel.
The HQ routers advertise the HQLAN and their connected interfaces to both ISPs via eBGP. In return, we receive from the ISPs the routes advertised by the branch routers and the WAN networks advertised by the ISPs. The HQ routers redistribute the received BGP routes into EIGRP.
The branch routers advertise their connected interfaces and receive the WAN networks, the HQLAN and the Cisco ASA networks via eBGP.
The IPSec interesting traffic is HQLAN <--> BRLAN.
BR
10-08-2015 10:09 AM
Disclaimer
The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.
Liability Disclaimer
In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.
Posting
I'm thinking the EGP (BGP) is going to be an impediment to really fast convergence.
As you are already using an IPSec tunnel, one method might be to set things up so you have two such tunnels, each of which will only come up across one ISP, plus some kind of tunnel end-point to end-point reachability test with quick loss detection and fast convergence. (For example, running an IGP across these tunnels with subsecond keepalives/hellos.)
Another approach might be to have a tunnel across each ISP cloud, and there too do as above. (You would route your single IPSec tunnel across the two provider cloud tunnels.)
[edit]
PS:
I didn't see Vasilii's post until I posted mine. I think we're saying much the same. I'm unfamiliar with PfRv3, but if its SLA tests can re-route in 2 to 3 seconds, that certainly might be a good enough option.
Advanced IGPs can often be tuned to provide subsecond convergence; the problem is getting quick notification of path loss when you cannot depend on the physical link going down. When working with SLA tests or very fast keepalives/hellos, you also need to consider their impact on your network, and you often need to provision QoS to ensure their packets are not lost or delayed by routine traffic.
Of the two approaches I described, the first, where the higher level sees two equal-cost paths, can often provide even faster convergence for some IGPs if one path is lost.
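On the keepalive trade-off mentioned above, the arithmetic can be sketched as follows. The 300 ms interval, multiplier of 3, 100-byte probe size and two tunnels are hypothetical values chosen only to illustrate the relationship; real packet sizes depend on the protocol and the tunnel encapsulation.

```python
# Trade-off between fast loss detection and probe overhead.
# All input values below are illustrative assumptions.


def detection_time_s(interval_s, multiplier):
    """Path is declared lost after `multiplier` consecutive missed probes."""
    return interval_s * multiplier


def probe_load_bps(interval_s, packet_bytes, sessions):
    """One-way bandwidth used by the probes across all sessions."""
    return sessions * packet_bytes * 8 / interval_s


interval_s = 0.3     # assumed subsecond hello interval
multiplier = 3       # assumed number of missed hellos before declaring loss
packet_bytes = 100   # assumed probe size including tunnel overhead
sessions = 2         # one tunnel per ISP

detection = detection_time_s(interval_s, multiplier)
load = probe_load_bps(interval_s, packet_bytes, sessions)

print(f"Detection time: about {detection:.1f} s")
print(f"Probe load:     about {load / 1000:.1f} kbit/s per direction")
```

The bandwidth is small, but each probe has to arrive on time, which is why the QoS provisioning matters more than the raw load.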
10-13-2015 12:49 PM
Hello,
What do you think about this: http://www.networking-forum.com/blog/?p=2401 ? Do you think a similar solution could work?
BR
10-15-2015 02:20 AM
Yes, a similar solution should work; it is a variant of what both Vasilii and I are suggesting.
One thing that reference mentions is "We could just avoid all this headache and reduce the BGP hold timers in the first place, but that would be no fun", which is true or not true depending on how tight you want your BGP timers. I recall (?) that Cisco BGP timers wouldn't support your 15-second failover goal.