SD-WAN Branch w/ single TLOC - unable to failover during packet loss

jonhurley2010 · ‎08-23-2021

We have a small branch vEdge100b deployed with a cable internet provider. The vEdge devices have a 'biz-internet' IPsec TLOC connection to 3 datacenters. I've attached a diagram.

Recently a real-world regional ISP issue caused packet loss between one of our datacenters, datacenter 1, and one of our remote branches, branch 1. The datacenter routes remained in the routing table during this impairment, while App Route SLAs reported loss violations. SD-WAN BFD sessions to all datacenters remained up.

From a client perspective, the datacenter 1 (10.1.x.x/16) resources were unavailable, while connectivity to datacenter 2 (10.2.x.x/16) and datacenter 3 (10.3.x.x/16) resources was not impacted at all.

In our traditional ISR environment we would have modified routing or just shut down an overlay tunnel to force traffic through another datacenter temporarily. However, it seems we're stuck with Cisco SD-WAN.

What options are there to reroute traffic dynamically through another biz-internet datacenter during this packet loss impairment?
Are there any new or upcoming feature enhancements for Cisco SD-WAN that map BFD/SLA loss, latency, or jitter trouble to OMP route suppression?

We're on 20.3.2/17.5.01A.

Thank you.

--

ALSO:

I did want to mention that for sites that have two transports (biz-internet and mpls, for example), SLA-defined traffic fails over as expected during similar packet loss conditions.

Kanan Huseynli · ‎08-23-2021

Hi,

It is expected that the route stays in the RIB/FIB while the path is not SLA compliant. This is because DC1 is still reachable (even with bad parameters) and announces its routes to vSmart, and vSmart in turn re-advertises them to branch 1.

Just a question: do the other DCs advertise the DC1 subnets? If not, then the router will in any case try to send to DC1 (because it is the only device that announces the route, and it is reachable even with a bad SLA). Even a default or summary route from the other DCs will not help, because DC1 advertises a more specific route.

Application-Aware Routing does not actually change routing. It only determines whether a path can be used, given that a route over it already exists. Basically, it can't do anything unless the best route points over that interface/tunnel. So it is normally used when multiple paths carry the same routing information and you want to send traffic more intelligently, rather than doing simple ECMP without looking at underlay parameters (SLA).

Taking your case, let me explain with routing info.

Suppose you have 2 DCs, DC1 and DC2, and one branch. DC1 advertises subnet 10.0.0.0/24; DC2 does not advertise this subnet (maybe it advertises a summary route like 10.0.0.0/8 or even a default, it does not matter). You have one underlay, mpls (for simplicity).

In the normal case, the branch has BFD sessions UP to both DCs (one to DC1, another to DC2). You do AAR and apply some SLA.

When both paths are SLA compliant, traffic is sent only to DC1, because the more specific route is received from it.
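
The lookup order in this example can be sketched in a few lines of Python. This is only a simplified model for illustration, not how the router is implemented: the `OverlayRoute` type and `select_next_hops` function are hypothetical names, and the fallback to all paths when none is SLA compliant assumes the default (non-strict) AAR behavior.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class OverlayRoute:
    prefix: str    # prefix learned via OMP
    site: str      # site that advertises the prefix
    sla_ok: bool   # does the tunnel to that site currently meet the SLA?


def select_next_hops(routes, destination):
    """Longest-prefix match first; SLA is only checked among the winners."""
    dest = ipaddress.ip_address(destination)
    matching = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not matching:
        return []
    # Step 1: routing decision. Only the most specific prefix survives.
    longest = max(ipaddress.ip_network(r.prefix).prefixlen for r in matching)
    best = [r for r in matching
            if ipaddress.ip_network(r.prefix).prefixlen == longest]
    # Step 2: AAR. Prefer SLA-compliant paths among the best routes only.
    compliant = [r for r in best if r.sla_ok]
    # Assumption: non-strict AAR, so fall back to any best path if none comply.
    chosen = compliant or best
    return sorted({r.site for r in chosen})


routes = [
    OverlayRoute("10.0.0.0/24", "DC1", sla_ok=False),  # DC1 path has loss
    OverlayRoute("10.0.0.0/8", "DC2", sla_ok=True),    # DC2 only sends a summary
]
print(select_next_hops(routes, "10.0.0.5"))
```

In this model the less specific 10.0.0.0/8 from DC2 is never considered for 10.0.0.5, whatever the SLA state of DC1, because it is dropped in step 1 before AAR looks at anything.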

Now, the path to DC1 does not meet the SLA, but the path to DC2 does. Router evaluates AAR

HTH,
Please rate and mark as an accepted solution if you have found any of the information provided useful.

jonhurley2010 · ‎08-23-2021

Hi Kanan, it looks like your reply was cut off on the routing info explanation - hoping to see the rest!

To answer your question: we use 10.0.0.0/8 summaries as backup routes, in addition to each datacenter advertising its own more specific route summaries. From your explanation, I'm not sure this would help anyway in Cisco SD-WAN/Viptela with only 1 TLOC. Thanks!

Kanan Huseynli · ‎08-24-2021

Hi,

OK, that is the answer I expected. But as I explained, this will not help, because the best route still points to DC1 even with a bad SLA.

In this case you should filter the routes on vSmart using a centralized control policy, applied manually when failover is needed.

I don't know of a dynamic way to solve your case. At most, you can create a script that reads loss/jitter/latency and, when the SLA is not met, pushes a policy via the API.
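
A rough sketch of the decision part of such a script is below. Everything in it is an assumption for illustration: `SlaSample`, `SlaThreshold` and `FailoverController` are hypothetical names, the two callbacks stand in for whatever API calls activate and deactivate the filtering policy, and the consecutive-sample counters are my own addition so that one bad poll does not flap the policy. Fetching the statistics is left out on purpose.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SlaSample:
    loss_pct: float
    latency_ms: float
    jitter_ms: float


@dataclass(frozen=True)
class SlaThreshold:
    max_loss_pct: float
    max_latency_ms: float
    max_jitter_ms: float


def violates(sample: SlaSample, threshold: SlaThreshold) -> bool:
    """True if any of loss, latency or jitter exceeds its limit."""
    return (sample.loss_pct > threshold.max_loss_pct
            or sample.latency_ms > threshold.max_latency_ms
            or sample.jitter_ms > threshold.max_jitter_ms)


class FailoverController:
    """Calls activate/deactivate after N consecutive bad/good samples."""

    def __init__(self, threshold: SlaThreshold,
                 activate_policy: Callable[[], None],
                 deactivate_policy: Callable[[], None],
                 bad_needed: int = 3, good_needed: int = 6):
        self.threshold = threshold
        self.activate_policy = activate_policy      # hypothetical API wrapper
        self.deactivate_policy = deactivate_policy  # hypothetical API wrapper
        self.bad_needed = bad_needed
        self.good_needed = good_needed
        self.failed_over = False
        self._bad = 0
        self._good = 0

    def observe(self, sample: SlaSample) -> bool:
        """Feed one measurement; returns True while failover is active."""
        if violates(sample, self.threshold):
            self._bad, self._good = self._bad + 1, 0
        else:
            self._bad, self._good = 0, self._good + 1
        if not self.failed_over and self._bad >= self.bad_needed:
            self.activate_policy()
            self.failed_over = True
        elif self.failed_over and self._good >= self.good_needed:
            self.deactivate_policy()
            self.failed_over = False
        return self.failed_over
```

You would call `observe()` once per polling interval with the latest numbers for the branch-to-DC1 tunnel. The thresholds and the number of samples needed before switching are values you would have to tune for your own SLA.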

HTH,
Please rate and mark as an accepted solution if you have found any of the information provided useful.