Solved: QoS in the WAN Mesh

mattp0002 · ‎07-31-2015

You guys have been very helpful and it is greatly appreciated. I learn a lot of things every day reading through the posts so let me thank you in advance.

I have another situation here I'm trying to get my head around. QoS in the "cloud" or rather in the full mesh WAN. Here's a simplified example scenario which represents what I'm dealing with in reality:

Let's say I have 3 sites: Site A, Site B, and Site C.

Each site has its own ISR Router connecting the local site LAN to a Telco-operated MPLS network using one circuit.

Each CE endpoint on that Telco MPLS WAN is routable to each other CE endpoint.

Each MPLS circuit has a bandwidth of 10 mbit/sec and is delivered as Ethernet.

Overlaid on top of this MPLS WAN, each of my ISR routers has 2 IPsec VTI interfaces configured, and each of these tunnels traverses the MPLS WAN and terminates at one of the other sites' routers.

Therefore: Site A has IPsec tunnels to Sites B and C, Site B has tunnels to A and C, and C has tunnels to Site A and B.

Inside each of these WAN tunnels I'm running OSPF for routing and failover - the WAN will re-converge upon tunnel failure using any remaining paths. (Yes, in this simplified scenario both tunnels are built along the same physical circuit, but just go along with me for the purposes of this example.)

So, here's the problem: Site A has a 10 mbit Ethernet pipe to it. Sites B and C both also have 10 mbit pipes. If someone at Site B starts jamming 10mbit/sec of data down their tunnel towards Site A, and concurrently someone at Site C starts doing the same down their tunnel, well suddenly there's 20 mbit/sec of data heading across the MPLS WAN (wrapped in ESP tunnels) towards Site A which only has a 10mbit circuit into it.

Therefore, the Telco provider starts dropping packets, because they are policing that interface heading towards Site A.

Simply purchasing more bandwidth at each site from the Telco solves nothing, because the problem will still occur (albeit at higher datarates).

Obviously I could put an egress policy outbound from Site B and Site C rate-limiting traffic to 5 mbit/sec each, so A is not overloaded - but these rate-limits would always be in place, and at times when Site C is not transmitting at all, Site B will only have 5 mbit/sec useable instead of 10 mbit/sec.
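To make that trade-off concrete, here is a rough Python sketch of the arithmetic. It is illustrative only: the function name is made up, and the assumption that the provider's policer drops the excess in proportion to each sender's rate is a simplification, not something I know about the Telco's actual behaviour.

```python
def delivered_to_site(offered_mbps, shaper_mbps, circuit_mbps):
    """Return (per-sender delivered rate, total dropped), all in Mbit/s.

    offered_mbps: dict of sending site -> rate it tries to send
    shaper_mbps:  static egress cap applied at every sender, or None
    circuit_mbps: access circuit rate at the receiving site
    """
    # Each sender is first capped by its own egress shaper (if any).
    sent = {
        site: rate if shaper_mbps is None else min(rate, shaper_mbps)
        for site, rate in offered_mbps.items()
    }
    total = sum(sent.values())
    if total <= circuit_mbps:
        return sent, 0.0
    # Assumption: the policer drops the excess proportionally per sender.
    scale = circuit_mbps / total
    delivered = {site: rate * scale for site, rate in sent.items()}
    return delivered, total - circuit_mbps


# No shaping, B and C both sending 10 Mbit/s towards Site A:
print(delivered_to_site({"B": 10, "C": 10}, None, 10))
# Static 5 Mbit/s shapers, both sending: nothing is dropped.
print(delivered_to_site({"B": 10, "C": 10}, 5, 10))
# Static 5 Mbit/s shapers, C idle: B is stuck at 5 of the 10 available.
print(delivered_to_site({"B": 10, "C": 0}, 5, 10))
```

The last case is the one that bothers me: half of Site A's circuit sits unused whenever only one site is sending.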

I really am at a loss here for how to implement QoS to prevent the circuit from being overloaded - because my Routers A, B, and C are not aware of the instant loads on the others' circuits. They would need this awareness in order to make rational decisions about how much traffic to pump at any given moment.

Is there some sort of dynamic QoS protocol that can be implemented router-to-router in order to make this happen?

Forgive my ignorance concerning QoS. Just point me in the right direction and I'll figure the rest out. THANKS!!!

Joseph W. Doherty · ‎07-31-2015

Disclaimer

The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.

Liability Disclaimer

In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.

Posting

There are several possible approaches.

First, check whether your WAN provider supports QoS. If they do, a WAN egress QoS policy might be all you need.

If not, your next easiest alternative is to control your aggregate bandwidth to other sites. The major problem, as you note, is that a sending site cannot use bandwidth left idle by the other sites. Do keep in mind, you may benefit even if you don't restrict aggregate bandwidth to what's actually available. For example, your other two sites might restrict themselves to a 15 Mbps aggregate even though the maximum is only 10 Mbps. Basically, you allow some oversubscription, whatever seems to work well for you. (BTW, also, you don't have to split the aggregate evenly across sending sites.)
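As a rough illustration of the oversubscription idea, the following Python sketch compares a few ways of splitting per-sender caps towards one 10 Mbps site. The function name and the example splits are hypothetical, chosen only to show the arithmetic; they are not a recommendation for any particular ratio.

```python
def oversubscription_summary(caps_mbps, circuit_mbps):
    """Summarise a set of per-sender caps against one receiving circuit.

    caps_mbps:    dict of sending site -> its egress cap in Mbps
    circuit_mbps: access circuit rate at the receiving site in Mbps
    """
    aggregate = sum(caps_mbps.values())
    return {
        "aggregate_mbps": aggregate,
        # 1.0 means no oversubscription; 1.5 means 50% oversubscribed.
        "oversubscription_ratio": aggregate / circuit_mbps,
        # Traffic the provider would have to drop if every sender peaks at once.
        "worst_case_excess_mbps": max(0.0, aggregate - circuit_mbps),
        # The most a single sender can get when the others are idle.
        "best_single_sender_mbps": max(
            min(cap, circuit_mbps) for cap in caps_mbps.values()
        ),
    }


# Strict split: never overloads, but a lone sender only gets 5 Mbps.
print(oversubscription_summary({"B": 5, "C": 5}, 10))
# 15 Mbps aggregate, split evenly: a lone sender gets 7.5 Mbps.
print(oversubscription_summary({"B": 7.5, "C": 7.5}, 10))
# 15 Mbps aggregate, split unevenly in favour of Site B.
print(oversubscription_summary({"B": 9, "C": 6}, 10))
```

The trade is visible in the two middle figures: the higher the aggregate, the more a lone sender can use, and the more gets dropped if every site peaks together.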

Another alternative would be to use additional links at each site, so you have a physical mesh. The additional expense would seem to make this a problem, but if you need to guarantee service levels, it's what might need to be done. Also keep in mind, you might vary the bandwidth of these links; for example, instead of having a single 10 Mbps link, you might have 5 and 5, or 7 and 3, or ... This also may provide additional redundancy. (Also, if using Cisco routers, something like Cisco's PfR could leverage the additional links, including hopping via another site.) Also, when looking at WAN links, Internet connections can be less expensive than a private cloud. I've found that if you use Internet connections only for point-to-point traffic, performance is often like that of a dedicated leased point-to-point line.

Regarding "dynamic" QoS, you could probably make your own, using IP SLA and embedded scripting (not for the faint of heart). However, I've recently read that later versions of Cisco's DMVPN now have dynamic shaping as a feature, Adaptive QoS. As you're already using VTI tunnels, this might be a possible alternative.
