WAN QoS policy not working as expected

wilson_1234_2 · 06-28-2013

We have the policy shown below applied. What I am seeing is that the policy does not seem to be limiting bandwidth to the different classes when there is congestion.
My question is about how the policy is structured and what the result of "bandwidth remaining" would be.
My understanding of the "bandwidth" command (without "remaining") is that, during congestion, the bandwidth would be limited to the percentage or bandwidth amount entered for the class in the policy map.
This does not seem to be happening with "bandwidth remaining", as the "DataHigh" class is consuming the entire link during times of congestion.
It looks to me, in the policy below, like the "ef" class has 50% of the total bandwidth, and the other classes would then have the remaining bandwidth, depending on their remaining percentage. Is this correct?
Also, the bandwidth allocations for "ef" and all the other classes total more than 100% of the available bandwidth.

Am I not understanding this policy correctly?

I would like to limit the bandwidth to the values specified rather than letting one class consume the entire link.

class-map match-all BestEffort
 match any
class-map match-all DataHigh-AF21
 match access-group 15
class-map match-all Voice
 match access-group 14
class-map match-all CriticalHigh-AF31
 match access-group 13
class-map match-all DataLow-AF11
 match access-group 16
class-map match-all Video-AF41
 match access-group 12
!
!
policy-map Policy
 class Voice
  set ip dscp ef
  priority percent 50
 class Video-AF41
  set ip dscp af41
  bandwidth remaining percent 15
 class CriticalHigh-AF31
  set ip dscp af31
  bandwidth remaining percent 60
 class DataHigh-AF21
  set ip dscp af21
  bandwidth remaining percent 20
 class DataLow-AF11
  set ip dscp af11
  bandwidth remaining percent 3
 class BestEffort
  set ip dscp default
  bandwidth remaining percent 1
!
!

Joseph W. Doherty · 07-01-2013

Disclaimer

The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.

Liability Disclaimer

In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.

Posting

Am I not understanding this policy correctly?

Yes, you aren't understanding this policy correctly.

It looks to me, in the policy below, like the "ef" class has 50% of the total bandwidth, and the other classes would then have the remaining bandwidth, depending on their remaining percentage. Is this correct?
Also, the bandwidth allocations for "ef" and all the other classes total more than 100% of the available bandwidth.

Non-LLQ bandwidth statements do not limit bandwidth. Any such class can use up to 100% of the link when that bandwidth is otherwise unused.

LLQ (priority) statements do have an implicit policer, but it only engages when there's congestion (i.e. when packets queue). I.e. your EF traffic could also use 100% of the bandwidth, but once there's congestion, that LLQ class should be limited to 50%.
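To make the point above concrete, here is a rough model of how much the priority class can send with and without congestion. It is a simplification, not how IOS actually implements its LLQ policer, and the 1544 kbps link rate (a single T1) is only an assumed example.

```python
def llq_usable_kbps(offered_kbps, link_kbps, priority_percent, congested):
    """Rough model of an LLQ class with an implicit policer.

    Without congestion, priority traffic may use up to the whole link.
    During congestion, the implicit policer caps it at priority_percent
    of the link. (Simplified; real IOS behavior has more detail.)
    """
    usable = min(offered_kbps, link_kbps)
    if congested:
        usable = min(usable, link_kbps * priority_percent / 100.0)
    return usable


# Hypothetical: 1200 kbps of EF traffic on a 1544 kbps T1, "priority percent 50".
print(llq_usable_kbps(1200, 1544, 50, congested=False))  # no cap applied
print(llq_usable_kbps(1200, 1544, 50, congested=True))   # capped at 50% of link
```

The same 1200 kbps of EF traffic passes untouched when the link is otherwise idle, but is held to 772 kbps once other classes are also queuing.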

No, your EF and other classes do not total more than 100%. Bandwidth remaining percentages apply to the bandwidth left over after the non-"bandwidth remaining" classes (here, just the LLQ). I.e. the 99% allocated across your bandwidth remaining classes determines how the remaining 50% of the bandwidth (what's left after the LLQ's 50%) is proportioned.
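The arithmetic behind this can be sketched as follows, assuming a single T1 interface (bandwidth 1544 kbps) and every class busy at once. The percentages come from the policy in the question. The model simply splits the post-LLQ bandwidth in proportion to the remaining percentages, and it ignores platform-specific details such as reserved bandwidth and layer-2 overhead.

```python
# Hypothetical link rate: a single T1 interface (bandwidth 1544 kbps).
LINK_KBPS = 1544
PRIORITY_PERCENT = 50  # "priority percent 50" on the Voice class

# "bandwidth remaining percent" values from the policy in the question.
REMAINING = {
    "Video-AF41": 15,
    "CriticalHigh-AF31": 60,
    "DataHigh-AF21": 20,
    "DataLow-AF11": 3,
    "BestEffort": 1,
}


def remaining_guarantees(link_kbps, priority_percent, remaining):
    """Split what's left after the LLQ among the bandwidth-remaining
    classes, in proportion to their percentages (here they sum to 99)."""
    leftover = link_kbps * (100 - priority_percent) / 100.0
    total = sum(remaining.values())
    return {name: leftover * pct / total for name, pct in remaining.items()}


guarantees = remaining_guarantees(LINK_KBPS, PRIORITY_PERCENT, REMAINING)
for name, kbps in guarantees.items():
    print(f"{name:18s} {kbps:7.1f} kbps")
```

With voice using its full 50%, the five classes together share 772 kbps. CriticalHigh's 60 of 99 works out to roughly 468 kbps, which is 30% of the link, not 60%.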

I would like to limit the bandwidth to the values specified rather than letting one class consume the entire link.

That might be accomplished by using a policer or shaper, but assuming that bandwidth is otherwise unused, and you don't pay extra for using it, why not use it?
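For illustration only, below is a minimal single-rate token-bucket policer in Python. It is the general technique, not Cisco's implementation, and the 256 kbps rate and 8000-byte burst are hypothetical. A policer drops (or re-marks) packets that exceed the rate. A shaper would instead queue and delay them.

```python
class TokenBucketPolicer:
    """Minimal single-rate token-bucket policer (illustrative sketch).

    Packets that find enough tokens conform and are sent; the rest
    exceed and would be dropped (or re-marked) by a policer.
    """

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_s = rate_bps / 8.0
        self.burst_bytes = burst_bytes
        self.tokens = float(burst_bytes)  # bucket starts full
        self.last_time = 0.0

    def offer(self, now_s, size_bytes):
        # Refill tokens for the elapsed time, capped at the bucket depth.
        elapsed = now_s - self.last_time
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        self.last_time = now_s
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return True   # conform: transmit
        return False      # exceed: drop


# Hypothetical: police a class to 256 kbps while it offers 1500-byte
# packets every 10 ms (about 1.2 Mbps) for one second.
policer = TokenBucketPolicer(rate_bps=256_000, burst_bytes=8000)
passed = sum(policer.offer(i * 0.01, 1500) for i in range(100))
print(f"{passed} of 100 packets conformed")
```

Only about a quarter of the offered packets get through, which is the hard cap the question asks for. It is also why a cap like this wastes capacity when nothing else wants the link.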

PS:

BTW, Cisco recommends not allocating more than 1/3 of your bandwidth to LLQ. (For myself, if needed, I believe going as high as 50% for LLQ is acceptable.)

Normally, the default (BestEffort) class might be given a better service guarantee than DataLow.

wilson_1234_2 · 07-05-2013

Thanks for the reply Joseph.

Ok, suppose there is no voice traffic at the moment and we have heavy traffic from only two of the queues:

CriticalHigh

DataHigh

During times of congestion we are saying that CriticalHigh will use a minimum of 60% and DataHigh will use a minimum of 20% of the link, correct?

Which one will use the remaining 20% if they are both trying to use it? Since they are marked AF31 and AF21 respectively would this be the deciding factor?

What I am seeing at some of the sites that are complaining about slow response from their Business Critical applications is traffic matching ACL 160 consuming the link.

That traffic is considered lower priority than the Business Critical traffic. When I run "sh ip flow top-talkers", I see numerous workstations at the remote end as sources, with destinations matching ACL 160. It seems that this traffic is consuming the entire link and the other traffic is not getting its allotted bandwidth percentage.

"sh policy-map int" shows the policy is applied properly.

Joseph W. Doherty · 07-05-2013

Ok, suppose there is no voice traffic at the moment and we have heavy traffic from only two of the queues:

CriticalHigh

DataHigh

During times of congestion we are saying that CriticalHigh will use a minimum of 60% and DataHigh will use a minimum of 20% of the link, correct?

No. If there were only traffic in these two queues, and both wanted as much as possible, they would share the link 60:20 (i.e. 3:1, or 75%:25%).
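The 60:20 to 75%:25% conversion is just normalizing the weights of the classes that are active. The sketch below assumes a simplified scheduler model in which only these two classes have traffic and both are always backlogged (no voice traffic).

```python
def backlogged_shares(weights):
    """Fraction of the available bandwidth each class gets when every
    listed class always has packets waiting (simplified model)."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


shares = backlogged_shares({"CriticalHigh-AF31": 60, "DataHigh-AF21": 20})
print(shares)  # {'CriticalHigh-AF31': 0.75, 'DataHigh-AF21': 0.25}
```

The inactive classes' percentages simply drop out of the denominator. That is why the two busy classes split the whole link rather than stopping at 60% and 20%.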

Which one will use the remaining 20% if they are both trying to use it? Since they are marked AF31 and AF21 respectively would this be the deciding factor?

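They would share that 20% in the same 3:1 ratio, as described above. The AF31/AF21 markings aren't the deciding factor in this policy; the class bandwidth ratios are.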

What I am seeing at some of the sites that are complaining about slow response from their Business Critical applications is traffic matching ACL 160 consuming the link.

That traffic is considered lower priority than the Business Critical traffic. When I run "sh ip flow top-talkers", I see numerous workstations at the remote end as sources, with destinations matching ACL 160. It seems that this traffic is consuming the entire link and the other traffic is not getting its allotted bandwidth percentage.

Sites, plural, eh? Describe your WAN topology.

wilson_1234_2 · 07-09-2013

The sites pretty much just have a primary connection to Verizon MPLS. Some have a single T1; others have a multilink bundle.

Most sites are not experiencing any issues because their links are nowhere near being fully utilized.

The issue is with the sites that have 10-20 users and multilink connection to MPLS.

I can see the link is fully utilized a good part of the time; interface utilization is always 60-70%, with bursts to 100% that are sustained for minutes at a time.

The users complain of e-mail hanging, AD connectivity problems and other application issues. During these high-utilization periods, I can sometimes see numerous machines generating what would be considered best-effort traffic as the top talkers on the link. At the same time, I see users in the Critical High class.

The Critical High class should be getting a minimum of 60% during congestion, but should be getting (with no voice traffic) as much as 100% (theoretically) if it needs it, with everything else (especially BE) getting less of the link, or none of it.

The same goes for any other class, if the class with higher priority has no traffic:

The X class should be getting a minimum of X% during congestion, but should be getting (with no voice traffic) as much as 100% (theoretical) if it needs it.

Correct?

I see no drops in any of the queues except for BE.

Joseph W. Doherty · 07-09-2013

Verizon MPLS, eh? Can multiple sites send traffic to another site? If so, and if you're not already using it, you'll need to consider using Verizon's MPLS QoS. (I haven't used their MPLS in a couple of years, but they did offer up to a six-class model with various ratio choices for their classes.)

If sites have different bandwidth connections to the Verizon MPLS cloud, and if only one site would normally transmit to another site (e.g. logical hub and spoke), then you'll want the faster side to shape for the slower side's bandwidth. (Also if the aggregate of spokes can exceed the hub's bandwidth, you might want to shape spokes to preclude this.)
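As a rough illustration of the oversubscription check in the parenthetical above, the sketch below uses hypothetical access rates (not from the thread). When the spokes together can send more than the hub can receive, it splits the hub's rate across the spokes in proportion to their access rates. That proportional split is only one simple option, chosen here as an assumption. Separately, the hub would shape traffic toward each spoke at that spoke's own access rate.

```python
# Hypothetical access rates in kbps (not from the thread).
HUB_KBPS = 6144  # e.g. a 4 x T1 multilink bundle at the hub
SPOKES_KBPS = {"site-a": 1536, "site-b": 3072, "site-c": 1536, "site-d": 3072}


def spoke_shape_rates(hub_kbps, spokes):
    """Suggest a shaping rate per spoke so the spokes' aggregate cannot
    exceed the hub's access rate. The proportional split is an assumed
    policy choice, not the only reasonable one."""
    total = sum(spokes.values())
    if total <= hub_kbps:
        return dict(spokes)  # no oversubscription: shape to own access rate
    return {name: hub_kbps * rate / total for name, rate in spokes.items()}


for site, kbps in spoke_shape_rates(HUB_KBPS, SPOKES_KBPS).items():
    print(f"{site}: shape to {kbps:.0f} kbps")
```

Here the spokes total 9216 kbps against a 6144 kbps hub, so each one is scaled to two-thirds of its access rate. The goal is for congestion to form where your own QoS policy can act, rather than inside the provider cloud.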

If your devices support HQF (Hierarchical Queuing Framework), I suggest using FQ (fair queuing) in the non-LLQ classes. (FQ usually works so well that you can often reduce the need for multiple classes.)

The assigned bandwidths, as minimums, will only be seen in actual usage if all class bandwidth allocations account for 100% and all classes want their minimum bandwidth or more.

A class may get less than its minimum simply because that's all it currently "wants".
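The idealized goal of FQ within one class is a max-min fair split across flows, sketched below. This is the textbook fairness target, not Cisco's flow-queue implementation, and the flow names and kbps demands are hypothetical.

```python
def max_min_fair(capacity, demands):
    """Idealized fair queuing target: split capacity equally among flows.
    A flow wanting less than an equal share gets only what it wants, and
    the surplus is re-split among the remaining flows (max-min fairness)."""
    alloc = {flow: 0.0 for flow in demands}
    unsatisfied = set(demands)
    remaining = capacity
    while unsatisfied and remaining > 1e-9:
        equal_share = remaining / len(unsatisfied)
        # Flows whose whole demand fits within the equal share are satisfied.
        done = {f for f in unsatisfied if demands[f] <= equal_share}
        if not done:
            # Every remaining flow wants more: give each an equal share.
            for f in unsatisfied:
                alloc[f] = equal_share
            break
        for f in done:
            alloc[f] = demands[f]
            remaining -= demands[f]
        unsatisfied -= done
    return alloc


# Hypothetical: one bulk transfer and two light interactive flows sharing
# 772 kbps inside a single class.
alloc = max_min_fair(772, {"bulk-copy": 2000, "interactive-1": 50, "interactive-2": 100})
print(alloc)
```

The light flows get everything they ask for, and the bulk transfer takes only the remainder (622 kbps). With a single FIFO queue in the class, the bulk flow's backlog would delay the interactive packets instead.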
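Both points can be seen in a weighted max-min model. The sketch below is a simplified model, not the actual IOS scheduler. It uses the question's "bandwidth remaining" weights, assumes no voice traffic on a 1544 kbps T1, and uses hypothetical demands.

```python
def weighted_share(capacity, weights, demands):
    """Weighted max-min allocation: active classes split capacity by
    weight; a class wanting less than its share keeps only what it wants,
    and the surplus is re-split among the other active classes by weight."""
    alloc = {c: 0.0 for c in weights}
    active = {c for c in weights if demands.get(c, 0) > 0}
    remaining = capacity
    while active and remaining > 1e-9:
        total_w = sum(weights[c] for c in active)
        share = {c: remaining * weights[c] / total_w for c in active}
        done = {c for c in active if demands[c] <= share[c]}
        if not done:
            # Every active class wants more than its share: hand out shares.
            for c in active:
                alloc[c] = share[c]
            break
        for c in done:
            alloc[c] = demands[c]
            remaining -= demands[c]
        active -= done
    return alloc


WEIGHTS = {"Video-AF41": 15, "CriticalHigh-AF31": 60, "DataHigh-AF21": 20,
           "DataLow-AF11": 3, "BestEffort": 1}
# Hypothetical demands in kbps; classes not listed are idle.
DEMANDS = {"CriticalHigh-AF31": 200, "DataHigh-AF21": 2000, "BestEffort": 2000}

alloc = weighted_share(1544, WEIGHTS, DEMANDS)
print(alloc)
```

CriticalHigh gets only the 200 kbps it asks for, well under "its" 60%. DataHigh ends up with 1280 kbps, far above 20%, and BestEffort still gets 64 kbps. A class only sees its configured minimum when it actually wants that much and the other classes are busy too.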

PS:

Remember, QoS can favor some traffic over other traffic. If there's too much "important" traffic for the capacity of the link, it will congest against itself. Then you need more bandwidth (which might be additional raw bandwidth, or a reduced bandwidth need [e.g. via WAAS]).

PPS:

Part of my "secret" approach, for dealing with many WAN congestion issues, is to not only recognize there's some traffic you might want to prioritize, but there's often some traffic which may be de-prioritized.

I've found a policy like:

high - bandwidth 81%
normal - bandwidth 9%
low - bandwidth 1%

very effective, as long as the actual offered loads are roughly the inverse (i.e. high should have the least volume, and low may have the highest volume). The foregoing should have FQ for at least "normal", but ideally all 3 classes might use FQ (FQ usually isn't really needed for "high").
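One way to see why the offered loads should be roughly the inverse of the allocations is to check which classes fit inside their guaranteed minimum (link rate times bandwidth percent). Such a class should see essentially no congestion even when the link is saturated. The 3072 kbps link (a 2 x T1 bundle) and the offered loads below are hypothetical.

```python
LINK_KBPS = 3072  # hypothetical 2 x T1 multilink bundle
POLICY = {"high": 81, "normal": 9, "low": 1}  # "bandwidth" percent per class


def protected_classes(link_kbps, policy, offered):
    """Classes whose offered load fits within their guaranteed minimum,
    so they are effectively shielded from congestion (simplified model)."""
    return {c for c, pct in policy.items()
            if offered[c] <= link_kbps * pct / 100.0}


# Inverse loads: little "high" traffic, lots of "low" traffic.
inverse = {"high": 300, "normal": 900, "low": 2500}
# Non-inverse loads: "high" carries the bulk of the traffic.
skewed = {"high": 2800, "normal": 900, "low": 300}

print(protected_classes(LINK_KBPS, POLICY, inverse))  # {'high'}
print(protected_classes(LINK_KBPS, POLICY, skewed))   # set()
```

With the inverse pattern, "high" sits comfortably inside its 81% and is effectively never delayed, while "normal" and "low" compete for whatever is left. If "high" carries the most volume, nothing is protected, and the important traffic ends up congesting against itself.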

Another part of my "secret" approach is to use not only the "classical" traffic type for service classification, but also actual bandwidth demand/usage. (The latter is wonderful for dealing with stuff like Microsoft's traffic, which all looks very much alike.)
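The post doesn't say how usage-based classification is implemented. The sketch below is a hypothetical illustration of the idea, not the author's method. Flows that look identical by protocol (two SMB flows, say) land in different classes based on how many bytes they moved in the last minute. The flow names and byte thresholds are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    bytes_last_minute: int


# Hypothetical thresholds; real values would need tuning per link.
HIGH_IF_UNDER_BYTES = 50_000     # light flows (likely interactive) -> high
LOW_IF_OVER_BYTES = 5_000_000    # heavy flows (likely bulk) -> low


def classify_by_usage(flow):
    """Demand-based classification sketch: light flows go to "high",
    heavy flows to "low", and everything else to "normal"."""
    if flow.bytes_last_minute < HIGH_IF_UNDER_BYTES:
        return "high"
    if flow.bytes_last_minute > LOW_IF_OVER_BYTES:
        return "low"
    return "normal"


for f in [Flow("smb-interactive", 20_000),
          Flow("smb-bulk-copy", 80_000_000),
          Flow("web", 1_000_000)]:
    print(f.name, "->", classify_by_usage(f))
```

This pairs naturally with the 81/9/1 policy above. The classifier keeps "high" to low-volume flows, which is the inverse load pattern that policy depends on.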