Solved: QOS Service Policy Bandwidth percent values not as expected.

Richard Bradfield · ‎01-29-2014

We have a number of sites on the Telstra IPWAN MPLS network. Most traffic comes out of our head office, which feeds into a 40 Mbps Ethernet service. The service policy is attached to the Ethernet interface:

policy-map QOS_OUT_NEW
 class ROUTING
  bandwidth 1200
 class SITE_DR
  shape average 15000000 150000 0
  service-policy QOS_OUT_9MB
 class SITE_1
  shape average 4000000 40000 0
  service-policy QOS_OUT_4MB
 class SITE_2
  shape average 4000000 40000 0
  service-policy QOS_OUT_4MB
 class SITE_3
  shape average 4000000 40000 0
  service-policy QOS_OUT_4MB
----etc--etc---

There are 24 sites in total: 12 at 4 Mbps and above, 5 at 2 Mbps, 1 at 1 Mbps, 2 at 512 kbps, 2 at 384 kbps and 2 at 192 kbps.

The QOS_OUT_4MB policy is shown below; the policies for the other speeds use the same proportions.

policy-map QOS_OUT_4MB
 class RT-VOICE
  priority percent 20
 class RT-VIDEO
  bandwidth percent 25
 class CONTROL
  bandwidth percent 3
 class CRITICAL
  bandwidth percent 18
  random-detect dscp-based
 class IMPORTANT
  bandwidth percent 16
  random-detect dscp-based
 class TRANSACTIONAL
  bandwidth percent 14
  random-detect dscp-based
 class class-default
  shape average 4000000 40000 0
  queue-limit 256 packets
  random-detect dscp-based
  random-detect dscp 0 64 256

Now when I run "show policy-map interface", I see the following for all of the 4 Mbps and 2 Mbps policies. For voice:

  Class-map: RT-VOICE (match-any)
    314515 packets, 21680630 bytes
    5 minute offered rate 0000 bps, drop rate 0000 bps
    Match: ip dscp ef (46)
      314515 packets, 21680630 bytes
      5 minute rate 0 bps
    Priority: 20% (298 kbps), burst bytes 7450, b/w exceed drops: 0

298 kbps is not 20% of 4 Mbps; I would expect 800 kbps!
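A quick check of the arithmetic, as a Python sketch (`class_kbps` is a hypothetical helper; the percentages are those of QOS_OUT_4MB above). Applied to the 4 Mbps shaping rate, RT-VOICE would get 800 kbps; working backwards, 298 kbps at 20% implies a reference bandwidth of about 1490 kbps rather than 4000 kbps.

```python
from typing import Dict

# Percentages configured in policy-map QOS_OUT_4MB.
CLASS_PERCENT = {
    "RT-VOICE": 20,
    "RT-VIDEO": 25,
    "CONTROL": 3,
    "CRITICAL": 18,
    "IMPORTANT": 16,
    "TRANSACTIONAL": 14,
}


def class_kbps(reference_kbps: float) -> Dict[str, float]:
    """kbps each class would get if its percentage were applied to reference_kbps."""
    return {name: reference_kbps * pct / 100 for name, pct in CLASS_PERCENT.items()}


expected = class_kbps(4000)       # percentages of the 4 Mbps shaping rate
implied_reference = 298 / 0.20    # reference implied by the observed 298 kbps

print("RT-VOICE expected:", expected["RT-VOICE"], "kbps")
print("Implied reference:", round(implied_reference), "kbps")
```

Note that the configured percentages add up to 96%, so the classes never claim the whole reference bandwidth.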

Also of interest: the 1 Mbps, 512 kbps, 384 kbps and 192 kbps policies show the correct percentage!

Can anyone help with this?

Vishesh Verma · ‎02-04-2014

Hi,

Right now QoS is calculating the bandwidth percentages based on 1.49 Mbps.

Can you try the following and see if that changes it? It is probably down to how HQF chooses the reference bandwidth.

policy-map QOS_OUT_NEW
 class ROUTING
  bandwidth 1200
 class SITE_DR
  bandwidth 15000
  shape average 15000000 150000 0
  service-policy QOS_OUT_9MB
 class SITE_1
  bandwidth 4000
  shape average 4000000 40000 0
  service-policy QOS_OUT_4MB
 class SITE_2
  bandwidth 4000
  shape average 4000000 40000 0
  service-policy QOS_OUT_4MB
 class SITE_3
  bandwidth 4000
  shape average 4000000 40000 0
  service-policy QOS_OUT_4MB

If no bandwidth command is configured, HQF divides the link's bandwidth equally among all the classes of the parent policy.

For example:

If the interface is 8 Mbps and we apply a QoS policy with 4 classes, with no bandwidth statement defined in the classes, each class is allocated 2 Mbps, and HQF chooses the lower of the shaping rate and the allocated bandwidth:

Class     B/W (Mbps)   Shaping rate (Mbps)   Reference B/W for calc (Mbps)
Class-1   2            1                     1
Class-2   2            1.5                   1.5
Class-3   2            3                     2
Class-4   2            2.5                   2

However, if you configure a bandwidth statement in the class, HQF uses that as the reference bandwidth. Best practice is to set the bandwidth equal to the shaping rate in the class.
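The rule described in this reply can be sketched as a small Python function. It is a simplified model of the behaviour as described here, not Cisco's HQF implementation, and the names `reference_bw`, `link_mbps`, `n_classes`, `shape_mbps` and `explicit_bw_mbps` are hypothetical.

```python
from typing import Optional


def reference_bw(link_mbps: float, n_classes: int, shape_mbps: float,
                 explicit_bw_mbps: Optional[float] = None) -> float:
    """Simplified model of the reference bandwidth used for a child policy's percentages."""
    if explicit_bw_mbps is not None:
        # A bandwidth statement in the parent class becomes the reference.
        return explicit_bw_mbps
    # Otherwise the link is split equally across the parent classes,
    # and the lower of that share and the class's shaping rate is used.
    equal_share = link_mbps / n_classes
    return min(equal_share, shape_mbps)


# Example from the post: 8 Mbps interface, 4 classes, no bandwidth statements.
for name, shape in [("Class-1", 1), ("Class-2", 1.5), ("Class-3", 3), ("Class-4", 2.5)]:
    print(name, reference_bw(8, 4, shape))

# With "bandwidth" set equal to the shaping rate, as recommended.
print("Class-3 with bandwidth 3:", reference_bw(8, 4, 3, explicit_bw_mbps=3))
```

The model is deliberately minimal: it does not subtract bandwidth explicitly reserved by other parent classes (such as ROUTING), which a fuller model would take out first.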

Hope it helps. If it still doesn't solve your problem, I would need the following outputs to check further:

complete QoS config
show run interface (for the interface on which QoS is applied)
show version
show interface (for the interface on which QoS is applied)

-Vishesh


ambikamani · ‎01-29-2014

Hi

I would like to ask how the bandwidth percentages are reflected for the other classes on this 4 Mbps circuit.

Richard Bradfield · ‎01-29-2014

Ambika,

They are all low by the same ratio, see below

Service-policy : QOS_OUT_4MB

  queue stats for all priority classes:
    Queueing
    queue limit 64 packets
    (queue depth/total drops/no-buffer drops) 0/0/0
    (pkts output/bytes output) 10042611/742597306

  Class-map: RT-VOICE (match-any)
    Priority: 20% (298 kbps), burst bytes 7450, b/w exceed drops: 0

  Class-map: RT-VIDEO (match-any)
    bandwidth 25% (373 kbps)

  Class-map: CONTROL (match-any)
    bandwidth 3% (44 kbps)

  Class-map: CRITICAL (match-any)
    bandwidth 18% (268 kbps)

  Class-map: IMPORTANT (match-any)
    bandwidth 16% (238 kbps)

  Class-map: class-default (match-any)
    shape (average) cir 4000000, bc 40000, be 0
    target shape rate 4000000

Vasilii Mikhailovskii · ‎01-29-2014

Hello.

As I understand it, the issue is that the classes under policy-map QOS_OUT_NEW are not actually allocated what you are expecting.

For example,

 class SITE_2
  shape average 4000000 40000 0

Here class SITE_2 has no bandwidth statement, so it is assigned a share calculated by dividing the bandwidth of the physical interface across all the classes. Let's say that share is 1490K; the priority class allocates 20% of it, which is 298K.

A workaround could be to assign ("cheat") as much bandwidth as you can to the interface itself:

int G0/0
 bandwidth 2000000

This would allow HQF to use the shaping rate as the reference for the child policy.

PS:

1) your classes have no bandwidth statements, so they could affect each other unpredictably;

2) under policy-map QOS_OUT_4MB, the shaper in class-default serves no purpose!

3) policy changes could affect production, so be careful and make changes outside business hours.

4) don't you think a queue size of 256 is too much for a 4M link? It would result in 256*(800 to 1200)/400,000 = 512 to 768 ms of delay in the queue; I guess it's better to let WRED do its job.
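To make the queue-depth estimate concrete, here is the drain-time arithmetic as a Python sketch. It assumes every queued packet is the same size and that the queue drains at a constant rate; `queue_delay_ms` is a hypothetical helper. Note that 4 Mbps is 500,000 bytes/s, which gives roughly 410 to 615 ms for 256 packets; the divisor of 400,000 bytes/s used above corresponds to about 3.2 Mbps of drain rate and gives the quoted 512 to 768 ms. Either way the worst-case delay is large for a 4M link.

```python
def queue_delay_ms(packets: int, packet_bytes: int, drain_bytes_per_s: float) -> float:
    """Time to drain a full queue of equal-sized packets, in milliseconds."""
    return packets * packet_bytes / drain_bytes_per_s * 1000


LINK_BPS = 4_000_000              # 4 Mbps shaper
LINK_BYTES_PER_S = LINK_BPS / 8   # 500,000 bytes/s

for size in (800, 1200):
    at_full_rate = queue_delay_ms(256, size, LINK_BYTES_PER_S)
    at_400k = queue_delay_ms(256, size, 400_000)  # divisor used in the post
    print(size, "bytes:", round(at_full_rate, 1), "ms at 4 Mbps,",
          round(at_400k, 1), "ms at 400,000 bytes/s")
```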

Richard Bradfield · ‎01-29-2014

Hi, the bandwidth of the interface is 40 Mbps. I know it's oversubscribed: if I add up the bandwidths of all the remote sites, it comes to 62 Mbps.

I have changed the bandwidth setting on the interface from 40000 to 60000, and now the RT-VOICE class at 20% shows 452 kbps, better than the 298 kbps I had before.

What I can't understand is that for the slow-speed sites running 1 Mbps, 512 kbps, 384 kbps and 192 kbps, the percentages match the class correctly, while for the 4 Mbps and 2 Mbps sites the values for the classes are the same!

Vasilii Mikhailovskii · ‎01-29-2014

Hello.

"what I can't understand, the slow speed sites running 1Mbps, 512kbps, 384kbps and 192kbps, the percentages match the class correctly"

As I wrote, the classes SITE_1, SITE_2, etc. are allocated equal bandwidth unless you specify it explicitly.

So for a 1M shaper, the default calculation works fine.

On the physical interface, set bandwidth to 200000.
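The figures reported in this thread are consistent with the equal-share rule described in Vishesh's reply. The Python sketch below is a hypothesis, not a confirmed description of HQF: it assumes the interface bandwidth minus ROUTING's explicit 1200 kbps is split equally across 26 parent classes (for example the 24 site classes plus SITE_DR and class-default; the exact class count is an assumption), that each site uses the lower of that share and its shaping rate, and that the show output truncates to whole kbps. `voice_kbps` and `PARENT_CLASSES` are hypothetical names.

```python
import math

ROUTING_KBPS = 1200   # explicit "bandwidth 1200" under class ROUTING
PARENT_CLASSES = 26   # ASSUMPTION: number of parent classes sharing the rest


def voice_kbps(interface_kbps: int, shape_kbps: int, voice_pct: int = 20) -> int:
    """Hypothetical model of the RT-VOICE figure in 'show policy-map interface'."""
    equal_share = (interface_kbps - ROUTING_KBPS) / PARENT_CLASSES
    reference = min(equal_share, shape_kbps)   # lower of share and shaper
    return math.floor(reference * voice_pct / 100)


print(voice_kbps(40000, 4000))    # original interface bandwidth, 4 Mbps site
print(voice_kbps(40000, 2000))    # 2 Mbps site: same value as the 4 Mbps site
print(voice_kbps(40000, 1000))    # 1 Mbps site: 20% of its own shaper
print(voice_kbps(60000, 4000))    # interface bandwidth changed to 60000
print(voice_kbps(200000, 4000))   # interface bandwidth set to 200000
```

Under these assumptions the equal share at 40000 is about 1492 kbps, so any site shaped below that (1 Mbps and slower) gets percentages of its own shaping rate, while the 2 Mbps and 4 Mbps sites both fall back to the same share. At 200000 the share exceeds 4000 kbps, so the 4 Mbps shaper becomes the reference.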

ambikamani · ‎01-30-2014

Hello,

Having the correct bandwidth (40 Mbps) configured on the interface will result in correct bandwidth percentage calculations for the classes of all the sites and help in getting an adequate, correct response for them.

Bandwidth percentages are calculated from the 40 Mbps interface bandwidth, not from the individual bandwidth of the various sites. The individual site bandwidths sum to 63 Mbps, and 1.2 Mbps is reserved for routing, so the remaining 63 - 1.2 = roughly 62 Mbps should be available for the 24 sites, but only 40 Mbps is.

If the real bandwidth received from the ISP were 62 Mbps rather than 40 Mbps, the 20% voice bandwidth for site 2 would be 452 kbps. But this site only receives 40 Mbps from the ISP, so the 20% for voice shows the correct value of 298 kbps.

The 1 Mbps and 512 kbps circuits are also calculated based on 40 Mbps, but the 20% calculation appears correct to us because the value we calculate from 512 kbps or 1 Mbps is very close to the percentage calculated using the 40 Mbps bandwidth.

Changing the bandwidth on the interface (to 63 Mbps or any higher value) can show more bandwidth for the classes, but this will not improve the response for the applications defined under the classes, because the ISP will already have configured a policy on their end that allows only 40 Mbps for this circuit.

Configuring 62 Mbps on the interface would cause the classes to allocate bandwidth based on 62 Mbps. In our case, the site 2 voice class would take 452 kbps instead of the value it should really take (298 kbps). That provides an additional 452 - 298 = 154 kbps, but could cause a bandwidth shortage for other sites.

Bandwidth calculation for the 24 sites (kbps):

12 x 4 Mbps = 49152
5 x 2 Mbps = 10240
2048
1024
768
384
Total = 63616; 63616 - 1200 = 62416
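Recomputing the total from the site list in the original post, as a Python sketch, gives 62592 kbps across 24 sites. It assumes the "4 Mbps and above" sites are counted at exactly 4 Mbps and that 1 Mbps = 1024 kbps, matching the 4096/2048 figures in the tally above. The tally lists 2048 kbps where the single 1 Mbps site would contribute 1024 kbps, which accounts for the difference from 63616; either way the total is well above the 40 Mbps service.

```python
# (number of sites, speed in kbps) from the original post.
# ASSUMPTION: "4 Mbps and above" counted as exactly 4 Mbps; 1 Mbps = 1024 kbps.
SITES = [
    (12, 4096),
    (5, 2048),
    (1, 1024),
    (2, 512),
    (2, 384),
    (2, 192),
]

site_count = sum(count for count, _ in SITES)
total_kbps = sum(count * speed for count, speed in SITES)
print(site_count, "sites,", total_kbps, "kbps")
```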

Vasilii Mikhailovskii · ‎01-30-2014

Sorry, Ambika.

Having correct bandwidth configured( 40 MB) on the interface will result in correct Bandwidth percentage calculation for the classes of all the sites & help in getting adequate/ correct response for them.

I completely disagree with you.

You may have missed how QoS policies work.

The most important thing to discuss in this case is the bandwidth of the priority class(es), as it sets both the minimum and the maximum guaranteed bandwidth for voice traffic. If the value is lower than expected (298K vs. 800K), the CIR won't be guaranteed and the customer could suffer.

We do not care as much about the bandwidth of the CONTROL, CRITICAL, etc. classes, as their bandwidth values are accounted relative to the other classes, so Class1=100K/Class2=200K is equivalent to Class1=1000K/Class2=2000K. If the relative weights (that is, the scheduling) are good, then queueing becomes the second most important thing.
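A simplified illustration of the point about relative weights, as a Python sketch. It assumes that under congestion the available bandwidth is divided in proportion to the configured class values; this is an idealised model of weighted scheduling chosen for illustration, and real schedulers differ in detail. `shares` is a hypothetical helper and 900 kbps is an arbitrary example figure.

```python
from typing import Dict


def shares(weights_kbps: Dict[str, float], available_kbps: float) -> Dict[str, float]:
    """Split available bandwidth in proportion to the configured class values."""
    total = sum(weights_kbps.values())
    return {name: available_kbps * w / total for name, w in weights_kbps.items()}


small = shares({"Class1": 100, "Class2": 200}, 900)
large = shares({"Class1": 1000, "Class2": 2000}, 900)
print(small)
print(large)
```

Scaling every class by the same factor leaves each class's share unchanged, which is why the absolute kbps figures for the non-priority classes matter less than their ratios.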

Per-site shaping is done under the classes inside the parent policy applied to the interface; there is no overall shaper, only a shaper per class. So we need to understand how queueing will work in case of congestion (more than 40M)!

The interface is actually FastEthernet or GigabitEthernet, so it can send at 100M or 1000M. As a result it will keep sending unless the peer applies some sort of push-back (but providers don't usually do such things). If you exceed 40M and you (actually the ISP) have QoS defined on the PE, you can expect the provider to drop packets per your policy; if not, simple tail drop will occur on some "dumb" L1/L2 device.

Per-site queueing is done by the shaper, but it only ensures that the remote site's link won't be overutilized.

So, going back to our discussion: configuring bandwidth 40M on the physical interface is of no use, as it won't affect the overall bandwidth that can be used. The maximum allocation is defined by the sum of the shapers' CIRs.

To be polite, I'm not sure how your LLQ works, as your design is not common. My concern is that the priority queue from SITE_1 could compete with the other classes of SITE_2, in which case jitter will occur.

As I understand it, you should be running a policy like this:

policy-map QOS_OUT
 class RT-VOICE
  priority percent 20
  ! I would recommend policing here, as ISPs do not like voice traffic bursting over the contracted bandwidth and hard-drop it.
  police ...
 class RT-VIDEO
  bandwidth percent 25
 class CONTROL
  bandwidth percent 3
 class CRITICAL
  bandwidth percent 18
  random-detect dscp-based
 class IMPORTANT
  bandwidth percent 16
  random-detect dscp-based
 class TRANSACTIONAL
  bandwidth percent 14
  random-detect dscp-based
 class class-default
  fair-queue
policy-map WAN_OUT
 class class-default
  shape average 37000000 370000 0
  service-policy QOS_OUT

PS: if your ISP sells you 40M, you'd better ask them how they count it: it could be "bits on the wire" or an L3 byte count, and 1M could be 2^20 or 1,000,000. ISPs usually sell bits on the wire (so you need to add 18 bytes of L2 header per packet), so for 40M I would configure shape average 37000000.
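To see why a shaper below 40M can make sense, here is the overhead arithmetic as a Python sketch. It assumes the ISP counts bits on the wire, that the shaper counts only L3 bytes, and the 18-byte per-packet L2 header mentioned above; whether a particular platform's shaper includes L2 overhead should be checked rather than assumed. The packet sizes and the name `max_l3_rate_bps` are illustrative.

```python
L2_OVERHEAD_BYTES = 18   # per-packet L2 header, as stated in the post


def max_l3_rate_bps(wire_rate_bps: float, l3_packet_bytes: int) -> float:
    """Largest L3 rate that still fits within wire_rate_bps once L2 overhead is added."""
    return wire_rate_bps * l3_packet_bytes / (l3_packet_bytes + L2_OVERHEAD_BYTES)


CONTRACT_BPS = 40_000_000
for size in (1500, 600, 200, 64):
    print(size, "byte packets:", round(max_l3_rate_bps(CONTRACT_BPS, size)), "bps at L3")

print("40M read as 40 x 2^20:", 40 * 2**20, "bps")
```

With full-size 1500-byte packets the L3 rate can be about 39.5 Mbps, but with small packets (200 bytes is typical of G.711 voice) the usable L3 rate drops below 37 Mbps, so some headroom below the contracted figure is reasonable.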

Richard Bradfield · ‎01-30-2014

Mikhailovsky,

What you suggest is pretty much what I had originally set up, with a service policy SITES under class-default, as below:

Policy Map SITES
  Class SITE_1
    Average Rate Traffic Shaping
    cir 4000000 (bps) bc 40000 (bits) be 0 (bits)
  Class SITE_2
    Average Rate Traffic Shaping
    cir 4000000 (bps) bc 40000 (bits) be 0 (bits)
  Class SITE_3
    Average Rate Traffic Shaping
    cir 2000000 (bps) bc 20000 (bits) be 0 (bits)
  etc--etc

But, as our ISP pointed out, we were bursting too much of the classified traffic, so at the ISP edge (using Foundry devices) they were dropping more classified traffic than default traffic. For example, we could be trying to send 10 Mbps of AF31 traffic to one site, but as the remote site is only 4 Mbps, a lot of it gets dropped. So they suggested putting the classified traffic under each site, so that it will not burst above the CIR for each site.

So it looks like there are two schools of thought here as to which method is best!

Vasilii Mikhailovskii · ‎02-04-2014

Hello.

As I understand it, the hub can't send 10M of AF3 when the spoke has only a 4M WAN link. TCP should discover the maximum available bandwidth through packet retransmissions.

Your current configuration (running a shaper per site) is not best practice, because:

it's not a scalable solution: imagine you have 100 sites and sometimes need to add new IP ranges to some of them!
QoS (in the provider's cloud) makes sense when it's implemented in the inbound direction, as it allows congestion to be managed on the ISP's equipment; in your design, if the hub sends traffic to spoke1 while another spoke (spoke2) is also sending traffic to spoke1, your shaper on the hub won't help you!

So, as I understand it, if your provider recommended that you shape traffic on the hub, they faced some issue trying to provide you with good service.

I'm not sure what QoS model your provider uses, but you need to clarify how they treat PE->CE traffic flows. It's common for the voice queue to be hard-limited, but not common to drop traffic in any other queue unless congestion is detected.

Could you please clarify (ask your provider?) their QoS model:

what classes do they provide?
what bandwidth per class do they guarantee?
what are the drop policies per class inbound (PE receiving from CE) and outbound (PE sending to CE)?
do they remark traffic when any class tries to consume more bandwidth than its CIR?

PS: @Ambika, as I mentioned before, in case of congestion on class SITE_1, data traffic could consume almost all the bandwidth, leaving voice to suffer at 298K (while you expect 400K).

Richard Bradfield · ‎02-04-2014

MikhailovskyVV

My investigation all started when 1 site out of 24 was getting slow responses on a certain application: the factory could not print packing labels fast enough to keep up with production. On investigation I found that about 25% of the packets to this one site were being remarked from AF31 to 0. I have been working with the ISP on this, and they say they do NOT remark packets. They monitored ingress from my head-end site and all packets were AF31. They cannot monitor egress at the remote site, so today they are coming out to monitor their Cisco ME3400 router at my site to see if the packets get remarked there.

This is what led me on to the bandwidth anomalies.

I take your point, "Your current configuration is not a best practice (running shaper per site)", as my original config was pretty much as you suggested. It was the ISP themselves that recommended I change to per-site shaping; they even sent me a sample configuration!

I'll keep you posted.

ambikamani · ‎01-30-2014

Hi,

Below is my reply in addition to my earlier one; correct me if I am wrong.

They are all low by the same ratio, see below

Service-policy : QOS_OUT_4MB

  queue stats for all priority classes:
    Queueing
    queue limit 64 packets
    (queue depth/total drops/no-buffer drops) 0/0/0
    (pkts output/bytes output) 10042611/742597306

  Class-map: RT-VOICE (match-any)
    Priority: 20% (298 kbps), burst bytes 7450, b/w exceed drops: 0  --> 298 kbps (20%)

  Class-map: RT-VIDEO (match-any)
    bandwidth 25% (373 kbps)  --> 373 kbps (25%)

  Class-map: CONTROL (match-any)
    bandwidth 3% (44 kbps)  --> 44 kbps (3%)

  Class-map: CRITICAL (match-any)
    bandwidth 18% (268 kbps)  --> 268 kbps (18%)

  Class-map: IMPORTANT (match-any)
    bandwidth 16% (238 kbps)  --> 238 kbps (16%)

(Overall total: 20+25+3+18+16 = 82%), and the sum of the bandwidths above is about 1.2 Mbps (298+373+44+268+238 = 1221 kbps). The class left out is the default class, and the percentage left out is 18%, which is 720 kbps.

The overall bandwidth offered by this router for this site is 1.2 Mbps + 720 kbps = 1.92 Mbps, and this total is based on the 40 Mbps bandwidth calculation.

Note:

Bandwidth can also try to exceed the defined 1.92 Mbps, as the shaping limit is set to 4 Mbps rather than 2 Mbps. This could be the cause of voice showing 298 kbps and not 452 kbps.

Please correct me if I am wrong.

Regards,

Ambi.M

Richard Bradfield · ‎02-04-2014

Vishesh,

Putting the bandwidth statement in fixed the problem; I am now seeing the correct bandwidth relative to each site.

This has been bugging me for a couple of weeks.

Thank you.

ambikamani · ‎02-05-2014

Hi Verma,

I saw your comments (answers) and am asking this out of my own interest. Below is how I calculated the bandwidth used by the classes of this QoS policy.

But 720 kbps (the remaining 18%) is wrong, as it was calculated with 4 Mbps in mind (my bad); 18% of the bandwidth allotted to the class has to be 268.2 kbps, and class CRITICAL at 18% already shows 268 kbps.

I thought of correcting it but left it. I can see that your comment below also gives the same result, 1.49 Mbps. Please advise whether the calculation below is also how you arrived at 1.49 Mbps.

Please correct me if I am wrong anywhere.

Right now QoS is calculating B/w percentages based on the 1.49 Mbps.

They are all low by the same ratio, see below

Service-policy : QOS_OUT_4MB

  queue stats for all priority classes:
    Queueing
    queue limit 64 packets
    (queue depth/total drops/no-buffer drops) 0/0/0
    (pkts output/bytes output) 10042611/742597306

  Class-map: RT-VOICE (match-any)
    Priority: 20% (298 kbps), burst bytes 7450, b/w exceed drops: 0  --> 298 kbps (20%)

  Class-map: RT-VIDEO (match-any)
    bandwidth 25% (373 kbps)  --> 373 kbps (25%)

  Class-map: CONTROL (match-any)
    bandwidth 3% (44 kbps)  --> 44 kbps (3%)

  Class-map: CRITICAL (match-any)
    bandwidth 18% (268 kbps)  --> 268 kbps (18%)

  Class-map: IMPORTANT (match-any)
    bandwidth 16% (238 kbps)  --> 238 kbps (16%)

(Overall total: 20+25+3+18+16 = 82%), and the sum of the bandwidths above is 1.221 Mbps (298+373+44+268+238 = 1221 kbps). The class left out is the default class, and the percentage left out is 18%, which is 268 kbps.

The overall bandwidth offered by this router for this site is 1.221 Mbps + 0.268 Mbps = 1.489, or about 1.49 Mbps, and this is based on the 40 Mbps bandwidth calculation.
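The same figure can be reached without assigning the missing 18% to any particular class: dividing the total kbps of the classes shown by the total of their percentages gives the reference bandwidth directly. In QOS_OUT_4MB the remaining 18% is actually TRANSACTIONAL's 14% plus 4% that is not explicitly allocated, but the ratio is unaffected. A Python sketch using the figures from the output above (the displayed values appear to be rounded down, so the result is approximate):

```python
# (kbps shown, configured percent) for the classes in the output;
# TRANSACTIONAL is not shown there.
shown = {
    "RT-VOICE": (298, 20),
    "RT-VIDEO": (373, 25),
    "CONTROL": (44, 3),
    "CRITICAL": (268, 18),
    "IMPORTANT": (238, 16),
}

total_kbps = sum(kbps for kbps, _ in shown.values())
total_pct = sum(pct for _, pct in shown.values())
reference_kbps = total_kbps / (total_pct / 100)
print(total_kbps, "kbps at", total_pct, "% ->", round(reference_kbps), "kbps reference")
```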