Solved: Figuring the QOS out...

puddingtech · ‎07-06-2016

OK, so I have IPv4 IPoE subs working more or less: they get an IP from DHCP and get access. But now that I'm trying to implement QoS I'm hitting an issue. I am using an ASR9001 for our BNG.

I attempted to use a CoA to set a speed:

coa_w32.exe -n 172.16.0.1 -p 1700 -k Password -1 26,9,1,"qos-policy-out:add-class(sub,(class-default),shape(10000))" -2 1,"b4ae.2b3b.xxxx"

I get a CoA accepted, but when I check my policies it shows no inbound or outbound policy applied.

I then added a static policy at the root of my config, like below:
policy-map policy1
 class class-default
  service-policy policy1_child
 end-policy-map

policy-map policy1_child
 class class-default
  police rate 5 mbps peak-rate 10 mbps
 end-policy-map

and applied it using

coa_w32.exe -n 172.16.0.1 -p 1700 -k Password -1 26,9,1,"sub-qos-policy-in=policy1 shared-policy-instance spi_1" -2 1,"b4ae.2b3b.xxxx"

And that policy shows up, but then I ran into the issue that the 5 Mbps policy somehow only tests at 3-4 Mbps. Also, that's via a static profile: if I want a 20 Mbps user, I'd have to create more static policies.

I tried "shape" instead of police and got an error/warning and couldn't commit; it said my LC doesn't support inbound queueing (ASR9001, built-in 10G ports).

Xander please helppppppppppp.

EDIT: Just for reference, I know the QoS speeds being lower than they are supposed to be comes from the peak/burst values being wrong. Does anyone have standard practices that tend to work as a baseline to start with?

Phillip Meyerson · ‎07-14-2016

For policer values, start with the recommendations in the ASR9000/XR Understanding QOS document, under
"Policer having exceeddrops, not reaching configured rate......"; it is towards the top of the document.

puddingtech · ‎07-14-2016

Thanks, will give that doc a look and try those values. Any idea on the CoA/RADIUS setting of QoS? :D

xthuijs · ‎07-14-2016

hi cchance,

The NPU has 3 traffic managers: 1 for inbound, 2 for outbound. Each TM is 30G. When the interface loading is >30G the inbound traffic manager is disabled, which means that any queueing action (shape/bandwidth) in a policy can't be used. The 9001 already has 2x10G per NPU, so with any MPA inserted you have probably exceeded the 30G bandwidth already.

Inbound policers are always possible.

When you want to use pqos (parameterized QoS, as used in your first attempt with the add-class), the session should preferably have a policy already applied; then you can modify it with pqos.

This can also be a dummy policy, like a policy-map WHATEVER, class class-default, queue-limit 10 packets. Then you can modify that via pqos.

The easiest is to have a policy defined in XR already. With that policy defined, you can apply it via CoA or via Access-Accept using the RADIUS attribute:

cisco-avpair="ip:sub-qos-policy-out=PMAP_NAME"

This requires all services to be defined already, but it is much more scalable for the system than individual pqos policies. With pqos, two subscribers on the exact same speed create 2 unique pmaps, one per subscriber, whereas defining a pmap and referencing it creates only one instance of the policy with multiple stats entries, so you save quite a bit of resources that way.
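
The resource argument can be sketched as a simple counting model in Python. This is only an illustration of the point made above (one policy-map per subscriber with pqos, one per distinct named policy otherwise); the function names and the sample subscriber mix are hypothetical and say nothing about actual hardware limits.

```python
def pmaps_with_pqos(subscriber_rates_kbps):
    """Parameterized QoS: every subscriber ends up with its own policy-map."""
    return len(subscriber_rates_kbps)


def pmaps_with_named_policies(subscriber_rates_kbps):
    """Pre-defined policy-maps: one instance per distinct rate, shared by reference."""
    return len(set(subscriber_rates_kbps))


# Hypothetical subscriber base: 1000 subscribers spread over three speed tiers.
rates = [10240] * 600 + [20480] * 300 + [51200] * 100

print(pmaps_with_pqos(rates))            # 1000
print(pmaps_with_named_policies(rates))  # 3
```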

As for not reaching the desired rate, I wanted to suggest the same as Phil: the standard burst etc. may not be accommodating for you/the configured speed, so we need to tweak the exceed and violate bursts a bit to accommodate short bursts, based on delay and average packet size.

cheers

xander

puddingtech · ‎07-14-2016

Thanks for getting back to me, really appreciated.

OK, that first section on the TM being disabled on the 9001 kinda lost me, because we're not using any MPAs; it's a stock 9001-S using just 1x10G interface for testing on our testbench :S

Or is shape not supported on the internal ports?

Either way, from what I've seen so far, police should do fine.

Ah, I did not realize that manually sending 2mb/1mb would eat more queue processes than just pointing the policy-out at a premade policy.

As for the desired rate, yep, I'm already seeing better performance when expanding the bursts. In a perfect world you'd think there'd be some form of fancy auto-calc on the burst when no values are set.

Will give your recommendations above a test and confirm once i get it working :) Thanks again

xthuijs · ‎07-14-2016

Just checked the code: the ingress TM is hard disabled for the 9001, so what you're seeing is correct. As an aside, ingress shaping is so useless... ingress policing is much more sensible, especially in BNG cases.

Yeah, as for the burst calculations... I had opted for a default that follows the logic I documented in the QoS architecture doc, but in all fairness, if one considers a high speed of, say, 50M, which is not uncommon anymore in today's BNG, then the burst is insanely large as well. The problem with burst is that there is no one size fits all. Some guidance and recommendations exist, but having the software pre-determine it would require an extensive algorithm. I decided to provide recommendations instead and let the software defaults be what they are (for now :)

great to hear that everything starts to come together!

I did take note of your burst default calc idea, as I am putting together a use case for an enhanced algorithm. The future will tell if I can pull it off :)

cheers!

xander

puddingtech · ‎07-15-2016

Hey Xander, I dunno what's going on. I created some basic policy-maps:

policy-map basic
 class class-default
  set dscp default

policy-map 10mb
 class class-default
  police rate 10 mbps burst 1 mbytes peak-rate 15 mbps peak-burst 1 mbytes

Applying basic via dynamic-template works and I see it applied. Applying 10mb in the dynamic-template causes the session to not connect (disconnected from iedge or something in the logs).

If I don't put either in the QoS template, the customer connects, but when I send the CoA nothing happens; it shows no policy attached to the interface. I feel like I'm missing something stupidly simple.

Apparently the above does work: I wiped all the policy stuff out of the config and re-entered the above using dynamic templates with basic preset, then set 10mb via CoA, and it takes it now. Not sure why it didn't work before.
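
For a sense of scale, the burst in the 10mb policy above can be converted into time. This is a rough Python sketch, assuming "mbps" means 10**6 bit/s and "1 mbytes" means 1,000,000 bytes (the platform may interpret the units differently); the function name is made up for illustration.

```python
def burst_window_seconds(rate_bps, burst_bytes):
    """Seconds needed to refill an empty token bucket of burst_bytes at rate_bps."""
    return burst_bytes * 8 / rate_bps


# Values from the 10mb policy-map: rate 10 mbps / burst 1 mbytes,
# peak-rate 15 mbps / peak-burst 1 mbytes (unit interpretation is an assumption).
committed = burst_window_seconds(10_000_000, 1_000_000)
peak = burst_window_seconds(15_000_000, 1_000_000)

print(f"committed bucket: {committed:.2f} s, peak bucket: {peak:.2f} s")
```

Under those assumptions the committed bucket holds a bit under one second of traffic at the configured rate, which is far more headroom than a policer with no burst configured at all.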

xthuijs · ‎07-16-2016

Hi Chris, yeah, hard to tell what didn't work earlier, but an option could be to do a debug qos ma to see what the control plane sees, versus the ea to see what the hardware installation tells us.

If you apply a QoS policy via a template, it can be changed via CoA or RADIUS.

For performance, however, it would be best to use either RADIUS OR the template. The reason is that when the template is activated, the interface for the sub is created with the policy; if RADIUS later changes that during accept, we have to redo some work in the hardware to reprogram the policy.

cheers

xander

puddingtech · ‎06-29-2018

Xander, I've noticed an odd issue. I am provisioning this as we discussed: I have the various throughputs set up as policy maps with police rates, and I'm assigning them to the user via RADIUS with avpairs.

What's odd is that if I provision as below with 10240/4096, a speedtest.net throughput test to one of my servers shows the outbound policy throughput doubled. In the example below the customer was getting a perfect 20mb/4mb on the speed test, a nice smooth line of 20/4, which is above even the peak-rate.

Any idea what could be causing this? I've had reports of other customers on varying policies seeing similar higher-than-restricted bandwidth.

cisco-avpair=subscriber:sub-qos-policy-in=4096kb
cisco-avpair=subscriber:sub-qos-policy-out=10240kb

policy-map 10240kb
 class class-default
  police rate 10240 kbps burst 1920 kbytes peak-rate 15360 kbps peak-burst 3840 kbytes
and
policy-map 4096kb
 class class-default
  police rate 4096 kbps burst 768 kbytes peak-rate 6144 kbps peak-burst 1536 kbytes
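
Since each speed tier needs its own pre-defined policy-map, the text for new tiers could be generated rather than typed by hand. The Python sketch below reuses the ratios that can be read off the two policies above (burst = 1.5 seconds at the rate, peak-rate = 1.5 x rate, peak-burst = 2 x burst). Those ratios are inferred from this post, not taken from Cisco documentation, and the helper names are hypothetical. Whether the platform accepts the resulting values for every rate is not verified here.

```python
def policer_policy_map(rate_kbps, burst_seconds=1.5, peak_factor=1.5, peak_burst_factor=2):
    """Render a single-class policer policy-map named '<rate>kb'.

    Default ratios mirror the 10240kb/4096kb examples in this thread (an assumption,
    not a vendor recommendation). Fractions are truncated to whole kbytes/kbps.
    """
    burst_kbytes = int(rate_kbps * burst_seconds / 8)   # kbit/s * s / 8 -> kbytes
    peak_kbps = int(rate_kbps * peak_factor)
    peak_burst_kbytes = burst_kbytes * peak_burst_factor
    return "\n".join([
        f"policy-map {rate_kbps}kb",
        " class class-default",
        f"  police rate {rate_kbps} kbps burst {burst_kbytes} kbytes "
        f"peak-rate {peak_kbps} kbps peak-burst {peak_burst_kbytes} kbytes",
        " end-policy-map",
    ])


def radius_avpairs(down_kbps, up_kbps):
    """Build the two avpairs in the form used above: in = upload, out = download."""
    return [
        f"cisco-avpair=subscriber:sub-qos-policy-in={up_kbps}kb",
        f"cisco-avpair=subscriber:sub-qos-policy-out={down_kbps}kb",
    ]


# Example: a hypothetical 20480/8192 tier.
print(policer_policy_map(20480))
print(policer_policy_map(8192))
print("\n".join(radius_avpairs(20480, 8192)))
```

Calling policer_policy_map(10240) and policer_policy_map(4096) reproduces the police lines shown above, which is a quick way to check that the ratios were read correctly.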

smilstea · ‎07-01-2018

Are you using a bundle with 2 interfaces in it? That would explain why it's double, as the policy is copied to both members. If the problem is on traffic outbound from the 9K, you can try changing the load-balancing method on the 9K; if the traffic is coming to the 9K across multiple links, then the upstream needs to ensure traffic for a single sub only arrives on one link.

Sam

puddingtech · ‎07-02-2018

Well, it's set up as HA, so MC-LAG to a remote MC-LAG pair of switches (1 port per switch, per ASR, so 4 ports total).

It's in the default load balancing mode...

Load balancing:
Link order signaling: Not configured
Hash type: Default
Te0/0/2/0 Local Active 0x0015, 0x9001 10000000 Link is Active
Te0/0/2/1 Local Active 0x0015, 0x9002 10000000 Link is Active
Te0/0/2/0 10.255.244.2 Standby 0x0016, 0xa001 10000000 Link is marked as Standby by mLACP peer
Te0/0/2/1 10.255.244.2 Standby 0x0016, 0xa002 10000000 Link is marked as Standby by mLACP peer

Upload from client to ASR seems to be balanced/limited correctly, as they get exactly what they are supposed to get. But from ASR to client (download), they are getting 2x the throughput. As you said, that's most likely because traffic goes back to the client over both Te0/0/2/0 and Te0/0/2/1, and each one is limited to 10 Mbit.

What would you recommend I change to work around the issue? Should I just set "bundle load-balancing hash dst-ip" on the bundle facing the access side to clients?

smilstea · ‎07-03-2018

Correct, bundle load-balancing hash dst-ip would be the way to work around this.
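
Why hashing on the destination IP helps can be shown with a toy model in Python. The sketch assumes, as described above, that the policer is instantiated once per bundle member. It uses zlib.crc32 purely as a stand-in for the real hardware hash, and the IP addresses are hypothetical documentation-range values.

```python
import zlib

MEMBERS = ["Te0/0/2/0", "Te0/0/2/1"]
POLICER_KBPS = 10240  # per-member copy of the subscriber's outbound policer


def pick_member(key, members=MEMBERS):
    """Toy hash: CRC32 of the key modulo the member count (NOT the NPU's real hash)."""
    return members[zlib.crc32(key.encode()) % len(members)]


def members_used(flows, dst_only):
    """Return the set of bundle members that a subscriber's download flows land on."""
    keys = [dst if dst_only else f"{src}->{dst}" for src, dst in flows]
    return {pick_member(key) for key in keys}


# Hypothetical download flows from 32 servers toward one subscriber address.
flows = [(f"198.51.100.{i}", "192.0.2.50") for i in range(1, 33)]

# Worst case when flows spread over every member: each member polices separately.
print(POLICER_KBPS * len(MEMBERS))                 # 20480

# Hashing on destination IP only: all flows to the subscriber share one key.
print(len(members_used(flows, dst_only=True)))     # 1

# Hashing on source and destination: flows may spread over several members.
print(len(members_used(flows, dst_only=False)))
```

With a destination-only key every download flow for a subscriber maps to the same member, so only one copy of the policer ever sees that subscriber's traffic.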