xthuijs
Cisco Employee

Introduction

This document describes how QoS is implemented on the ASR9000 and how to interpret and troubleshoot QoS-related issues.

 

Core Issue

QoS is always a complex topic, and in this article I'll describe the QoS architecture and provide some troubleshooting tips.

Based on feedback on this document I'll keep enhancing it to cover more topics.

 

The ASR9000 employs an end-to-end QoS architecture throughout the whole system: priority is propagated across the system's forwarding ASICs, using backpressure between the different forwarding ASICs.

One key aspect of the A9K's QoS implementation is the concept of VOQs (virtual output queues). Each network processor, or in fact every 10G entity in the system, is represented in the Fabric Interfacing ASIC (FIA) by a VOQ on each linecard.

That means that in a fully loaded system with, say, 24x10G cards (each linecard having 8 NPUs and 4 FIAs), a total of 192 VOQs (24 ports times 8 slots) are represented at each FIA of each linecard.

The VOQs have 4 different priority levels: Priority 1, Priority 2, Default priority and Multicast.

The priority level is assigned in the packet's fabric headers (internal headers) and can be set via QoS policy-maps (MQC; modular QoS configuration).

When you define a policy-map, apply it to a (sub)interface, and mark certain traffic in that policy-map as priority level 1 or 2, the fabric headers reflect that marking, so this traffic is placed in the higher-priority queues of the forwarding ASICs as it traverses the FIA and fabric components.

If you don't apply any QoS configuration, all traffic is considered "default" in the fabric queues. To leverage the strength of the ASR9000's ASIC priority levels, you need to configure (ingress) QoS at the ports to apply the desired priority level.
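
For illustration, a minimal ingress policy sketch (the class, policy, and interface names are hypothetical) that classifies EF-marked traffic as priority level 1 so it rides the high-priority fabric queues; a policer is included to bound the priority traffic, as is common practice:

     class-map match-any VOICE
      match dscp ef
     !
     policy-map INGRESS-PRIO
      class VOICE
       priority level 1
       police rate percent 10
       !
      !
      class class-default
      !
      end-policy-map
     !
     interface TenGigE0/0/0/0
      service-policy input INGRESS-PRIO
     !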

[figure: qos-archi.jpg]

In this example, T0 and T1 on the ingress linecard are receiving a total of 16G of traffic destined for T0 on the egress linecard. For a 10G port that is obviously too much.

T0 will flow off some of the traffic, depending on the queue, eventually signaling this back to the ingress linecard. While T0 on the ingress linecard also has some traffic for T1 on the egress LC (green), that traffic is not affected and continues to be sent to the destination port.

Resolution

 

The ASR9000 supports 4 levels of QoS; a sample configuration and implementation details are presented in this picture:

 

[figure: shared-policy.jpg]

 

 

Policer showing exceed drops, not reaching the configured rate

 

When defining policers at high(er) rates, make sure the committed burst and excess burst are set correctly.
This is the formula to follow:

Bc = CIR bps * (1 byte / 8 bits) * 1.5 seconds

and

Be = 2 x Bc
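
As a worked example (policy name hypothetical), for a 100 Mbps policer the formula gives Bc = 100,000,000 / 8 * 1.5 = 18,750,000 bytes and Be = 2 * Bc = 37,500,000 bytes:

     policy-map POLICE-100M
      class class-default
       police rate 100 mbps burst 18750000 bytes peak-burst 37500000 bytes
       !
      !
      end-policy-map
     !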

The default burst values are not optimal.

To visualize the problem: say you are allowed 1 pps; for one second you send nothing, but the next second you want to send 2 packets. In that second you'll see an exceed.

 

Alternatively, Bc and Be can be configured in time units, e.g.:

     policy-map OUT

      class EF

       police rate percent 25 burst 250 ms peak-burst 500 ms

 

To view the Bc and Be applied in hardware, run "show qos interface <interface> [input|output]".

 

 

Why do I see non-zero values for Queue(conform) and Queue(exceed) in show policy-map commands?

On the ASR9k, every HW queue has a configured CIR and PIR value. These correspond to the "guaranteed" bandwidth for the queue, and the "maximum" bandwidth (aka shape rate) for the queue.

In some cases the user-defined QoS policy does NOT explicitly use both of these. However, depending on the exact QoS config, the queueing hardware may require some nonzero value for these fields; the system will then choose a default value for the queue CIR. The "conform" counter in show policy-map is the number of packets/bytes that were transmitted within this CIR value, and the "exceed" value is the number of packets/bytes that were transmitted within the PIR value.

Note that "exceed" in this case does NOT equate to a packet drop, but rather a packet that is above the CIR rate on that queue.

You could change this behavior by explicitly configuring a bandwidth and/or a shape rate on each queue (see the sketch below), but in general it's easier to recognize that these counters don't apply to your specific situation and ignore them.
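
For example, a sketch (names and rates hypothetical) that sets both values explicitly, so the conform/exceed counters line up with a CIR and PIR you chose yourself:

     policy-map EGRESS-QUEUES
      class EF
       bandwidth 20 mbps        <<< programs the queue CIR ("guaranteed")
       shape average 50 mbps    <<< programs the queue PIR (shape rate)
      !
      class class-default
      !
      end-policy-map
     !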

 

What is counted in QoS policers and shapers?

 

When we define a shaper in a QoS policy-map, the shaper takes the L2 header into consideration.

A shape rate of, say, 1 Mbps means that with no dot1q or QinQ encapsulation I can technically send more IP traffic than with QinQ, which carries more L2 overhead. The same applies when I define a bandwidth statement in a class: L2 is taken into consideration there as well.

When defining a policer, it also looks at L2.

On ingress, for both policer and shaper, we use the incoming packet size (including the L2 header).

To account for the L2 header in the ingress shaper case, we have to use a TM overhead accounting feature, which only lets us add overhead at 4-byte granularity; this can cause a small inaccuracy.

On egress, for both policer and shaper, we use the outgoing packet size (including the L2 header).

 

The ASR9K policer implementation supports 64 kbps granularity. When the specified rate is not a multiple of 64 kbps, the rate is rounded down to the next lower 64 kbps multiple (for example, 100 kbps is programmed as 64 kbps).

 

For the policing, shaping and BW commands, in both ingress and egress directions, the following fields are included in the accounting (a worked example follows the list):

 

MAC DA

MAC SA

EtherType

VLANs (one or more tags)

L3 headers/payload

CRC
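
As a worked example, a 100-byte IP packet sent on a dot1q subinterface is accounted as 100 (L3 headers/payload) + 14 (MAC DA, MAC SA, EtherType) + 4 (VLAN tag) + 4 (CRC) = 122 bytes against the configured rate.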

 

Port level shaping

Shaping requires a queue on which the shaping is applied. This queue must be created by a child-level policy. Typically the shaper is applied at the parent or grandparent level, to allow for differentiation between traffic classes within the shaper. If there is a need for a flat port-level shaper, a child policy should be configured with 100% bandwidth explicitly allocated to class-default, as in the sketch below.
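
A minimal sketch of such a flat port-level shaper (names and rate hypothetical):

     policy-map PORT-SHAPER-CHILD
      class class-default
       bandwidth percent 100    <<< child level creates the queue
      !
      end-policy-map
     !
     policy-map PORT-SHAPER
      class class-default
       service-policy PORT-SHAPER-CHILD
       shape average 500 mbps   <<< flat port-level shape
      !
      end-policy-map
     !
     interface TenGigE0/0/0/0
      service-policy output PORT-SHAPER
     !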

Understanding show policy-map counters

 

QoS counters and show interface drops:

 

Policer drops are counted directly against the (sub)interface and are reported in the "show interface" drops count.
The drop counts you see are an aggregate of what the NP has dropped (in most cases) as well as policer drops.

 

Packets that get dropped before the policer sees them are not accounted for by the policy-map policer drop counters, but may
show under the show interface drops, and can be seen via the show controllers np count command (example below).
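
For example (the NP instance and location are placeholders; pick the NP serving your port):

     RP/0/RSP0/CPU0:A9K#show controllers np counters np0 location 0/0/CPU0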

 

Policy-map queue drops are not reported in the subinterface drop counts.
The reason is that subinterfaces may share queues with each other or with the main interface, and therefore we don't
have subinterface granularity for queue-related drops.

 

 

The counters below come from the show policy-map interface command; the annotations after "<<<" explain each field.

 

 

    Class precedence6                                                       <<< class name as per configuration
      Classification statistics          (packets/bytes)     (rate - kbps)  <<< statistics for this class
        Matched             :            31583572/2021348608           764652  <<< packets that were matched
        Transmitted         : Un-determined                                 <<< packets that were sent to the wire
        Total Dropped       : Un-determined                                 <<< packets dropped for any reason in this class
      Policing statistics                (packets/bytes)     (rate - kbps)  <<< policing stats
        Policed(conform)    :            31583572/2021348608           764652  <<< packets below the CIR rate
        Policed(exceed)     :                   0/0                    0    <<< packets in the 2nd bucket, above CIR but < PIR
        Policed(violate)    :                   0/0                    0    <<< packets in the 3rd bucket, above PIR
        Policed and dropped :                   0/0                         <<< total packets the policer dropped
      Queueing statistics                                                   <<< statistics for queueing
        Queue ID                             : 136                          <<< internal unique queue reference
        High watermark  (Unknown)                                           <<< max packets queued/held at one time (value not supported by HW)
        Inst-queue-len  (packets)            : 4096                         <<< number of 512-byte particles currently waiting in the queue
        Avg-queue-len   (Unknown)                                           <<< average number of packets buffered (value not supported by HW)
        Taildropped(packets/bytes)           : 31581615/2021223360          <<< packets not buffered because the max queue length was exceeded
        Queue(conform)      :            31581358/2021206912           764652  <<< see queue conform/exceed description above
        Queue(exceed)       :                   0/0                    0    <<< see queue conform/exceed description above
        RED random drops(packets/bytes)      : 0/0                          <<< packets subject to Random Early Detection and dropped

 

 

Understanding the hardware QoS output

 

RP/0/RSP0/CPU0:A9K-TOP#show qos interface g0/0/0/0 output

 

This command verifies the actual hardware programming of the QoS policy on the interface.

(The output below is not related to the example above.)


Tue Mar  8 16:46:21.167 UTC
Interface: GigabitEthernet0_0_0_0 output
Bandwidth configured: 1000000 kbps Bandwidth programed: 1000000
ANCP user configured: 0 kbps ANCP programed in HW: 0 kbps
Port Shaper programed in HW: 0 kbps
Policy: Egress102 Total number of classes: 2
----------------------------------------------------------------------
Level: 0 Policy: Egress102 Class: Qos-Group7
QueueID: 2 (Port Default)
Policer Profile: 31 (Single)
Conform: 100000 kbps (10 percent) Burst: 1248460 bytes (0 Default)
Child Policer Conform: TX
Child Policer Exceed: DROP
Child Policer Violate: DROP
----------------------------------------------------------------------
Level: 0 Policy: Egress102 Class: class-default
QueueID: 2 (Port Default)
----------------------------------------------------------------------

 

 

Default Marking behavior of the ASR9000

 

If you don't configure any service policies for QoS, the ASR9000 sets an internal CoS value based on the IP precedence, the 802.1p priority field, or the MPLS EXP bits.

Depending on the routing or switching scenario, this internal CoS value is used for potential marking of newly imposed headers on egress.

 

Scenario 1

[figure: Slide1.JPG]

Scenario 2

[figure: Slide2.JPG]

 

Scenario 3

[figure: Slide3.JPG]

 

Scenario 4

 

[figure: Slide4.JPG]

 

Scenario 5

 

[figure: Slide5.JPG]

 

Scenario 6

[figure: Slide6.JPG]

 

Special consideration:

If the node is L3 forwarding, then there is no L2 CoS propagation or preservation as the L2 domain stops at the incoming interface and restarts at the outgoing interface.

The default marking PHB on L3 retains no L2 CoS information, even if the incoming interface happened to be an 802.1q or 802.1ad/QinQ sub-interface.

CoS may appear to be propagated if the corresponding L3 field (prec/DSCP) used for default marking matches the incoming CoS value, and so is used as-is for imposed L2 headers at egress.

 

If the node is L2 switching, the incoming L2 header is preserved unless the node has ingress or egress rewrites configured on the EFPs.
If an L2 rewrite results in new header imposition, the default marking derived from the 3-bit PCP (as specified in 802.1p) of the incoming EFP is used to mark the new headers.

 

An exception to the above: the DEI bit value from incoming 802.1ad / 802.1ah headers is propagated to imposed or topmost 802.1ad / 802.1ah headers for both L3 and L2 forwarding.

 

Related Information

ASR9000 Quality of Service configuration guide

 

Xander Thuijs, CCIE #6775

 

Comments
xthuijs
Cisco Employee

you have a rather sizeable burst; it may mean that the policer uses it to its advantage, but that automatically gives you a higher rate of traffic on the interface.

check the show policy-map int <intf> to see how the policer is handling the traffic;

from that, I think the conform rate is higher than what you expected.

you probably have some violate traffic that is possibly making it through (yellow bucket).

you can also check what the hardware programmed in terms of precise rates and bursts via:

show qos int <intf> <direction>, as there is some rounding between configured and programmed values.

cheers!

xander

RP/0/RSP0/CPU0:b1#sho policy-map int gi0/7/1/16

GigabitEthernet0/7/1/16 input: Rate_Limit_600M

Class class-default
  Classification statistics          (packets/bytes)     (rate - kbps)
    Matched             :          3739231994/5066639316118        739365
    Transmitted         : N/A
    Total Dropped       :           657155538/879741585911         128299
  Policing statistics                (packets/bytes)     (rate - kbps)
    Policed(conform)    :          3082076456/4186897730207        611066
    Policed(exceed)     :           657155538/879741585911         128299
    Policed(violate)    :                   0/0                    0
    Policed and dropped :           657155538/879741585911       

RP/0/RSP0/CPU0:b1#sho qos interface gi0/7/1/16 input
Interface: GigabitEthernet0_7_1_16 input
Bandwidth configured: 1000000 kbps Bandwidth programed: 1000000 kbps
ANCP user configured: 0 kbps ANCP programed in HW: 0 kbps
Port Shaper programed in HW: 0 kbps
Policy: Rate_Limit_600M Total number of classes: 1
----------------------------------------------------------------------
Level: 0 Policy: Rate_Limit_600M Class: class-default
QueueID: 196674 (Port Default)
Policer Profile: 44 (Single)
Conform: 610240 kbps (610240000 bps) Burst: 1024000 bytes (1024000 bytes)
Child Policer Conform: TX
Child Policer Exceed: DROP
Child Policer Violate: DROP
----------------------------------------------------------------------

It should conform at 610 Mbps + burst (1 Mbyte x 8) = 618 Mbps, but why does show interface see 743 Mbps?

Thank you very much.

John Cavanaugh
Level 1

Xander,

I am in the same situation that Thiyagarajan described. Currently our four TenGigE Bundle-Ethernet member ports are split between the two MPA cards on a single MOD80 card.

Attached is the current output from "show qos interface" for one of our sub-interfaces on the bundle. In this case we want to police the sub-interface at 50 Mb/s. Should I expect aggregate policing among the ports that share a common NP (50 Mb/s per NP), or is it policing per interface (50 Mb/s per physical port)?

If I move all four members to a single MPA card, would we get aggregate policing for all member ports, since they would be serviced by a single NP? Is this the case for both input and output policing? Is any additional configuration necessary to accomplish that, or does it happen automatically?

I appreciate your time and assistance.

Respectfully,

-- John

xthuijs
Cisco Employee

hey john, yeah, even if the members are on the same LC, or even on the same NPU, the token buckets are not shared, meaning each member gets the configured bundle policy rate, which works out to members times the provisioned bandwidth.

although technically, if all members were on the same NPU, we could do something like a shared-policy-instance type approach, but having all members on the same NPU or LC defeats a bit the redundancy that a bundle could provide.

so in order to ensure the desired rate on a bundle, one would divide the target rate by the number of members, or use a bandwidth/police percent type approach (see the sketch below).
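
As a sketch of the divide-by-members approach (names and rates hypothetical): to get an aggregate of roughly 600 Mbps on a 4-member bundle, police each member at 150 Mbps:

     policy-map BUNDLE-POLICE-600M
      class class-default
       police rate 150 mbps    <<< 600 Mbps target / 4 members
       !
      !
      end-policy-map
     !
     interface Bundle-Ether1
      service-policy input BUNDLE-POLICE-600M
     !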

or hash all traffic of a vlan the same way, so that the whole vlan prefers a single member, with the others as backup; redundancy is then warranted, but that requires some "traffic engineering" to define proper hash values per vlan, to make sure that 2 heavy vlans don't take the same member, which could otherwise result in an uneven spread.

cheers!

xander

T J
Level 1

I am new to QoS. On the ASR I saw:

  police rate 1 gbps burst 3 mbytes peak-rate 1100 mbps peak-burst 4 mbytes

I am wondering how it was arrived at? I don't think it used the following calculation:

bc = CIR bps * (1 byte) / (8 bits) * 1.5 seconds

can you please help?

xthuijs
Cisco Employee

hi tj,

the Tc is fixed in A9K QoS: 4 msec by default, or 500 usec in low-burst mode.

the Bc calculation formula you have there is something I came up with 10 years ago for DSL-based circuits; an RTT of 1.5 seconds was reasonable there at that time, and the CIR was rather low. Following this formula for today's RTTs and speeds will get you a crazy large burst. It may be reasonable to assume a 300 msec RTT at most (for example, at 1 Gbps that gives 1,000,000,000 / 8 * 0.3 = 37.5 Mbytes of burst).

the general gist is that the default burst size calculation, which is 100 msec of the service rate, is sometimes a bit too tight: if you want to give a CIR of X, in some cases you'd see only 80-90% of X as the achieved rate, hence a slightly larger burst size can accommodate for that.

it is a trade-off too, as a larger burst requires the receiver to have deeper receive buffers, and that can be tricky, especially at high rates!

cheers!

xander

T J
Level 1

Thanks for the quick reply. What do you mean by Tc? I need to change the QoS for that connection now, so I'm wondering: should I consider a 300 msec RTT rather than 1.5 seconds? Should I still use the formula? How do I calculate the peak rate? I believe peak-burst is 2 * burst? But that also does not fit with the value we are using now. I am pretty confused.

Thanks

xthuijs
Cisco Employee

hi tj,

also pull Cisco Live session ID 2904 from San Francisco 2014 and San Diego 2015; it has some visuals and explanations of the various pieces when it comes to QoS (in general and how it applies to the A9K).

Tc is the token refresh rate. Example: say we have a configured rate of 100.

I can refresh 100 every second, or 25 every 250 msec. The frequency at which I replenish tokens is that Tc. Why does it matter? In a 10-second interval I would expect to see an *average* of 10x100 = 1,000.

Now, if I refresh tokens only every second, I could technically see *bursts*: at time 1 second, 100 at once, then 999 msec of silence, and at time 2 seconds another 100.

If I refresh every 250 msec, I send 25x4 over that one-second span; that looks much smoother!

but what if the traffic pattern is bursty by nature, so in a specific interval I want to send 30, but I only have 25? Then 5 would violate. I can choose to drop, mark, or borrow from the future: over 2 timestamps I would have 2x25=50, but I only replenish 25 each time (Tc), so I borrow/burst 5 from the future, and in the next interval I don't have 25 to give but only 20; on average it works out. The burst size needed depends on the spikiness of the traffic pattern. TCP works well with drops from policers, adapting and adjusting to the available rate and bursts.

Now, for a policer, the desired rate is the CIR or committed rate.

and if we allow bursting, to what rate? shapers by default allow bursting to LINERATE! so if the speed of the circuit is 10G, we allow traffic to be sent out at that 10G rate. but maybe we want to limit how far we burst; that is called the peak burst rate: we allow some extra traffic above contract, not up to linerate but up to the peak burst rate.

xander

James Jun
Level 1

Hi Xander,

By default, does the standard tail-drop behavior on the A9K differentiate drop priorities based on the IP precedence value set on a packet?

Let's consider this example: say I have a TenGE port that's used to break out sub-interfaces (an NNI to a layer-2 transport provider, for example). I'd like to allow a sub-interface to burst all the way up to 10 Gbps if capacity on this port is available, but prioritize 2 Gbps of it in the event of port congestion.

I'd like to avoid having to put a nested parent/child policy-map on the main host interface (TenGigE0/0/0/0); it would be nice if I could simply attach an independent policy-map to each sub-interface. The desired idea is that when the host interface is congested, packets with the lowest precedence get dropped before higher precedence. Can such a setup be accomplished simply by doing the below? When the TenGigE0/0/0/0 host port gets congested, would packets marked with prec 1 be prioritized, whereas prec 0 gets taildropped first?

interface TenGigE0/0/0/0
! no special configuration on host interface
!
interface TenGigE0/0/0/0.100
 ipv4 address 10.1.10.1/30
 service-policy output 2G_CDR_burstable_to_10GE
!
policy-map 2G_CDR_burstable_to_10GE
 class class-default
  police rate 2 gbps
   conform-action set precedence 1
   violate-action set precedence 0
  !
 !
 end-policy-map
!

Aleksandar Vidakovic
Cisco Employee

By default there is no differentiation within a configured class. You can use WRED to achieve this kind of differentiation within a class, with different thresholds per IP precedence (a sketch appears below). For better scaling with a high number of sub-interfaces, use the same WRED profile on all sub-interfaces. This implies that your QoS policy switches from policing to queueing, meaning this approach is effectively limited to the -SE flavour of line cards.
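
For illustration, a minimal per-sub-interface sketch (names, rates and thresholds are hypothetical) that creates a queue and drops prec 0 earlier than prec 1 under congestion:

     policy-map SUB-WRED-OUT
      class class-default
       shape average 10 gbps
       random-detect precedence 0 10 ms 20 ms    <<< prec 0 starts dropping first
       random-detect precedence 1 20 ms 40 ms
      !
      end-policy-map
     !
     interface TenGigE0/0/0/0.100
      service-policy output SUB-WRED-OUT
     !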

If you want to stick to policers, try the conform-aware policer (search cisco.com for "child-conform-aware 5.3.x asr9000"). This will not allow per-precedence differentiation per se, but it allows the policer at the child level to exceed the committed rate, while at the parent level we keep track of whether the child-level traffic was within the conform or exceed rate.

hope this helps,

/Aleksandar

marpina
Cisco Employee

Hi all,

regarding "shape average percent":


My customer is asking the following:

1. Is the CLI "shape average percent" supported under priority level 2?

2. When the flows in the other classes are empty, can the video traffic class reach 95% of the 1 Gbps line rate using the config below?

3. Is it supported that the sum of the bandwidth percents is > 100%, with video at 95%?


interface GigabitEthernet0/3/0/0.400
 bandwidth 10000
 service-policy output rpvm_ether_shape_pb24_out
 encapsulation dot1q 400
 
 
policy-map rpvm_ether_shape_pb24_out
 class class-default
  service-policy rpvm_ether_pb24_out
  shape average 2000000 bps
 !
 end-policy-map
 !
policy-map rpvm_ether_pb24_out
 class rpvm_mgmnt_routing
  set cos 2
  bandwidth percent 10
 !
 class rpvm_voice
  priority level 1
  police rate percent 20
  !
  set cos 5
 !
 class rpvm_video
  set cos 2
  priority level 2
  shape average percent 95       <<< here
 !
 class rpvm_DATA
  set cos 2
  bandwidth percent 30
 !
 class rpvm_business
  set cos 2
  bandwidth percent 80
 !
 class class-default
  set cos 1
  bandwidth percent 10

xthuijs
Cisco Employee

hi there,

you can't really shape a priority class, because a P1 class is dequeued for as long as it is active, i.e. has packets in it; hence a policer is required to limit the number of packets entering the queue. So you police a PQ, as in the sketch below.
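
For example, a minimal rework of the video class from the config above, policing rather than shaping the priority level 2 queue:

  class rpvm_video
   set cos 2
   priority level 2
   police rate percent 95    <<< police, not shape, the PQ
   !
  !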

the parent shaper defines the circuit rate, and a child percent takes from the parent shape rate.

ps: I see you have a routing class; that is not needed, as any locally originated packet is treated with priority (except for ICMP).

cheers

xander

James Jun
Level 1

Hey Xander,

I have another question:

On the ASR9K platform, when configuring a queue (i.e. 'shape average xyz' in an output-direction service-policy applied to an interface), the default queue size is 100 ms of the configured service rate, per the documentation.

What if I don't configure any QoS (no service-policy) on an interface; how large is the default output buffer for an interface with no QoS configured? I do see it buffering a little on bursty traffic before I start seeing output drops. Is it 100 ms of the interface speed, or smaller than that?

I'm wondering if applying a service-policy that creates a queue is beneficial when stepping down from a huge ingress interface to a small egress interface. For example, a HundredGigE core-facing interface going out to a subscriber GigE port; I have no need to prioritize/classify or rate-limit traffic in this case, but I am more worried about microbursts on bursty data transfers over the internet, and whether a default interface without any QoS configured provides enough output buffer to ride out the burst.

Thanks!
James

xthuijs
Cisco Employee

hey james!

correct, 100 ms of the service rate (that is, the configured rate plus a share of the leftover unassigned bandwidth on the interface).

without a service policy, the traffic manager (TM) still provides a transmit-ring queue of a few packets, just to keep the serializer going steadily.

when you have speed differences, like going from 100G to 1G, at the transport layer there is some sort of shaping, or better put windowing, going on: chunks are sent and received at 100G speed, but the overall rate over a window of time is 1G, as the receiver on the 1G side won't ack any faster (assuming the 1G-connected device can even run at 1G linerate, which is generally not the case, as it may not have deep enough receive buffers or ack fast enough).

buffering is good for loss-sensitive applications and allows for a little accommodation of bursty traffic.

however, buffering adds delay, which may adversely affect the overall throughput.

TCP, for instance, thrives a lot better when a drop is exerted, so it finds the right window size based on the end-to-end delays and transmission speeds.

so for your scenario: when you step down in speed, there is a "natural" burstiness, e.g. the 100G side can send much faster than the 1G side can drain. You can choose to buffer that burst if it is short enough, or you can choose to drop (part of) that burst and effectively tell the sender to back off sooner.

there is no right or wrong here; the right approach really depends on the application (type), its tolerance of loss and delay, and the overall speed demands.

cheers!

xander

James Jun
Level 1

Got it, thanks!
