03-07-2011 01:43 PM - edited 12-18-2018 05:19 AM
This document provides details on how QoS is implemented in the ASR9000 and how to interpret and troubleshoot QoS-related issues.
QoS is always a complex topic, and with this article I'll try to describe the QoS architecture and provide some tips for troubleshooting.
Based on feedback on this document I'll keep enhancing it to cover more topics.
The ASR9000 employs an end-to-end QoS architecture throughout the whole system. What that means is that priority is propagated throughout the system's forwarding ASICs, via backpressure between the different forwarding ASICs.
One very key aspect of the A9K's QoS implementation is the concept of virtual output queues (VOQs). Each network processor, or in fact every 10G entity in the system, is represented in the Fabric Interfacing ASIC (FIA) by a VOQ on each linecard.
That means that in a fully loaded system with, say, 24x10G cards, each linecard having 8 NPUs and 4 FIAs, a total of 192 VOQs (24 times 8) are represented at each FIA of each linecard.
The VOQs have 4 different priority levels: priority 1, priority 2, default priority, and multicast.
The priority level used is assigned in the packet's fabric headers (internal headers) and can be set via QoS policy-maps (MQC, modular QoS configuration).
When you define a policy-map and apply it to a (sub)interface, and in that policy-map certain traffic is marked as priority level 1 or 2, the fabric headers will reflect that, so this traffic is put into the higher-priority queues of the forwarding ASICs as it traverses the FIA and fabric components.
If you don't apply any QoS configuration, all traffic is considered "default" in the fabric queues. In order to leverage the strength of the ASR9000's ASIC priority levels, you will need to configure (ingress) QoS on the ports to assign the desired priority level.
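For illustration, here is a minimal sketch of an ingress policy that assigns the fabric priority levels (the class-maps, match criteria, policer rates and interface are placeholder examples only, not a recommendation):

class-map match-any VOICE
 match dscp ef
 end-class-map
!
class-map match-any VIDEO
 match dscp af41
 end-class-map
!
policy-map INGRESS-PRIO
 class VOICE
  priority level 1
  police rate 100 mbps
  !
 !
 class VIDEO
  priority level 2
  police rate 500 mbps
  !
 !
 class class-default
 !
 end-policy-map
!
interface TenGigE0/0/0/0
 service-policy input INGRESS-PRIO
!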

In this example T0 and T1 are receiving a total of 16G of traffic destined for T0 on the egress linecard. For a 10G port that is obviously too much.
T0 will flow-off some of the traffic, depending on the queue, eventually signaling that backpressure to the ingress linecard. While T0 on the ingress linecard also has some traffic for T1 on the egress LC (green), that traffic is not affected and continues to be sent to its destination port.
The ASR9000 supports 4 levels of QoS hierarchy; a sample configuration and implementation detail are presented in this picture:

Set Bc to CIR (bps) * (1 byte / 8 bits) * 1.5 seconds
and
Be = 2 x Bc
The default burst values are not optimal. To visualize the problem: say you are allowing 1 pps; for one second you don't send anything, but the next second you want to send 2 packets. In that second you'll see an exceed.
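As a hedged worked example of that rule of thumb (the 10 Mbps rate and the actions are arbitrary placeholders): Bc = 10,000,000 / 8 * 1.5 = 1,875,000 bytes and Be = 2 x Bc = 3,750,000 bytes, which could be configured as:

policy-map POLICE-10M
 class class-default
  police rate 10 mbps burst 1875000 bytes peak-burst 3750000 bytes
   conform-action transmit
   exceed-action transmit
   violate-action drop
  !
 !
 end-policy-map
!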
Alternatively, Bc and Be can be configured in time units, e.g.:
policy-map OUT
class EF
police rate percent 25 burst 250 ms peak-burst 500 ms
To view the Bc and Be values applied in hardware, run "show qos interface <interface> <input|output>".
On the ASR9k, every HW queue has a configured CIR and PIR value. These correspond to the "guaranteed" bandwidth for the queue, and the "maximum" bandwidth (aka shape rate) for the queue.
In some cases the user-defined QoS policy does NOT explicitly use both of these. However, depending on the exact QoS config, the queueing hardware may still require some nonzero value for these fields; in that case the system will choose a default value for the queue CIR. The "conform" counter in show policy-map interface is the number of packets/bytes that were transmitted within this CIR value, and the "exceed" counter is the number of packets/bytes that were transmitted within the PIR value.
Note that "exceed" in this case does NOT equate to a packet drop, but rather a packet that is above the CIR rate on that queue.
You could change this behavior by explicitly configuring a bandwidth and/or a shape rate on each queue, but in general it's just easier to recognize that these counters don't apply to your specific situation and ignore them.
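A minimal sketch of making both values explicit per queue (class names and rates are placeholders; BUSINESS is assumed to be an existing class-map): the bandwidth statement programs the queue CIR and the shape statement programs the queue PIR, so the conform/exceed counters then reflect values you chose yourself.

policy-map EGRESS-QUEUES
 class BUSINESS
  bandwidth 200 mbps
  shape average 400 mbps
 !
 class class-default
  bandwidth 100 mbps
 !
 end-policy-map
!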
When we define a shaper in a QoS policy-map, the shaper takes the L2 header into consideration.
A shape rate of, say, 1 Mbps means that with no dot1q or QinQ encapsulation I can technically send more IP traffic than with QinQ, which has more L2 overhead. The same applies when I define a bandwidth statement in a class: L2 is taken into consideration there also.
A policer also looks at L2.
In ingress, for both policer and shaper, we use the incoming packet size (including the L2 header).
In order to account for the L2 header in the ingress shaper case, we have to use a TM (traffic manager) overhead accounting feature, which only lets us add overhead at 4-byte granularity; this can cause a little inaccuracy.
In egress, for both policer and shaper, we use the outgoing packet size (including the L2 header).
The ASR9K policer implementation supports 64 kbps granularity. When the specified rate is not a multiple of 64 kbps, the rate is rounded down to the next lower 64 kbps multiple.
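To illustrate the rounding with an example (rates picked arbitrarily): a policer configured at 150 kbps would be programmed in hardware at 128 kbps, the next lower multiple of 64 kbps, while a configured rate of 256 kbps would be programmed exactly as specified.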
For policing, shaping, and the bandwidth command, in both the ingress and egress direction, the following fields are included in the accounting (a worked example follows the list):
- MAC DA
- MAC SA
- EtherType
- VLANs
- L3 headers/payload
- CRC
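As a worked example of that accounting (packet size chosen arbitrarily): a 1500-byte IP packet sent on a single-tagged dot1q subinterface is counted as 1500 + 14 (MAC DA/SA + EtherType) + 4 (dot1q VLAN tag) + 4 (CRC) = 1522 bytes against a shaper or policer, while the 8-byte preamble and the 12-byte inter-frame gap are not counted.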
Shaping requires a queue on which the shaping is applied. This queue must be created by a child-level policy. Typically a shaper is applied at the parent or grandparent level, to allow for differentiation between traffic classes within the shaper. If there is a need to apply a flat port-level shaper, a child policy should be configured with 100% bandwidth explicitly allocated to class-default.
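A minimal sketch of such a flat port-level shaper (policy names, the shape rate and the interface are placeholders):

policy-map FLAT-CHILD
 class class-default
  bandwidth percent 100
 !
 end-policy-map
!
policy-map PORT-SHAPER
 class class-default
  shape average 500 mbps
  service-policy FLAT-CHILD
 !
 end-policy-map
!
interface TenGigE0/1/0/0
 service-policy output PORT-SHAPER
!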
QoS counters and show interface drops:
Policer counts are counted directly against the (sub)interface and will get reported in the "show interface" drop count.
The drop counts you see there are an aggregate of what the NP has dropped (in most cases) as well as policer drops.
Packets that get dropped before the policer is aware of them are not accounted for in the policy-map policer drops, but may show up under the show interface drops and can be seen via the "show controllers np counters" command.
Policy-map queue drops are not reported in the subinterface drop counts.
The reason for that is that subinterfaces may share queues with each other or with the main interface, and therefore we don't have subinterface granularity for queue-related drops.
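If you suspect drops are happening on the NP before the policer sees the traffic, a way to check (the NP number and location are placeholders for your own setup) is:

RP/0/RSP0/CPU0:A9K-TOP#show controllers np counters np0 location 0/0/CPU0

and look for drop-related counters that are incrementing.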
Counters come from the show policy-map interface command:

Class precedence6                                                   <- class name as per configuration
  Classification statistics      (packets/bytes)     (rate - kbps)  <- statistics for this class
    Matched             : 31583572/2021348608        764652         <- packets that were matched
    Transmitted         : Un-determined                             <- packets that were sent to the wire
    Total Dropped       : Un-determined                             <- packets dropped for any reason in this class
  Policing statistics            (packets/bytes)     (rate - kbps)  <- policing stats
    Policed(conform)    : 31583572/2021348608        764652         <- packets that were below the CIR rate
    Policed(exceed)     : 0/0                        0              <- packets in the 2nd bucket, above CIR but below PIR
    Policed(violate)    : 0/0                        0              <- packets in the 3rd bucket, above PIR
    Policed and dropped : 0/0                                       <- total packets that the policer dropped
  Queueing statistics                                               <- statistics for queueing
    Queue ID                        : 136                           <- internal unique queue reference
    High watermark  (Unknown)                                       <- max packets queued/held at one time (value not supported by HW)
    Inst-queue-len  (packets)       : 4096                          <- number of 512-byte particles currently waiting in the queue
    Avg-queue-len   (Unknown)                                       <- average number of packets we have to buffer (value not supported by HW)
    Taildropped(packets/bytes)      : 31581615/2021223360           <- packets that could not be buffered because the queue exceeded its max length
    Queue(conform)      : 31581358/2021206912        764652         <- see the queue conform/exceed description above
    Queue(exceed)       : 0/0                        0              <- see the queue conform/exceed description above
    RED random drops(packets/bytes) : 0/0                           <- packets subject to Random Early Detection that were dropped
RP/0/RSP0/CPU0:A9K-TOP#show qos interface g0/0/0/0 output
With this command the actual hardware programming of the QoS policy on the interface can be verified (the output below is not related to the previous example above):
Tue Mar 8 16:46:21.167 UTC
Interface: GigabitEthernet0_0_0_0 output
Bandwidth configured: 1000000 kbps Bandwidth programed: 1000000
ANCP user configured: 0 kbps ANCP programed in HW: 0 kbps
Port Shaper programed in HW: 0 kbps
Policy: Egress102 Total number of classes: 2
----------------------------------------------------------------------
Level: 0 Policy: Egress102 Class: Qos-Group7
QueueID: 2 (Port Default)
Policer Profile: 31 (Single)
Conform: 100000 kbps (10 percent) Burst: 1248460 bytes (0 Default)
Child Policer Conform: TX
Child Policer Exceed: DROP
Child Policer Violate: DROP
----------------------------------------------------------------------
Level: 0 Policy: Egress102 Class: class-default
QueueID: 2 (Port Default)
----------------------------------------------------------------------
If you don't configure any QoS service policies, the ASR9000 will set an internal CoS value based on the IP precedence, the 802.1p priority field, or the MPLS EXP bits.
Depending on the routing or switching scenario, this internal CoS value is used for potential marking of newly imposed headers on egress.
If the node is L3 forwarding, then there is no L2 CoS propagation or preservation, as the L2 domain ends at the incoming interface and restarts at the outgoing interface.
The default marking PHB for L3 forwarding retains no L2 CoS information, even if the incoming interface happened to be an 802.1q or 802.1ad/QinQ subinterface.
CoS may appear to be propagated if the corresponding L3 field (precedence/DSCP) used for default marking matches the incoming CoS value and is therefore used as-is for the L2 headers imposed at egress.
If the node is L2 switching, then the incoming L2 header will be preserved unless the node has ingress or egress rewrites configured on the EFPs.
If an L2 rewrite results in new header imposition, then the default marking derived from the 3-bit PCP (as specified in 802.1p) of the incoming EFP is used to mark the new headers.
An exception to the above is that the DEI bit value from incoming 802.1ad/802.1ah headers is propagated to the imposed or topmost 802.1ad/802.1ah headers for both L3 and L2 forwarding.
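If you need deterministic marking of imposed headers rather than relying on this default behaviour, you can mark explicitly with MQC. A minimal sketch (class names and values are placeholders): classify into a qos-group on ingress and set the CoS from that qos-group on egress.

class-map match-any PREC5
 match precedence 5
 end-class-map
!
class-map match-any QG5
 match qos-group 5
 end-class-map
!
policy-map MARK-IN
 class PREC5
  set qos-group 5
 !
 class class-default
 !
 end-policy-map
!
policy-map MARK-OUT
 class QG5
  set cos 5
 !
 class class-default
 !
 end-policy-map
!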
ASR9000 Quality of Service configuration guide
Xander Thuijs, CCIE #6775
Thank you for that distinction, Xander.
Just to be 100% clear, you're saying that the TR Typhoon-based line-cards have 1GB of buffering available per NPU? And all 1GB of buffering is available for use by a single port, if desired?
I understand that on something like the 24-port 10GE line-cards, each set of 3 x 10GE ports is mapped to a single NPU. If only one of the three ports was in use, would we be able to use the entire 1GB for that single port?
Thanks!
if multiple ports on the npu have q'ing needs, the system will apply a cap on a per-interface basis to provide some fair sharing and prevent starvation by one heavy interface.
xander
Hi Xander,
Actually I’d like to clarify the nomenclature
Is it
4 VQIs in a VOQ?
Or
4 VOQs per VQI?
I keep finding it written both ways, though what is consistent is a max of 4k VOQs per FIA, which would indicate that it is 4 VQIs in a VOQ.
But as you said, the backpressure is actually granular down to the pri1/pri2/default/mcast (VQI?) level.
And does VQI stand for virtual queue identifier, please?
Thanks a bunch
adam
hi adam! yeah it is used interchangeably quite a bit. I discussed one and the other here also.
virtual output q'ing generally refers to having a scheduler per destination in 2-stage forwarding systems.
since a9k added prio levels to it, one can look at the prio level per destination as a voq, or perceive the overall shaper as the voq with instances per priority level.
I tend to prefer to reference the destination as the voq and the prio level as the vqi.
regards
xander
I see :)
Thank you very much Xander,
Whatever it’s called, most important is that the backpressure granularity is actually per priority level.
One thing I wanted to ask, slightly off topic, is regarding the central arbiter.
There are actually two arbiters, one on each fabric/RE card.
Are the access requests/grants from FIAs being relayed via both arbiters, so that both arbiters have complete state of what's been requested and granted, but only the arbiter on the fabric card with the active RE is managing tokens?
And there's a low-level keepalive between the fabrics.
Or how does it work please?
Thank you
adam
you can but it is somewhat unconventional :)
best way to do this is:
- create a named class with some matching criteria
- create a child policy with that named class
- set a priority on that named class with a policer rate value
- create a parent shaper
- apply a shape rate onto the class-default
- tie the child policy into the parent class-default.
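As a minimal sketch of that structure (the class name, match criteria and rates are placeholders only):

class-map match-any REALTIME
 match dscp ef
 end-class-map
!
policy-map CHILD-PRIO
 class REALTIME
  priority level 1
  police rate 10 mbps
  !
 !
 class class-default
 !
 end-policy-map
!
policy-map PARENT-SHAPE
 class class-default
  shape average 100 mbps
  service-policy CHILD-PRIO
 !
 end-policy-map
!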
xander
Hi,
there is some difference from the Cisco Live presentation (see below) on whether the CRC is really counted or not. Can you please clarify?
(slide 144) : Note that all L2 headers are included, added to the payload and that packet size is depleting the token bucket (applies to shaping also). Only IFG and CRC are not accounted for.
Thanks
Michael Macek
hi michael,
ouch, that is a typo (from my side); the slide should have read: only IFG and PA (preamble) are not accounted for.
CRC/FCS *is* part of the l2 overhead calculation for shapers and policers (and L2 headers including vlans etc).
xander
xander, i know you're probably busy and i hate to hijack a thread, but if you get a sec can you take a look at my post regarding qos via radius? i've been killing myself over it for a while but i'm stumped: https://supportforums.cisco.com/discussion/13067406/figuring-qos-out
Xander,
What is the smallest possible Tc/Bc on ASR9K when shaping on egress? I have the following scenario:
ASR9K (typhoon 24x10GE-TR) ---- 10GE --- switch with shallow buffers -- 1GE -- user
The switch with shallow buffers that's feeding 1GE to the user is heavily dropping packets (it has a 512KB buffer) as the speed steps down from 10G to 1G.
To solve this, I've applied egress shaper on the sub-interface on ASR9K: shape average 1gbps burst 256 kbytes.
256KB of Bc equals roughly ~2.04ms of Tc. After applying this shaper on the ASR9K, drops are no longer occurring on the downstream switch and the user is able to achieve full line-rate throughput on a single TCP flow.
Is using a shaper on ASR9K with this small of Tc/burst size (2ms) recommended; are there any downsides or concerns?
hi towardex,
the Tc for a9k is fixed at 4 msec by default. this is because there are 512k queues per npu (on the SE card), and to sustain updating them all correctly we are fixed at that Tc. so the traditional relation between CIR/Bc doesn't exist in hardware shaping (in general).
this means that the burst provided is replenished every 4 msec. this is fine for most shape rates, but may generate spike bursts that lower-end receivers may not have the buffers for at higher speeds.
you can put the card into a low-burst mode with a Tc of 500 usec (or even less, still fixed) to minimize that spike. this affects some accuracy for wred at high queue scale per npu.
at any rate, the solution you applied by adding a shaper and reducing the burst is the right remediation for the scenario you have at hand. what the optimum value is, that is "trial and error" and mostly defined by your receivers burst reception capability.
cheers!
xander
Nice, thank you!
How to calculate the BC?
Set the Bc to CIR bps * (1 byte) / (8 bits) * 1.5 seconds
If CIR = 50 mbps, the BC = 50,000,000 * 0.125 * 1.5 = 9,375,000 bytes, is this correct?
Thank you very much.
That is correct. 1.5 seconds is a very long time. The queue limits in a child policy default to 100ms of the visible BW of a class, which here won't exceed 625 kB (= 50Mbps * 100ms / 8). If you manually increase the queue-limits you may end up with unexpected behaviour. Please read the QoS section of BRKSPG-2904 from Cisco Live Berlin 2016.
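For reference, should you decide to tune it anyway, the queue-limit is set per class in the child (queuing) policy, for example (AF-CLASS is an assumed class-map; the values are placeholders only):

policy-map CHILD-QUEUING
 class AF-CLASS
  bandwidth 50 mbps
  queue-limit 100 ms
 !
 class class-default
 !
 end-policy-map
!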
/Aleksandar
I apply QoS on the main (physical) interface, like this:
interface GigabitEthernet0/7/1/16
mtu 1540
load-interval 30
l2transport
service-policy input Rate_Limit_600M
!
policy-map Rate_Limit_600M
class class-default
police rate 610240000 bps burst 1024000 bytes
conform-action transmit
exceed-action drop
But why, when I do a show interface, does the traffic rate not follow the QoS policy?
RP/0/RSP0/CPU0:b1#sho int gi0/7/1/16 | inc rat
30 second input rate 743660000 bits/sec, 68809 packets/sec
30 second output rate 0 bits/sec, 2 packets/sec
Thank you very much.