03-07-2011 01:43 PM - edited 12-18-2018 05:19 AM
Introduction
This document describes how QOS is implemented on the ASR9000 and how to interpret and troubleshoot QOS-related issues.
Core Issue
QOS is always a complex topic, and in this article I'll describe the QOS architecture and provide some tips for troubleshooting.
I'll keep enhancing this document based on the feedback I receive.
The ASR9000 employs an end-to-end QOS architecture throughout the whole system, meaning that priority is propagated through the system's forwarding ASICs. This is done via backpressure between the different forwarding ASICs.
One key aspect of the A9K's QOS implementation is the use of VOQs (virtual output queues). Each network processor, or in fact every 10G entity in the system, is represented in the Fabric Interfacing ASIC (FIA) by a VOQ on each linecard.
That means that in a fully loaded system with, say, 24 x 10G cards, each linecard having 8 NPUs and 4 FIAs, a total of 192 VOQs (24 ports times 8 slots) are represented at each FIA of each linecard.
The VOQ's have 4 different priority levels: Priority 1, Priority 2, Default priority and multicast.
The priority level is carried in the packet's fabric header (an internal header) and can be set via QOS policy-maps (MQC; modular QOS configuration).
When you define a policy-map that marks certain traffic as priority level 1 or 2 and apply it to a (sub)interface, the fabric headers reflect that marking, so this traffic is put in the higher priority queues of the forwarding ASICs as it traverses the FIA and fabric components.
If you don't apply any QOS configuration, all traffic is considered to be "default" in the fabric queues. To leverage the ASR9000's ASIC priority levels, you need to configure (ingress) QOS at the ports to apply the desired priority level.
In this example T0 and T1 are receiving a total of 16G of traffic destined for T0 on the egress linecard. For a 10G port that is obviously too much.
T0 will flow off some of the traffic, depending on the queue, eventually signaling it back to the ingress linecard. While T0 on the ingress linecard also has some traffic for T1 on the egress LC (green), this traffic is not affected and continues to be sent to the destination port.
Resolution
The ASR9000 supports 4 levels of QOS; a sample configuration and implementation detail are presented in this picture:
Policer shows exceed drops and does not reach the configured rate
Set the Bc to CIR bps * (1 byte) / (8 bits) * 1.5 seconds
and
Be=2xBc
Default burst values are not optimal
To visualize the problem: say you are allowing 1 pps. For one second you don’t send anything, and in the next second you want to send 2 packets. In that second you’ll see an exceed.
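The 1 pps example can be replayed with a small token-bucket model. This is only a sketch of the general single-rate policer idea, not the ASR9000 hardware algorithm; the function name `police` and the packet-based (rather than byte-based) bucket are assumptions made for illustration.

```python
def police(arrivals, rate_pps, burst_packets):
    """Toy single-rate token bucket, counted in packets (illustrative only).

    arrivals maps a second (int) to the number of packets offered in it.
    Returns {second: (conformed, exceeded)}.
    """
    tokens = burst_packets          # bucket starts full
    last = 0
    results = {}
    for second in sorted(arrivals):
        # Refill for the elapsed time, but never above the bucket depth.
        tokens = min(burst_packets, tokens + rate_pps * (second - last))
        last = second
        conformed = min(arrivals[second], tokens)
        tokens -= conformed
        results[second] = (conformed, arrivals[second] - conformed)
    return results


# One packet, then an idle second, then two packets in the same second.
offered = {0: 1, 1: 0, 2: 2}

print(police(offered, rate_pps=1, burst_packets=1)[2])  # (1, 1): one exceed
print(police(offered, rate_pps=1, burst_packets=2)[2])  # (2, 0): both conform
```

With a bucket depth of one packet the idle second cannot be "saved up", so the second packet exceeds even though the average rate is within 1 pps. A deeper bucket absorbs the burst.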
Alternatively, Bc and Be can be configured in time units, e.g.:
policy-map OUT
 class EF
  police rate percent 25 burst 250 ms peak-burst 500 ms
To view the Bc and Be applied in hardware, run "show qos interface <interface> [input|output]".
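The burst recommendation above is plain arithmetic, sketched here for a sample rate. The helper names are hypothetical. The conversion of a burst given in milliseconds into bytes (rate times time) is my assumption of how such a value maps to bytes; the hardware may round differently, so check the programmed values with the show command.

```python
def policer_bursts(cir_bps, seconds=1.5):
    """Bc = CIR (bps) / 8 * 1.5 seconds, Be = 2 x Bc, both in bytes."""
    bc_bytes = cir_bps / 8 * seconds
    return int(bc_bytes), int(2 * bc_bytes)


def burst_ms_to_bytes(rate_bps, burst_ms):
    """Assumed conversion: bytes that pass at rate_bps during burst_ms."""
    return int(rate_bps / 8 * burst_ms / 1000)


# 100 Mbps CIR with the 1.5 second rule.
print(policer_bursts(100_000_000))            # (18750000, 37500000)

# "police rate percent 25 burst 250 ms peak-burst 500 ms" on a 1 Gbps port,
# assuming the percentage is taken of the 1 Gbps port rate.
rate = 1_000_000_000 * 25 // 100
print(burst_ms_to_bytes(rate, 250))           # 7812500
print(burst_ms_to_bytes(rate, 500))           # 15625000
```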
Why do I see non-zero values for Queue(conform) and Queue(exceed) in show policy-map commands?
On the ASR9k, every HW queue has a configured CIR and PIR value. These correspond to the "guaranteed" bandwidth for the queue, and the "maximum" bandwidth (aka shape rate) for the queue.
In some cases the user-defined QoS policy does NOT explicitly use both of these. However, depending on the exact QoS config, the queueing hardware may require some nonzero value for these fields. In that case the system chooses a default value for the queue CIR. The "conform" counter in show policy-map is the number of packets/bytes that were transmitted within this CIR value, and the "exceed" counter is the number of packets/bytes that were transmitted above the CIR but within the PIR value.
Note that "exceed" in this case does NOT equate to a packet drop, but rather a packet that is above the CIR rate on that queue.
You could change this behavior by explicitly configuring a bandwidth and/or a shape rate on each queue, but in general it's just easier to recognize that these counters don't apply to your specific situation and ignore them.
What is counted in QOS policers and shapers?
When we define a shaper in a qos pmap, the shaper takes the L2 header into consideration.
A shape rate of, say, 1 Mbps means that with no dot1q or QinQ tags I can technically send more IP traffic than with QinQ, which has more L2 overhead. The same applies when I define a bandwidth statement in a class: L2 is taken into consideration as well.
When defining a policer, it looks at L2 also.
In Ingress, for both policer & shaper, we use the incoming packet size (including the L2 header).
To account for the L2 header in the ingress shaper case, we have to use a TM overhead accounting feature that only lets us add overhead in 4-byte granularity, which can cause a slight inaccuracy.
In egress, for both policer & shaper we use the outgoing packet size (including the L2 header).
The ASR9K policer implementation supports 64 Kbps granularity. When the specified rate is not a multiple of 64 Kbps, it is rounded down to the next lower multiple of 64 Kbps.
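The rounding rule can be written out as a one-liner. A minimal sketch, with a hypothetical helper name; it only models the "round down to a multiple of 64 Kbps" statement above and says nothing about rates below 64 Kbps.

```python
def programmed_policer_rate_kbps(configured_kbps, granularity_kbps=64):
    """Round a configured policer rate down to a multiple of the granularity."""
    return (configured_kbps // granularity_kbps) * granularity_kbps


print(programmed_policer_rate_kbps(1000))  # 960  (15 x 64)
print(programmed_policer_rate_kbps(128))   # 128  (already a multiple)
print(programmed_policer_rate_kbps(100))   # 64
```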
For the policing, shaping and bandwidth commands, in both the ingress and egress direction, the following fields are included in the accounting:
MAC DA | MAC SA | EtherType | VLANs.. | L3 headers/payload | CRC
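To see what the field list above means for a shaper, the sketch below adds up the accounted bytes per frame and derives how much L3 traffic fits under a given shape rate. The field sizes are the standard Ethernet ones (6 + 6 + 2 bytes of header, 4 bytes per VLAN tag, 4 bytes of CRC). The helper names are hypothetical, and minimum-frame padding is ignored for simplicity.

```python
ETH_HEADER = 6 + 6 + 2   # MAC DA + MAC SA + EtherType
VLAN_TAG = 4             # per 802.1Q / 802.1ad tag
CRC = 4


def accounted_bytes(l3_bytes, vlan_tags=0):
    """Bytes counted against the policer/shaper for one frame."""
    return ETH_HEADER + VLAN_TAG * vlan_tags + l3_bytes + CRC


def l3_goodput_bps(shape_bps, l3_bytes, vlan_tags=0):
    """L3 bits per second that fit under shape_bps for a fixed packet size."""
    return shape_bps * l3_bytes / accounted_bytes(l3_bytes, vlan_tags)


for tags, label in ((0, "untagged"), (1, "dot1q"), (2, "QinQ")):
    size = accounted_bytes(500, tags)
    rate = l3_goodput_bps(1_000_000, 500, tags)
    print(f"{label:9s} accounted={size} bytes  L3 goodput={rate:,.0f} bps")
```

For a 500-byte IP packet under a 1 Mbps shaper, the untagged case accounts 518 bytes per frame and the QinQ case 526, which is why the same shape rate carries slightly less IP traffic when more tags are present.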
Port level shaping
A shaping action requires a queue on which the shaping is applied. This queue must be created by a child-level policy. Typically the shaper is applied at the parent or grandparent level, to allow for differentiation between traffic classes within the shaper. If there is a need to apply a flat port-level shaper, a child policy should be configured with 100% bandwidth explicitly allocated to class-default.
Understanding show policy-map counters
QOS counters and show interface drops:
Policer counts are directly against the (sub)interface and will get reported on the "show interface" drops count.
The drop counts you see are an aggregate of what the NP has dropped (in most cases) as well as policer drops.
Packets that get dropped before the policer is aware of them are not accounted for by the policy-map policer drops but may show under the show interface drops and can be seen via the show controllers np count command.
Policy-map queue drops are not reported on the subinterface drop counts.
The reason for that is that subinterfaces may share queues with each other or the main interface and therefore we don’t have subinterface granularity for queue related drops.
Counters come from the show policy-map interface command
Class name as per configuration | Class precedence6
Statistics for this class | Classification statistics (packets/bytes) (rate - kbps)
Packets that were matched | Matched : 31583572/2021348608 764652
Packets that were sent to the wire | Transmitted : Un-determined
Packets that were dropped for any reason in this class | Total Dropped : Un-determined
Policing stats | Policing statistics (packets/bytes) (rate - kbps)
Packets that were below the CIR rate | Policed(conform) : 31583572/2021348608 764652
Packets that fell into the 2nd bucket, above CIR but below PIR | Policed(exceed) : 0/0 0
Packets that fell into the 3rd bucket, above PIR | Policed(violate) : 0/0 0
Total packets that the policer dropped | Policed and dropped : 0/0
Statistics for queueing | Queueing statistics <<<----
Internal unique queue reference | Queue ID : 136
How many packets were queued/held at most at one time (value not supported by HW) | High watermark (Unknown)
Number of 512-byte particles which are currently waiting in the queue | Inst-queue-len (packets) : 4096
How many packets on average we have to buffer (value not supported by HW) | Avg-queue-len (Unknown)
Packets that could not be buffered because we held more than the max length | Taildropped(packets/bytes) : 31581615/2021223360
See description above (queue exceed section) | Queue(conform) : 31581358/2021206912 764652
See description above (queue exceed section) | Queue(exceed) : 0/0 0
Packets that were subject to Random Early Detection and were dropped | RED random drops(packets/bytes) : 0/0
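When comparing many of these counters it can help to pull the packet, byte and rate fields apart. The following is a rough sketch with a hypothetical `parse_counter` helper. It assumes the "name : packets/bytes [rate]" layout seen in the sample output above and returns None for lines such as "Un-determined" that do not fit it.

```python
import re

# name : packets/bytes [rate-kbps]
COUNTER_RE = re.compile(
    r"^\s*(?P<name>[^:]+?)\s*:\s*(?P<pkts>\d+)/(?P<bytes>\d+)"
    r"(?:\s+(?P<kbps>\d+))?\s*$"
)


def parse_counter(line):
    """Split one counter line into name, packets, bytes and optional kbps."""
    m = COUNTER_RE.match(line)
    if m is None:
        return None
    kbps = m.group("kbps")
    return {
        "name": m.group("name"),
        "packets": int(m.group("pkts")),
        "bytes": int(m.group("bytes")),
        "kbps": int(kbps) if kbps is not None else None,
    }


matched = parse_counter("Matched : 31583572/2021348608 764652")
print(matched["bytes"] / matched["packets"])   # 64.0 bytes per packet

dropped = parse_counter("Taildropped(packets/bytes) : 31581615/2021223360")
print(dropped["name"], dropped["kbps"])        # Taildropped(packets/bytes) None
```

Dividing bytes by packets shows that the sample above was taken with 64-byte frames.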
Understanding the hardware qos output
With this command the actual hardware programming of the QOS policy on the interface can be verified
(not related to the output from the previous example above):
RP/0/RSP0/CPU0:A9K-TOP#show qos interface g0/0/0/0 output
Tue Mar 8 16:46:21.167 UTC
Interface: GigabitEthernet0_0_0_0 output
Bandwidth configured: 1000000 kbps Bandwidth programed: 1000000
ANCP user configured: 0 kbps ANCP programed in HW: 0 kbps
Port Shaper programed in HW: 0 kbps
Policy: Egress102 Total number of classes: 2
----------------------------------------------------------------------
Level: 0 Policy: Egress102 Class: Qos-Group7
QueueID: 2 (Port Default)
Policer Profile: 31 (Single)
Conform: 100000 kbps (10 percent) Burst: 1248460 bytes (0 Default)
Child Policer Conform: TX
Child Policer Exceed: DROP
Child Policer Violate: DROP
----------------------------------------------------------------------
Level: 0 Policy: Egress102 Class: class-default
QueueID: 2 (Port Default)
----------------------------------------------------------------------
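The programmed values in this output can be cross-checked with a little arithmetic. This is a sketch with hypothetical helper names; the numbers are the ones printed above (1000000 kbps interface bandwidth, 100000 kbps conform rate, 1248460 bytes burst).

```python
def burst_duration_ms(burst_bytes, rate_kbps):
    """Time needed to drain burst_bytes at rate_kbps, in milliseconds."""
    return burst_bytes * 8 / rate_kbps   # bits / (kbit/s) = ms


def percent_of(rate_kbps, bandwidth_kbps):
    """Rate expressed as a percentage of the interface bandwidth."""
    return rate_kbps * 100 / bandwidth_kbps


print(percent_of(100000, 1000000))                    # 10.0
print(round(burst_duration_ms(1248460, 100000), 2))   # 99.88
```

So the "0 Default" burst in this sample works out to roughly 100 ms worth of traffic at the conform rate.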
Default Marking behavior of the ASR9000
If you don't configure any service policies for QOS, the ASR9000 will set an internal cos value based on the IP Precedence, the 802.1p priority field or the MPLS EXP bits.
Depending on the routing or switching scenario, this internal cos value will be used to do potential marking on newly imposed headers on egress.
Scenario 1
Scenario 2
Scenario 3
Scenario 4
Scenario 5
Scenario 6
Special consideration:
If the node is L3 forwarding, then there is no L2 CoS propagation or preservation as the L2 domain stops at the incoming interface and restarts at the outgoing interface.
Default marking PHB on L3 retains no L2 CoS information even if the incoming interface happened to be an 802.1q or 802.1ad/q-in-q sub interface.
CoS may appear to be propagated, if the corresponding L3 field (prec/dscp) used for default marking matches the incoming CoS value and so, is used as is for imposed L2 headers at egress.
If the node is L2 switching, then the incoming L2 header will be preserved unless the node has ingress or egress rewrites configured on the EFPs.
If an L2 rewrite results in new header imposition, then the default marking derived from the 3-bit PCP (as specified in 802.1p) on the incoming EFP is used to mark the new headers.
An exception to the above is that the DEI bit value from incoming 802.1ad / 802.1ah headers is propagated to imposed or topmost 802.1ad / 802.1ah headers for both L3 and L2 forwarding.
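Since the default marking works from the 3-bit precedence, it helps to remember that IP precedence is simply the top three bits of the DSCP field. The sketch below shows that relationship only. Treating the result as the value that ends up in an imposed CoS or EXP field is my reading of the text above, not a statement about the platform internals.

```python
def ip_precedence(dscp):
    """IP precedence is the three most significant bits of the 6-bit DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp >> 3


print(ip_precedence(46))  # EF   -> 5
print(ip_precedence(32))  # CS4  -> 4
print(ip_precedence(34))  # AF41 -> 4
print(ip_precedence(0))   # default -> 0
```

This is also why CoS can appear to be propagated through an L3 hop: a packet that arrives with CoS 5 and DSCP EF leaves with an imposed CoS of 5 derived from the L3 field, not from the incoming L2 header.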
Related Information
ASR9000 Quality of Service configuration guide
Xander Thuijs, CCIE #6775
Hi Xander!
I am currently implementing QoS on our ASR9k's (9010/RSP440, XR5.3.3, A9K-24X10GE-TR = 8 queues per port).
When I try to apply our QoS config I get this error message:
!!% Given combination of p1, p2, p3, ..., pn queues are not supported at leaf-level of a queuing hierarchy: InPlace Modify Error: Policy BACKBONE_OUT: 'qos-ea' detected the 'warning' condition 'Given combination of p1, p2, p3, ..., pn queues are not supported at leaf-level of a queuing hierarchy'
Here is the config I wanted to commit (it works on the ASR920, our access devices, lightly modified in PQ and shape average):
policy-map BACKBONE_OUT_CHILD
 class QOS_GROUP_7
  bandwidth percent 1
 class QOS_GROUP_6
  bandwidth percent 1
 class QOS_GROUP_5
  priority level 1
  police rate percent 5
 class QOS_GROUP_4
  bandwidth percent 1
 class QOS_GROUP_3
  bandwidth percent 30
 class QOS_GROUP_2
  bandwidth percent 40
 class QOS_GROUP_1
  bandwidth percent 20
 class class-default
  bandwidth percent 1
policy-map BACKBONE_OUT
 class class-default
  shape average 9500 Mbps
  service-policy BACKBONE_OUT_CHILD
interface Bundle-Ether2301 (same on physical interface Te0/0/0/0)
 service-policy output BACKBONE_OUT
I also tried to apply BACKBONE_OUT on the interface with a minimal BACKBONE_OUT_CHILD containing a single class. Afterwards I added class by class to see how many are accepted (in-place modify). We can add up to 7 queues (with a valid queue-id) before the error comes up.
Xander, can you tell me why we are getting this error message? What are we doing wrong? Or is it just as designed; is there one queue we have to think of in the parent shaper?
Thanks in advance!
Thomas
Dear Alexander,
Very good information
1. I have some questions.
For VOQs with control plane traffic such as BFD or OSPF,
how is that traffic managed?
2. It did help a lot in clarifying my confusions.
Ola, traffic that the RP or LC is injecting is directly enq'd to the port and no acl or qos is needed/available for such packets (this is new behavior in 4.x, in 39x you were able to apply qos for locally originated packets)
Thanks,
Manaschai.
Is there any possibility to match a specific TCP source port, encapsulated in PPPoE in QinQ and passing through (L2) an ASR9k?
you are able to compute FAT labels based on IP when it pertains to PPPoE/PPP headers!
this is to be done on the PE’s that carry the ppp traffic.
intermediate nodes, with their standard inner label hashing, will pick up this ip based derived label for distribution!!
cheers!
xander
@xthuijs That's great, but in my use-case I'm neither using MPLS nor am I load-balancing. What I need is to match the layer 4 port encapsulated in pppoe in vlan in vlan. If this can't be done on the 9k, then I need to do it on the BNG (which is a box adjacent to the 9k but is actually a 1k) and set outgoing COS values based on that, so I can queue this on the 9k.
sorry I misunderstood your ask! :)
ah ok if it is for only qos marking purposes then yeah you can’t match on L3/L4 info, but you do have access to the COS values.
the COS would be set ideally by BNG and that you can use or remark as needed on the L2 adj node.
xander
Good afternoon
Sorry for my English.
I am facing the same problem with a link aggregate: on a 3x 10G bundle the policy-map is ignored. Can anyone help?
Marcelo Macedo
Hi @xthuijs, I wanted to ask you about a particular case that we are working on.
We have a customer with an L2 service that goes through an ASR 9k. The customer is sending packets with a default COS and MPLS EXP=7. We see that traffic sent to the ASR, but once it goes out the packet has EXP=0.
As I understand it, this would be Scenario 3. My question is whether there is some way to avoid this behaviour, other than matching the EXP label to a COS. Since the service is L2 point-to-point, we should deliver to the customer in site A the same packet that we received from site B. Is there a way to honor the EXP label sent by the customer? If we tag with an extra vlan header, would the EXP be rewritten?
Thanks for your time and your post!
Andres.-
exp to ipp etc exactly as you depict in the scenarios.
these are default scenarios, one can change it if needed.
lemme know what you need or are in need of and we’ll get it sorted for ya!
xthuijs at cisco is email.
xander
Hi @xthuijs
I am trying to build a QoS policy that should work across a large number of sub-interfaces on the same main (bundle) interface. On each sub-interface a large number of different customers with different services are running. What I need is to police all EF traffic across all sub-interfaces to 10% and all CS4 traffic to 50%. All other traffic should be able to use up to 100% of the bandwidth. It is on an ASR9K running 7.1.3. I have come up with the following configuration, where I only place the policy-maps on the main interface.
- Is that the right way to do it?
- I am using parent/child with a policer in the parent for the physical rate of the main interface and with child-conform-aware. But can the child-conform-aware feature be used egress and should it be used egress?
class-map match-any WHOLESALE-EF
match cos 5
match dscp ef
end-class-map
!
class-map match-any WHOLESALE-AF
match cos 4
match dscp cs4
end-class-map
!
!
policy-map WHOLESALE-CHILD-OUT
class WHOLESALE-EF
priority level 1
police rate percent 10
set dscp ef
set cos 5
!
class WHOLESALE-AF
police rate percent 50
set dscp cs4
set cos 4
!
class class-default
set dscp default
set cos 0
!
end-policy-map
!
!
policy-map WHOLESALE-CHILD-IN
class WHOLESALE-EF
priority level 1
police rate percent 10
set dscp ef
set mpls experimental imposition 5
!
class WHOLESALE-AF
police rate percent 50
set mpls experimental imposition 4
set dscp cs4
!
class class-default
set mpls experimental imposition 0
set dscp default
!
end-policy-map
!
!
policy-map WHOLESALE-100G-PARANT-IN
class class-default
service-policy WHOLESALE-CHILD-IN
police rate 100 g
child-conform-aware
!
!
policy-map WHOLESALE-100G-PARANT-OUT
class class-default
service-policy WHOLESALE-CHILD-OUT
police rate 100 g
child-conform-aware
bandwidth remaining percent 100
!
interface bundle-ether10
service-policy input WHOLESALE-100G-PARANT-IN
service-policy output WHOLESALE-100G-PARANT-OUT
you only have 1 classification pass. while you can do a port/vlan shaper and your subif with the 2 level pmap, in this model you would not be able to limit the aggregate rate of a particular class across several vlans.
in order to achieve that you’d need a class level policy on your main interface that all vlans inherit.
xander
Hi @xthuijs
So just to be sure, you mean just to create a "flat" single level policy and place that on the main interface? Like this:
class-map match-any WHOLESALE-EF
match cos 5
match dscp ef
end-class-map
!
class-map match-any WHOLESALE-AF
match cos 4
match dscp cs4
end-class-map
!
!
policy-map WHOLESALE-OUT
class WHOLESALE-EF
priority level 1
police rate percent 10
set dscp ef
set cos 5
!
class WHOLESALE-AF
police rate percent 50
set dscp cs4
set cos 4
!
class class-default
set dscp default
set cos 0
!
end-policy-map
!
!
policy-map WHOLESALE-IN
class WHOLESALE-EF
priority level 1
police rate percent 10
set dscp ef
set mpls experimental imposition 5
!
class WHOLESALE-AF
police rate percent 50
set mpls experimental imposition 4
set dscp cs4
!
class class-default
set mpls experimental imposition 0
set dscp default
!
end-policy-map
!
!
interface bundle-ether10
service-policy input WHOLESALE-IN
service-policy output WHOLESALE-OUT
Any chance there's an equivalent of this page for the NCS5500 platform? Would love to see the default marking behavior for all those scenarios for those devices as well.