xthuijs
Cisco Employee

Introduction

This document provides details on how QOS is implemented in the ASR9000 and how to interpret and troubleshoot QOS-related issues.

 

Core Issue

QOS is always a complex topic. In this article I'll try to describe the QOS architecture and provide some tips for troubleshooting.

Based on feedback on this document, I'll keep enhancing it to cover more topics.

 

The ASR9000 employs an end-to-end QOS architecture throughout the whole system: priority is propagated through the system's forwarding ASICs. This is done via backpressure between the different forwarding ASICs.

One key aspect of the A9K's QOS implementation is the use of VOQs (virtual output queues). Each network processor, or in fact every 10G entity in the system, is represented by a VOQ in the Fabric Interfacing ASIC (FIA) on each linecard.

That means that in a fully loaded system with, say, 24 x 10G cards, each linecard having 8 NPUs and 4 FIAs, a total of 192 VOQs (24 10G entities per card times 8 linecard slots) are represented at each FIA of each linecard.
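As a quick check of the arithmetic above, here is a trivial Python sketch. The function name is hypothetical, and it only restates the one-VOQ-per-10G-entity rule; it says nothing about how the FIA actually allocates its queues.

```python
def voqs_per_fia(entities_per_card: int, linecard_slots: int) -> int:
    """VOQs seen by each FIA: one VOQ per 10G entity in the system."""
    return entities_per_card * linecard_slots


# 24 x 10G cards in 8 linecard slots, as in the example above
print(voqs_per_fia(24, 8))  # 192
```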

The VOQs have 4 different priority levels: Priority 1, Priority 2, default priority and multicast.

The priority level is assigned in the packet's fabric header (internal header) and can be set via QOS policy-maps (MQC; modular QOS configuration).

When you define a policy-map and apply it to a (sub)interface, and that policy-map marks certain traffic as priority level 1 or 2, the fabric headers reflect this as well, so that this traffic is put in the higher-priority queues of the forwarding ASICs as it traverses the FIA and fabric components.

If you don't apply any QOS configuration, all traffic is considered "default" in the fabric queues. To leverage the ASR9000's ASIC priority levels, you need to configure (ingress) QOS on the ports to apply the desired priority level.

[Figure: qos-archi.jpg]

In this example T0 and T1 are receiving a total of 16G of traffic destined for T0 on the egress linecard. For a 10G port that is obviously too much.

T0 on the egress linecard will flow off some of the traffic, depending on the queue, eventually signaling this back to the ingress linecard. While T0 on the ingress linecard also has some traffic for T1 on the egress LC (green), that traffic is not affected and continues to be sent to its destination port.
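The point of per-destination VOQs is to avoid head-of-line blocking: backpressure from one congested egress port only stops the queue for that port. The following Python sketch is a toy model of that idea, not of the FIA's real arbitration; the destination names and the round-robin service order are assumptions for illustration.

```python
from collections import deque


def serve(voqs: dict[str, deque], flowed_off: set[str], budget: int) -> list:
    """Send up to `budget` packets round-robin, skipping flowed-off destinations."""
    sent = []
    active = [dest for dest in voqs if dest not in flowed_off]
    while budget > 0 and any(voqs[dest] for dest in active):
        for dest in active:
            if budget > 0 and voqs[dest]:
                sent.append(voqs[dest].popleft())
                budget -= 1
    return sent


# Hypothetical ingress queues: traffic for egress T0 (congested) and for egress T1
voqs = {"egress-T0": deque(["t0-pkt1", "t0-pkt2"]), "egress-T1": deque(["t1-pkt1"])}
print(serve(voqs, flowed_off={"egress-T0"}, budget=10))  # ['t1-pkt1']
print(list(voqs["egress-T0"]))  # T0 packets are still waiting in their own VOQ
```

With a single shared FIFO instead, a T0 packet at the head of the queue would block the T1 packet behind it.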

Resolution

 

The ASR9000 supports 4 levels of hierarchical QOS. A sample configuration and implementation details are presented in this picture:

[Figure: shared-policy.jpg]

 

 

Policer showing exceed drops, not reaching the configured rate

 

When defining policers at higher rates, make sure the committed burst (Bc) and excess burst (Be) are set correctly.
This is the formula to follow:

Bc (bytes) = CIR (bps) * (1 byte / 8 bits) * 1.5 seconds

and

Be = 2 * Bc
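The formula translates directly into code. Below is a minimal Python sketch using integer math (1/8 x 1.5 = 3/16) so the result is a whole number of bytes. The function name is ours, and the hardware may still round or clamp the values it programs, which "show qos interface" would reveal.

```python
def recommended_bursts(cir_bps: int) -> tuple[int, int]:
    """Return (Bc, Be) in bytes: Bc = CIR / 8 * 1.5 s, Be = 2 * Bc."""
    bc_bytes = cir_bps * 3 // 16  # CIR (bps) / 8 bits per byte * 1.5 s
    be_bytes = 2 * bc_bytes
    return bc_bytes, be_bytes


print(recommended_bursts(1_000_000_000))  # (187500000, 375000000) for a 1 Gbps CIR
```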

Default burst values are not optimal.

To visualize the problem: say you are allowing 1 pps. If you send nothing for one second and then want to send 2 packets in the next second, you will see an exceed in that second.
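The 1 pps example can be reproduced with a simplified token bucket that refills once per second and holds at most `bucket_size` tokens. This sketches the general token-bucket idea, not the ASR9K policer's internals; refilling in whole-second steps and counting packets instead of bytes are simplifications.

```python
def count_exceeds(arrivals_per_second: list[int], rate_pps: int, bucket_size: int) -> int:
    """Count packets that find the bucket empty (i.e. would be marked exceed)."""
    tokens = bucket_size  # the bucket starts full
    exceeds = 0
    for arrivals in arrivals_per_second:
        tokens = min(bucket_size, tokens + rate_pps)  # refill, capped at the bucket depth
        for _ in range(arrivals):
            if tokens >= 1:
                tokens -= 1
            else:
                exceeds += 1
    return exceeds


traffic = [1, 0, 2]  # 1 packet, an idle second, then 2 packets
print(count_exceeds(traffic, rate_pps=1, bucket_size=1))  # 1: the idle second's credit was lost
print(count_exceeds(traffic, rate_pps=1, bucket_size=2))  # 0: a deeper bucket absorbs the burst
```

Even though the average rate (3 packets in 3 seconds) never exceeds 1 pps, the shallow bucket flags one packet; that is why the burst size matters.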

 

Alternatively, Bc and Be can be configured in time units, e.g.:

     policy-map OUT
      class EF
       police rate percent 25 burst 250 ms peak-burst 500 ms
      !
     end-policy-map
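A burst given in time units corresponds to the number of bytes the policer rate can send in that time. The sketch below converts the example; the 1 Gbps interface speed (so that 25 percent is 250 Mbps) is an assumption, since the example does not say which interface the policy is attached to.

```python
def burst_bytes_from_time(rate_bps: int, burst_ms: int) -> int:
    """Bytes sent at rate_bps during burst_ms milliseconds: rate / 8 * ms / 1000."""
    return rate_bps * burst_ms // 8000


rate_bps = 1_000_000_000 * 25 // 100  # assumed 1 Gbps port, "rate percent 25"
print(burst_bytes_from_time(rate_bps, 250))  # burst 250 ms -> 7812500 bytes
print(burst_bytes_from_time(rate_bps, 500))  # peak-burst 500 ms -> 15625000 bytes
```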

 

To view the Bc and Be applied in hardware, run "show qos interface <interface> [input|output]".

 

 

Why do I see non-zero values for Queue(conform) and Queue(exceed) in show policy-map commands?

On the ASR9k, every HW queue has a configured CIR and PIR value. These correspond to the "guaranteed" bandwidth for the queue, and the "maximum" bandwidth (aka shape rate) for the queue.

In some cases the user-defined QoS policy does NOT explicitly use both of these. However, depending on the exact QoS config, the queueing hardware may require a nonzero value for these fields. In that case, the system chooses a default value for the queue CIR. The "conform" counter in show policy-map is the number of packets/bytes that were transmitted within this CIR value, and the "exceed" counter is the number of packets/bytes that were transmitted above the CIR but within the PIR value.

Note that "exceed" in this case does NOT equate to a packet drop, but rather a packet that is above the CIR rate on that queue.

You could change this behavior by explicitly configuring a bandwidth and/or a shape rate on each queue, but in general it's just easier to recognize that these counters don't apply to your specific situation and ignore them.
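In rate terms, the split between the two counters can be sketched as below. The CIR and PIR values are hypothetical stand-ins for whatever default the system picks; the point is that "exceed" traffic is still transmitted.

```python
def queue_counter_split(tx_kbps: int, cir_kbps: int, pir_kbps: int) -> tuple[int, int]:
    """Split a transmitted rate into (conform, exceed); neither part is a drop."""
    conform = min(tx_kbps, cir_kbps)                    # sent within the queue CIR
    exceed = max(0, min(tx_kbps, pir_kbps) - cir_kbps)  # sent above the CIR, up to the PIR
    return conform, exceed


# Hypothetical queue: default CIR of 100 Mbps, PIR (shape rate) of 1 Gbps
print(queue_counter_split(400_000, cir_kbps=100_000, pir_kbps=1_000_000))  # (100000, 300000)
```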

 

What is counted in QOS policers and shapers?

 

When we define a shaper in a QOS policy-map, the shaper takes the L2 header into consideration.

With a shape rate of, say, 1 Mbps, I can technically send more IP traffic on an interface without dot1q or QinQ tags than on a QinQ interface, which has more L2 overhead. The same applies when I define a bandwidth statement in a class: L2 is taken into consideration.

A policer also takes L2 into account.

In ingress, for both policer and shaper, we use the incoming packet size (including the L2 header).

To account for the L2 header in the ingress shaper case, we have to use a TM (traffic manager) overhead accounting feature, which only lets us add overhead in 4-byte granularity; this can cause a small inaccuracy.

In egress, for both policer and shaper, we use the outgoing packet size (including the L2 header).

 

The ASR9K policer implementation supports 64 kbps granularity. When the specified rate is not a multiple of 64 kbps, it is rounded down to the next lower multiple of 64 kbps.
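Taken literally, the rounding rule looks like this in Python. This is a sketch: under this reading, rates below 64 kbps would round to 0, and how the platform actually handles that case is not covered here.

```python
POLICER_GRANULARITY_KBPS = 64


def programmed_police_rate_kbps(configured_kbps: int) -> int:
    """Round a configured policer rate down to a multiple of 64 kbps."""
    return configured_kbps // POLICER_GRANULARITY_KBPS * POLICER_GRANULARITY_KBPS


print(programmed_police_rate_kbps(1000))  # 960
print(programmed_police_rate_kbps(1024))  # 1024, already a multiple of 64
```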

 

For policing, shaping and the bandwidth command, in both the ingress and egress direction, the following fields are included in the accounting:

MAC DA
MAC SA
EtherType
VLANs
L3 headers/payload
CRC
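Using the field list above (6 + 6 bytes of MAC addresses, 2 bytes of EtherType, 4 bytes per VLAN tag and a 4-byte CRC; no preamble or inter-frame gap), this Python sketch shows why a 1 Mbps shaper carries slightly less IP traffic on a QinQ subinterface than on an untagged one. The 1500-byte packet size is an arbitrary example.

```python
def l2_overhead_bytes(vlan_tags: int) -> int:
    """MAC DA + MAC SA + EtherType + 4 bytes per VLAN tag + CRC."""
    return 6 + 6 + 2 + 4 * vlan_tags + 4


def ip_rate_bps(shape_bps: float, ip_len: int, vlan_tags: int) -> float:
    """IP-layer throughput when the shaper counts the full L2 frame."""
    return shape_bps * ip_len / (ip_len + l2_overhead_bytes(vlan_tags))


for tags, name in [(0, "untagged"), (1, "dot1q"), (2, "QinQ")]:
    print(f"{name:8}: {ip_rate_bps(1_000_000, 1500, tags):,.0f} bps of IP traffic")
```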

 

Port-level shaping

A shaping action requires a queue on which the shaping is applied, and this queue must be created by a child-level policy. Typically the shaper is applied at the parent or grandparent level, to allow differentiation between traffic classes within the shaper. If you need a flat port-level shaper, configure a child policy that explicitly allocates 100% bandwidth to class-default.

Understanding show policy-map counters

 

QOS counters and show interface drops:

 

Policer counters are accounted directly against the (sub)interface, and policer drops are reported in the "show interface" drops count.
The drop counts you see are an aggregate of what the NP has dropped (in most cases) as well as policer drops.

Packets that are dropped before the policer is aware of them are not accounted for in the policy-map policer drops, but may show up in the "show interface" drops and can be seen via the "show controllers np counters" command.

Policy-map queue drops are not reported in the subinterface drop counts.
The reason is that subinterfaces may share queues with each other or with the main interface, so we don't have subinterface granularity for queue-related drops.

 

 

Counters come from the "show policy-map interface" command. Sample output:

    Class precedence6
      Classification statistics          (packets/bytes)     (rate - kbps)
        Matched             :            31583572/2021348608           764652
        Transmitted         : Un-determined
        Total Dropped       : Un-determined
      Policing statistics                (packets/bytes)     (rate - kbps)
        Policed(conform)    :            31583572/2021348608           764652
        Policed(exceed)     :                   0/0                    0
        Policed(violate)    :                   0/0                    0
        Policed and dropped :                   0/0
      Queueing statistics
        Queue ID                             : 136
        High watermark  (Unknown)
        Inst-queue-len  (packets)            : 4096
        Avg-queue-len   (Unknown)
        Taildropped(packets/bytes)           : 31581615/2021223360
        Queue(conform)      :            31581358/2021206912           764652
        Queue(exceed)       :                   0/0                    0
        RED random drops(packets/bytes)      : 0/0

Meaning of each field:

Class: class name as per the configuration
Classification statistics: statistics for this class
Matched: packets that were matched
Transmitted: packets that were sent to the wire
Total Dropped: packets that were dropped for any reason in this class
Policing statistics: policing stats
Policed(conform): packets that were below the CIR rate
Policed(exceed): packets that fell into the 2nd bucket, above CIR but below PIR
Policed(violate): packets that fell into the 3rd bucket, above PIR
Policed and dropped: total packets that the policer dropped
Queueing statistics: statistics for queueing
Queue ID: internal unique queue reference
High watermark: how many packets were queued/held at most at one time (value not supported by HW)
Inst-queue-len: number of 512-byte particles which are currently waiting in the queue
Avg-queue-len: how many packets on average we have to buffer (value not supported by HW)
Taildropped: packets that could not be buffered because the queue held more than the max length
Queue(conform) and Queue(exceed): see the description above (queue exceed section)
RED random drops: packets subject to Random Early Detection that were dropped
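The packets/bytes pairs in this output can be used to sanity-check what kind of traffic was tested. The Python sketch below parses a few lines copied from the output above and computes the average packet size per counter. The regular expression is our own, written for this sample only, and is not a supported parser for IOS XR output. Every counter works out to 64 bytes, which suggests this test used 64-byte packets.

```python
import re

SAMPLE = """\
    Matched             :            31583572/2021348608           764652
    Policed(conform)    :            31583572/2021348608           764652
    Taildropped(packets/bytes)           : 31581615/2021223360
    Queue(conform)      :            31581358/2021206912           764652
"""

# "<name> : <packets>/<bytes>", ignoring any trailing rate column
COUNTER = re.compile(r"^\s*(?P<name>\S.*?)\s*:\s*(?P<pkts>\d+)/(?P<bytes>\d+)", re.MULTILINE)


def avg_packet_sizes(text: str) -> dict[str, float]:
    sizes = {}
    for match in COUNTER.finditer(text):
        pkts = int(match["pkts"])
        if pkts:  # skip 0/0 counters to avoid dividing by zero
            sizes[match["name"]] = int(match["bytes"]) / pkts
    return sizes


for name, size in avg_packet_sizes(SAMPLE).items():
    print(f"{name}: {size:.1f} bytes/packet")
```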

 

 

Understanding the hardware qos output

 

RP/0/RSP0/CPU0:A9K-TOP#show qos interface g0/0/0/0 output

 

With this command you can verify the actual hardware programming of the QOS policy on the interface (this output is not related to the previous example above):


Tue Mar  8 16:46:21.167 UTC
Interface: GigabitEthernet0_0_0_0 output
Bandwidth configured: 1000000 kbps Bandwidth programed: 1000000
ANCP user configured: 0 kbps ANCP programed in HW: 0 kbps
Port Shaper programed in HW: 0 kbps
Policy: Egress102 Total number of classes: 2
----------------------------------------------------------------------
Level: 0 Policy: Egress102 Class: Qos-Group7
QueueID: 2 (Port Default)
Policer Profile: 31 (Single)
Conform: 100000 kbps (10 percent) Burst: 1248460 bytes (0 Default)
Child Policer Conform: TX
Child Policer Exceed: DROP
Child Policer Violate: DROP
----------------------------------------------------------------------
Level: 0 Policy: Egress102 Class: class-default
QueueID: 2 (Port Default)
----------------------------------------------------------------------
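To relate the programmed burst to the Bc guideline earlier in this document, it helps to express the burst as time at the conform rate. This small Python sketch does that for the output above, assuming the burst is in bytes and the rate in kbps, as the output labels say.

```python
def burst_duration_ms(burst_bytes: int, rate_kbps: int) -> float:
    """Milliseconds needed to send burst_bytes at rate_kbps (1 kbps = 1 bit per ms)."""
    return burst_bytes * 8 / rate_kbps


print(f"{burst_duration_ms(1_248_460, 100_000):.1f} ms")  # 99.9 ms
```

That is roughly 100 ms of traffic, far less than the 1.5 seconds used in the Bc formula above.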

 

 

Default Marking behavior of the ASR9000

 

If you don't configure any QOS service policies, the ASR9000 sets an internal CoS value based on the IP precedence, the 802.1p priority field or the MPLS EXP bits.

Depending on the routing or switching scenario, this internal CoS value is used to mark newly imposed headers on egress.

 

Scenario 1

[Figure: Slide1.JPG]

Scenario 2

[Figure: Slide2.JPG]

Scenario 3

[Figure: Slide3.JPG]

Scenario 4

[Figure: Slide4.JPG]

Scenario 5

[Figure: Slide5.JPG]

Scenario 6

[Figure: Slide6.JPG]

 

Special consideration:

If the node is L3 forwarding, then there is no L2 CoS propagation or preservation as the L2 domain stops at the incoming interface and restarts at the outgoing interface.

Default marking PHB on L3 retains no L2 CoS information even if the incoming interface happened to be an 802.1q or 802.1ad/q-in-q sub interface.

CoS may appear to be propagated if the corresponding L3 field (prec/dscp) used for default marking matches the incoming CoS value, since that value is then used as-is for the L2 headers imposed at egress.

 

If the node is L2 switching, then the incoming L2 header will be preserved unless the node has ingress or egress rewrites configured on the EFPs.
If an L2 rewrite results in new header imposition, then the default marking derived from the 3-bit PCP (as specified in 802.1p) on the incoming EFP is used to mark the new headers.

 

An exception to the above is that the DEI bit value from incoming 802.1ad / 802.1ah headers is propagated to imposed or topmost 802.1ad / 802.1ah headers for both L3 and L2 forwarding.
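The rules above can be condensed into a small decision function. This is a simplified Python model of the text only: it does not reproduce every scenario in the slides, it ignores DEI, and the `Packet` field names are hypothetical. The untagged bridged case returning 0, and label swap trusting EXP, follow the author's replies in the comments below.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    pcp: Optional[int] = None            # outer 802.1p PCP, None if untagged
    ip_precedence: Optional[int] = None  # None if not IP
    mpls_exp: Optional[int] = None       # topmost label EXP, None if unlabeled


def default_internal_cos(pkt: Packet, l2_switched: bool) -> int:
    if l2_switched:
        # L2 switching: PCP of the incoming EFP; untagged frames get 0
        return pkt.pcp if pkt.pcp is not None else 0
    # L3 forwarding: the incoming L2 CoS is never used.
    # Assumption: for labeled packets the topmost EXP is trusted (label swap case).
    if pkt.mpls_exp is not None:
        return pkt.mpls_exp
    if pkt.ip_precedence is not None:
        return pkt.ip_precedence
    return 0


# Routed dot1q packet with PCP 5 and precedence 3: CoS 5 is not propagated
print(default_internal_cos(Packet(pcp=5, ip_precedence=3), l2_switched=False))  # 3
```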

 

Related Information

ASR9000 Quality of Service configuration guide

 

Xander Thuijs, CCIE #6775

 

Comments

Hi Xander,

What happens with multicast traffic on the ingress port? Is it automatically put in the multicast VOQ level? Can I put multicast in P2 with an ingress QoS policy?

If the router receives unicast traffic marked with EXP 5, 6 or 7, would it be placed in the default, P1 or P2 level?

regards,

Mauricio

xthuijs
Cisco Employee

hi mauricio,

it would not be recommended to move mcast to a priority queue, since the fabric already handles it in a separate queue. the p1/p2/ucast/mcast voqs are not priority shaped as such, but merely separated so that if flow off happens there is separation between the categories. since we mainly flow off ucast/default (let's call it X), moving type Z to type Y has no real effect there, other than overloading type Y.

by default no traffic classification is marked with priority, so an EXP 7 or a TOS/COS of 7 does not constitute a p1/p2 classification in the VOQ.

the only way to achieve that is by having an ingress qos policy that takes that marking and sets the priority level accordingly.

cheers

xander

Adam Vitkovsky
Level 3

Hi Xander,

 

In a theoretical case where mcast is very critical to the business, eg. stock market ticks to a trading company, in order to minimize delay, wouldn’t it help to tag it as P1 on ingress so that it is always served first on fabric arbitration please?

And also in case of congestion in the 10GE entity or the egress NPU as a whole, the P1 and P2 VOQs would be served first by the arbiter right?

Thank you

 

adam

 

netconsultings.com

::carrier-class solutions for the telecommunications industry::

 

xthuijs
Cisco Employee

adam, the mcast queue is only shaped for bandwidth so that the system is not internally flooded with excessive mcast. in terms of priority/latency, the end-to-end np-to-np delay is 15 usec at line rate.

you CAN put it in a pq of the FIA, I just don't see it making any substantial difference in terms of priority handling on the fabric, or in reducing latency.

what seems more important for your scenario is that packets are not lost, which is already achieved by the natural FIA queue separation and of course egress queueing as necessary; that is where some prioritization may benefit.

xander

Thanks Xander!!

What happens if the FIA receives a heavy DDoS attack? Can the VOQs on all affected FIAs be flooded, congesting all 4 queues including multicast?

I had some issues with DDoS and multicast traffic on 24x10G cards. When the attack happened, the multicast traffic died, even with QoS (priority 2) on all core-facing backbone interfaces. Do you know if applying a policy on the input interface (core-facing, with P1 or P2) can mitigate the multicast issue?

regards,

xthuijs
Cisco Employee

hi mauricio,

the mcast loss could possibly also have been caused by receivers not being able to join or maintain the query.

the queueset in the fia is shaped at about 12G or so, and when there is backpressure for that destination, typically only BE is flowed off. Mcast is shaped at 30% (if I am not mistaken) of the overall rate, to ensure that traffic cannot be all mcast and that mcast cannot contribute to starvation.

managing DDOS would preferably require some netflow to categorize the pattern of the dos, and an acl or flowspec to redirect/drop or police the offending (set of) flow(s).

when the np starves for pps, EFD will kick in to start dropping low priority traffic on ingress.

the FIA shaping and HOLB prevention is mostly useful and effective when there are, say, 3 interfaces each receiving 5G towards the same 10G destination interface.

each receiving interface will get flowed off in the fia towards that single destination.

the ddos you described sounds more like a 1:1 relation, and the np was likely running out of resources (pps handles).

xander

Adam Vitkovsky
Level 3

Hi Xander,

 

>the queueset in the fia is shaped at about 12G or so,

What do you mean by queue-set please? Do you mean VQI (4 VOQs)?

 

>and when there is backpressure for that destination, typically only BE is flowed off.

>Mcast is shaped at 30% (if I am not mistaken) of the overall rate

What do you mean by overall rate please? The above 12G of VQI? Or NPU rate?

 

Thank you very much

 

adam

 


 

xthuijs
Cisco Employee

see voq/vqi as a parent/child pmap relation.

voq is shaped at the 12G-ish rate and there is a voq per 10G entity in the system (typhoon).

there are 4 different priority levels in there, p1/p2/best effort/mcast.

the mcast queue has a shaper also limiting to a percentage of the parent shape rate to prevent all starvation by mcast.
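Using the approximate numbers from this thread (a VOQ shaped at "about 12G" and the mcast queue at "30% if I am not mistaken"), the parent/child relation can be sketched as follows. Both figures are the author's rough recollection, not specifications, and the sketch ignores P1/P2 and the actual arbitration.

```python
VOQ_SHAPE_GBPS = 12.0  # "about 12G or so" (approximate, from the thread)
MCAST_SHARE = 0.30     # "30% (if I am not mistaken)" (approximate, from the thread)


def mcast_cap_gbps(parent_gbps: float = VOQ_SHAPE_GBPS, share: float = MCAST_SHARE) -> float:
    """Upper bound for the mcast child queue within one VOQ's parent shaper."""
    return parent_gbps * share


cap = mcast_cap_gbps()
print(f"mcast cap ~{cap:.1f}G, leaving ~{VOQ_SHAPE_GBPS - cap:.1f}G for the other levels")
```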

xander

Adam Vitkovsky
Level 3

Got it, thank you very much Xander,

 

I knew that the arbiter has to allow slightly more to a 10GE entity, so that in case of overload, traffic is not backpressured right away at the 10 Gbps mark and then eventually dropped on the ingress LC according to the coarse VOQ priorities. Instead, a reasonable overflow is actually TX-ed over the fabric, so that the egress NPU can decide, based on the fine-grained egress policies, what to drop and what to keep in case of congestion.

However, one has to strike the balance just right so that this “bad/overload” traffic does not waste fabric and/or NPU resources too much; after all, this overflow traffic could be going to multiple 10GE entities on a given egress NPU (hence only slightly over 10 Gbps, ~12 Gbps, is allowed through). Am I right in my logic?

 

Thank you

 

adam

 


 

smailmilak
Level 4

Hi Xander,

If we have a policer in the PQ class, let's say police rate 1 gbps, we will not lose that 1 Gig even if there are only a couple of megs in this class? The router will not cut off this 1G from other classes, only as much as is needed at the moment, max 1 Gig?

Adam Vitkovsky
Level 3

No, standard QOS doesn’t work on a Maximum Allocation Model (MAM) basis, but rather on a Russian Dolls Model (RDM) basis.

In other words, that 1Gbps is just a peak limit that PQ has – if this 1Gbps is not used by PQ it can be used by any other class.
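A minimal Python sketch of that answer, under the simplifying assumption of a single PQ class and one other class sharing a link (rates in Mbps, names hypothetical): the policer only caps the PQ, and whatever the PQ does not use stays available to the other class.

```python
def allocate_mbps(link: int, pq_demand: int, pq_police: int, other_demand: int) -> tuple[int, int]:
    """Return (pq, other) throughput; the PQ policer is a cap, not a reservation."""
    pq = min(pq_demand, pq_police)        # PQ is served first, up to its policer rate
    other = min(other_demand, link - pq)  # everything the PQ did not use is available
    return pq, other


print(allocate_mbps(10_000, pq_demand=5, pq_police=1_000, other_demand=9_990))      # (5, 9990)
print(allocate_mbps(10_000, pq_demand=2_000, pq_police=1_000, other_demand=9_990))  # (1000, 9000)
```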

 

adam

 


 

smailmilak
Level 4

Sounds great.

I see that I can configure MAM and RDM for MPLS-TE.

In the case of standard QoS, is there no need to explicitly tell it to use RDM?

Adam Vitkovsky
Level 3

Yes, I think RDM and MAM are defined for IntServ, while DiffServ uses only a sort of RDM.

Borrowing my reply from a previous post:

 

This is the fundamental difference between the Police, Shape and Bandwidth:

Police = allowed maximum for a given class all the times, whether there’s congestion or not

Shape = allowed maximum for a given class all the times, whether there’s congestion or not

Bandwidth = guaranteed minimum for a given class in case there’s congestion and different classes of traffic are fighting over the available BW resources (active only during congestion)

 

 

adam

 


 

Hello Xander.

Thanks for a good write-up. Not sure if it was updated (or even needs to be updated) in light of newer IOS XR versions being out.

Anyway, a quick question.

Regarding scenario 3 of the default marking behavior of the ASR9000, where we have an ingress untagged MPLS frame with EXP 3: your chart shows that by default, with a label swap, the EXP bits are not honored and the egress MPLS EXP is 0. Am I reading it wrong?

TIA

Alexei.

xthuijs
Cisco Employee

hi alexei, thank you! this default marking has not changed over time, so it still applies :) when you have an untagged frame coming in on bridging, the internal COS is 0. the internal cos value is what gets applied to imposed headers on egress, that includes any added labels, or vlans.

if you are doing label swapping, that is routed interfaces, you are likely in scenario 6 instead.

in that case you'll honor/trust the exp to set the internal cos and that would transport over into the swap label egress.

this is to support pipe mode by default, that was the train of thought behind these default marking operations.

cheers

xander
