When does QOS "kick in" on a router/switch

carl_townshend
Spotlight

Hi Guys

Hoping someone can clarify, as there are multiple different answers out there.

As we know, by default an interface only has a hardware queue which is FIFO.

When we create a QoS policy with CBWFQ/LLQ etc., this creates software queues which traffic goes through before it hits the hardware queue, i.e. the tx-ring.

So, once I enable QoS, does the traffic always go into these queues even if the hardware queue is not full, or does it only go into the queues once the hardware queue is full?
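
To be concrete, the kind of policy I mean is a basic MQC CBWFQ/LLQ config along these lines (class/policy names and numbers are just examples):

class-map match-any VOICE
 match dscp ef
class-map match-any CRITICAL-DATA
 match dscp af31
!
policy-map WAN-OUT
 ! LLQ: strict-priority software queue with an implicit policer (1000 kbps here)
 class VOICE
  priority 1000
 ! CBWFQ: class queue with a bandwidth guarantee (2000 kbps here)
 class CRITICAL-DATA
  bandwidth 2000
 ! everything else lands in class-default
 class class-default
  fair-queue
!
interface GigabitEthernet0/0/1
 service-policy output WAN-OUT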

I could do with some clarification

10 Replies

Joseph W. Doherty
Hall of Fame

Cannot say how Cisco actually processes QoS.  Only Cisco could say, and they might consider it proprietary information.

Can say, except when shaping or policing, unless the tx-ring fills, results should be the same whether you start with software queues or tx-ring.  So, why does your question need an answer on how Cisco does its QoS?  I.e. knowing what Cisco QoS does isn't sufficient?

For simplicity, I would expect packets to always start in the software queues, but CBWFQ's typical (?) LLQ implicit policer only applies when packets are LLQ queued.  I.e. so what is IOS monitoring?

Also, BTW, 

"As we know, by default an interface only has a hardware queue which is FIFO."

This, I believe, is incorrect.  Without a QoS policy, interfaces have a software FIFO queue (the common default hold-queue 40 out) and a hardware FIFO queue (tx-ring-limit).
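
Where the platform exposes them, both of those defaults can be seen and adjusted per interface, roughly like this (values are just examples, and not every platform/interface type supports tx-ring-limit):

interface Serial0/0/0
 ! software output FIFO depth, in packets; 40 is the usual default
 hold-queue 40 out
 ! hardware FIFO (tx ring) depth; only exposed on some platforms/interface types
 tx-ring-limit 3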

Ramblin Tech
Spotlight

Short answer: QoS is always kicked-in, even when there are no QoS policies explicitly configured. I'll come back to this...

For purposes of my post, I will define "QoS" as the classification, marking, rate limiting (shaping, policing), queueing, and scheduling of packets for the purpose of supporting differentiated treatment during both the local forwarding process and network-wide forwarding. The actual QoS implementation is entirely dependent on the particular switch/router platform, though IOS/XE/XR/NXOS do share a common method for configuring QoS operation on their underlying platforms: Cisco MQC (Modular QoS CLI).

By platform dependence, I mean a router that forwards entirely in software via its CPU can have very different QoS behaviors than routers/switches that forward using an NPU (Network Processing Unit). Software-based forwarding can be the most feature-rich & flexible, and have the fewest platform-dependent restrictions because it only depends on available CPU cores and RAM; but this richness comes at the expense of forwarding performance. By contrast, NPU-based routers/switches can have [much] higher performance, but with restrictions that are inherent in the NPU hardware. For example, Cisco implements elements with both custom silicon NPUs (eg, ASR1000 family with QFP NPUs, Cisco 8000 router family with SiOne NPUs) and merchant silicon NPUs (NCS540/5500/5700 with Broadcom DNX NPUs, various Nexus products with Broadcom StrataXGS NPUs), with each having different QoS capabilities.

Each family of NPU comes with quirks and tradeoffs in their respective QoS-handling circuitry due to conflicting priorities of throughput, cost, power consumption, scale, etc. This leads to different properties for different NPUs when it comes to the hardware support for items such as the number of queues per interface, how the queues can be scheduled (SP vs WRR vs WFQ vs ...), how many WRED thresholds are supported per queue, what fields a packet can be classified on and how deep into the packet the classifier can look, what fields can be [re]marked at ingress and egress, what meta-data can be associated with the packet in an ingress QoS policy for use later with an egress QoS policy, how many levels of hierarchical QoS are supported, etc. Software-based elements can just have more code written to expand QoS capabilities to cover nearly endless possibilities, but NPU-based elements have their limitations baked-in when the ASIC is designed. A concrete example of this is support for egress policing, something configurable in MQC; support for egress policing is common in Cisco custom silicon, which is designed with MQC capabilities in mind, but is not supported by Broadcom's merchant silicon DNX family.

Anyway, moving back toward your question... A common Cisco model of QoS, supported by MQC, is that received packets are written into dedicated packet buffer memory, after which the received headers are processed by a packet forwarding engine which applies QoS policies and security policies (eg, ACLs) and forwards to the egress interface. Applying QoS is an integral part of the forwarding process, not an afterthought. The QoS processing of the ingress packet headers can involve (sketched after this list):

  • classification into traffic classes based on a variety of criteria
  • associating metadata such as "qos-group" with the classification
  • rate limiting, such as ingress policing, applied to the classified packets, with marking of metadata to indicate conformance with the rate limiting
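
As a rough sketch (names, rates, and markings are hypothetical, and the exact syntax and supported actions vary by platform), an ingress policy doing those three things might look like:

class-map match-any VIDEO-IN
 match dscp af41
!
policy-map INGRESS-QOS
 class VIDEO-IN
  ! attach local metadata for the egress policy to act on later
  set qos-group 4
  ! ingress policer; exceeding traffic is remarked rather than dropped
  police 50000000 conform-action transmit exceed-action set-dscp-transmit af42
 class class-default
  set qos-group 0
!
interface GigabitEthernet0/0/0
 service-policy input INGRESS-QOS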

After applying the ingress QoS (and security) policies, the forwarding engine looks up the next-hop in the FIB and "switches" the packet to the egress interface. Nowadays, it is quite common for the switching to be virtual (rather than actually copying the packet from an ingress queue to an egress queue) through the use of Virtual Output Queues (VOQs). That is, the packet stays where it is in the packet buffer, with the packet linked to a queue for an egress interface. Which egress queue? Well, that depends on the egress QoS policy. After switching the packet, the forwarding engine applies the egress QoS (and security) policies, which can include (again, sketched after this list):

  • Using the ingress classification and metadata to link the packet to a configured queue, possibly rewriting fields in the packet
  • Actively managing the queues with WRED
  • De-queueing and scheduling the transmission of packets out of the interface, if there is more than one queue, based on configured criteria (SP vs weighted whatever), while taking into account any configured rate limiting
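
Again as a rough, platform-dependent sketch (names and numbers hypothetical), an egress policy that acts on the qos-group metadata from the ingress sketch above might be:

class-map match-any QG4
 match qos-group 4
!
policy-map EGRESS-QUEUING
 class QG4
  bandwidth remaining percent 30
  random-detect dscp-based
 class class-default
  fair-queue
!
! optional parent shaper, making this a 2-level (hierarchical) policy
policy-map EGRESS-PARENT
 class class-default
  shape average 100000000
  service-policy EGRESS-QUEUING
!
interface GigabitEthernet0/0/2
 service-policy output EGRESS-PARENT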

So now, back to your question: with NPU-based routers/switches, packets are written into dedicated, high-speed packet buffers attached to the NPUs. Transit traffic is never copied to a "software queue" that is handled by the CPU (it is a different story for non-transit traffic punted out of the NPU and up to the CPU for local processing). These queues for transit traffic are h/w-based, not s/w, and are handled by the NPU. XE routers that use CPU forwarding (eg, the ISR line) may very well still use tx-ring artifacts from the IOS days when all forwarding was done by Freescale or Motorola 68000 CPUs (I have not looked closely at ISRs in many years), but NPU-based products utilize h/w capabilities for QoS, ACLs, and forwarding. Virtualized versions of products based on NPUs (eg, CSR1Kv, ASR9Kv) also tend to model their forwarding on their h/w cousins.

So what did I mean when I said "QoS is always kicked-in"? When there are no explicit MQC policies configured, there is still a default queue and a default scheduling scheme, which is essentially FIFO.  Without an MQC policy, all traffic gets classified into what is essentially class-default, which then gets queued to what is essentially the class-default queue, which then gets scheduled FIFO. With MQC policies, traffic is treated according to its classification, with unclassified traffic falling through into the class-default bucket. With NPU-based elements, QoS/ACL/forwarding is via h/w, regardless of whether there are MQC policies or not. NPU--SERDES-PHY, no CPU, no S/W involved.

Disclaimer: I am long in CSCO

Great explanation here, many thanks for writing so much detail. cheers

Although Jim's reply has a huge amount of accurate and great information, I would quibble over a few things that might be misunderstood by less knowledgeable readers.  Starting with his opening:

"Short answer: QoS is always kicked-in, even when there are no QoS policies explicitly configured. I'll come back to this..."

(As promised) he ends with:

"So what did I mean when I said "QoS is always kicked-in"? When there are no explicit MQC policies configured, there is still a default queue and a default scheduling scheme, which is essentially FIFO.  Without an MQC policy, all traffic gets classified into what is essentially class-default, which then gets queued to what is essentially the class-default queue, which then gets scheduled FIFO. With MQC policies, traffic is treated according to its classification, with unclassified traffic falling through into to the class-default bucket. With NPU-based elements QoS/ACL/forwarding is via h/w, regardless of whether there are MQC policies or not. NPU--SERDES-PHY, no CPU, no S/W involved."

One must take careful note of the very last sentence, which places a condition on the "always"; it might, in fact, not be "always" on all Cisco router/switch platforms (especially, I suspect, software-based [small] routers, or some out-of-date small switches that might still be found in production, like the 3560/3750 switches, on which QoS was globally enabled or not, and with enablement you got an implicit QoS policy, which could be modified).

Further, even if a processing path is shared with actual QoS processing logic (software and/or hardware), I wouldn't consider a single egress FIFO queue "QoS".  (Laugh, unless, perhaps, your "QoS model/policy" is best effort for all traffic.  However, kidding aside, it's true that properly sizing a single egress FIFO queue could be considered "QoS", but then we're doing actual QoS configuration, not relying on some device default [sized for?].)

As Jim does mention several times, QoS features can vary (greatly) between platforms, and because of this, the QoS features available in MQC also vary greatly.  For the most part, if a QoS feature works the same across multiple platforms, its MQC configuration will be the same, or very similar, but again, actual QoS feature availability can vary greatly.  (BTW, MQC seems to owe much of its development to software-based router QoS, and it also usually won't be found on out-of-date switch platforms.)

As Jim notes, he doesn't have much experience with the small ISRs, i.e. devices without NPUs.  They still exist, and I believe they still work pretty much the same way, even running IOS-XE.

Jim's description of NPU-supporting platforms, I believe, is spot on, and you need that kind of architecture to support the wire-speed bandwidth those devices do.  However, what they provide for "QoS" feature support, IMO, is very lacking compared to many small ISRs.  But, understand, QoS is, IMO, the way to deal with congestion, yet if you have enough bandwidth, you don't have any, if much, congestion.  (And, if you don't have troublesome congestion, do you need QoS?)

Personally, I've seldom worried much about even providing LAN QoS, but WAN QoS, that's a whole different issue!

An example: for high-end platform QoS, the RFC and Cisco QoS models (NB: almost but not identical) both have 12 classes, but much Cisco QoS hardware might not support more than 8 egress queues.  Low-end software-based QoS, I recall (?), supports 1K classes (and many features within or between those classes).
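
Purely to illustrate the squeeze (markings per the usual 12-class model; real designs vary), an 8-queue platform forces some of the model classes to share a queue, e.g. via match-any class-maps:

! e.g. voice, broadcast video, and real-time interactive collapsed into one priority queue
class-map match-any REALTIME
 match dscp ef
 match dscp cs5
 match dscp cs4
! bulk data and scavenger collapsed into one less-than-best-effort queue
class-map match-any BULK-SCAVENGER
 match dscp af11 af12 af13
 match dscp cs1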

@carl_townshend you mention seeing many different "explanations".  Perhaps because of the impact of so many Cisco hardware variants for QoS feature support.  (Heck, the Catalyst 4Ks had FRED, which I don't believe is supported on any other platform, even software-based QoS.)

Carl, I'll be more direct: why your question?  Idle curiosity, or is there some QoS issue where what triggers QoS is important to you?

I will say, on the platforms that support interface tx-rings, knowing that, and its potential impact, can be very important to effective QoS.

In fact, decades ago, I was going crazy over why my QoS wasn't working as well as I expected on a (WAN) 7500.  This Cisco document was the key to my QoS behavior issue.  I.e. I was dealing with my "fancy" software QoS queues and (at that time, unknowingly) the tx-ring FIFO queue.  (NB: BTW, I wasn't using ATM, but the same issue applies to other interfaces.)  After I reduced the tx-ring-limit, my QoS behavior was as expected.  (NB: I recall [?] later IOS variants supposedly would automatically reduce the tx-ring-limit if a QoS policy was applied to the interface.)
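
The fix amounted to something like this (interface and values illustrative; the right tx-ring-limit depends on the interface speed and platform):

interface Serial1/0
 ! shrink the hardware FIFO so packets back up into the software queues, where the policy acts
 tx-ring-limit 3
 service-policy output WAN-EDGE-POLICY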

Hi Joseph

thanks for further clarifying. 

I asked the question mainly out of curiosity; also, I heard a fellow engineer use the phrase "kick in" the other day, and I think it's misunderstood.

Without going too technical, I guess my question is: if I enable QoS on, say, an ISR4K or a 9300 switch and there is no congestion, would the packet flow through one of the queues before it gets put into the interface tx-ring?

Or would it bypass these queues and go straight out?

Could you explain? Is there a simple diagram, maybe?

Also, if you create different classes, does each class go into its own queue, or do some classes share a queue?

I'm guessing share?

Cheers

Hello @carl_townshend ,

Your question is one that creates some discussion, and it is not an easy one.

>> Without going too technical, I guess my question is: if I enable QoS on, say, an ISR4K or a 9300 switch and there is no congestion, would the packet flow through one of the queues before it gets put into the interface tx-ring?

From lab tests on CBWFQ and LLQ that I did many years ago with a traffic generator, on platforms that were high-end routers at the time like the Cisco 12000, Cisco 7500, and Cisco 7200, I can tell you that when you configure MQC on the interface, the output of

show policy-map interface type x/y

provides counters for each class defined in the policy even if no congestion occurs. The counters increase according to the generated traffic, and packets for each traffic class are reported as queued and transmitted.

So the experimental evidence would lead one to think that once the SW queues are configured they are always in use, regardless of whether or not congestion is occurring on the interface.

To my great surprise, some years later, while reading a book for the QoS exam that was in the CCNP Voice path (a book by Kevin Wallace), I read that the SW queues are created and used on the fly only when congestion occurs, and are "bypassed" when congestion is not happening.

In my humble opinion this is not true:

a) first, the counters for each class increase correctly even when total traffic is less than 100% of line rate, at least on the platforms we tested.

b) if the software queues are already configured and in use, the router is able to manage congestion correctly; otherwise, how can it handle congestion the way we would like it to (different treatment for packets in different classes, or even for packets in the same class with different DSCP values) if the packets are not already in their respective SW queues? Think of WRED that is DSCP-based: it can use different thresholds for different DSCP values, and a class can host multiple AFxy DiffServ PHBs.
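
For example (illustrative names and thresholds), a single class can carry several AFxy PHBs, and DSCP-based WRED can treat them differently inside that one class queue:

class-map match-any MULTIMEDIA
 match dscp af41 af42 af43
!
policy-map CBWFQ-TEST
 class MULTIMEDIA
  bandwidth remaining percent 30
  random-detect dscp-based
  ! higher drop precedence gets a lower minimum threshold, so it is dropped earlier
  random-detect dscp af41 32 40
  random-detect dscp af42 28 40
  random-detect dscp af43 24 40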

Later books, even for CCIE written exams, still show this "bypass" of the SW queues in their QoS chapters.

@Joseph W. Doherty : what do you think of this interesting issue ?

Hope to help

Giuseppe

 

Giuseppe, yup, the class counters get updated, but did the packet logically passing through the class-map actually also get queued/dequeued in a class-map queue, before possibly being queued/dequeued to an empty or not-full tx-ring queue?

I.e. counters being updated is not, I believe, conclusive evidence that the packets are actually always being processed by the class-map queues.

Certainly, from a coding perspective, it's simpler to process all packets alike, i.e. no "shortcuts".

Also certainly, on a software dependent router, if you're trying to maximize performance, "shortcuts", bypassing processing, allow more performance.

Which did Cisco choose?  I don't know.

Very possibly Cisco had used different approaches over the years, especially across different platforms, which could account for various QoS authors having different "sources" at different times.

Laugh, BTW Giuseppe, you realize WRED doesn't use actual queue values, but uses moving queue length/depth averages.  I.e. it can drop newly arriving packets even when the current actual queue is empty, or queue packets beyond its (WRED) tail-drop limits?  I.e. so it doesn't need packets in an actual SW queue to "work".
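
(For reference, IOS computes that average as an exponentially weighted moving average, roughly: new_avg = old_avg + (current_queue_depth - old_avg) / 2^n, where n is the value set with random-detect exponential-weighting-constant; the larger n is, the more slowly the average tracks the instantaneous queue depth.)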

"what do you think of this interesting issue ?"

If results are not impacted by actually using a SW queue or not, it's not, to me, very interesting, beyond being sort of Cisco QoS trivia.  However, in my original reply, I wrote ". . . unless the tx-ring fills, results should be the same whether you start with software queues or tx-ring."  Well, that's usually the case, but it might not be.

Say a VoIP packet and a jumbo FTP packet arrive at the SAME time.  Logically, we likely want the VoIP packet to have priority, but does it if the SW queues are not used?  (If such an example does matter, you're likely looking at needing LFI QoS too!  And, if LFI is being used [to preclude the issue of the FTP packet arriving physically just before the VoIP packet], then again, what goes next for same-time arrival of a VoIP vs. FTP packet may not matter if the FTP packet has been fragmented by LFI [assuming the FTP fragments are considered to arrive after the VoIP packet - of course, all the time, the SW dequeuing should support that]!)
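
As an aside, on slow WAN links the classic LFI knobs live on multilink PPP; a rough sketch (values illustrative, and the exact fragment-delay syntax varies by IOS release):

interface Multilink1
 ppp multilink
 ! fragment large packets so a small VoIP packet never waits behind an entire jumbo frame
 ppp multilink fragment delay 10
 ! allow priority packets to be interleaved between the fragments
 ppp multilink interleave
 service-policy output WAN-EDGE-POLICY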

Don't misunderstand, there's nothing wrong with Carl's question, but it, specifically, I believe usually has very little "real world" impact.  However, Carl's question branches into some interesting directions.  His mention of SW vs. tx-ring queues can be very important for effective QoS (again, see the Cisco reference I provided [generally overlooked in any Cisco QoS or 3rd-party QoS info]), @Ramblin Tech's information on the QoS differences due to hardware support (which I didn't even think to mention), and your mention of CCIE book QoS differences (which also dovetails with Carl's mention of "different answers").

"without going too technical, I guess my question is, if I enable qos on say an ISR4k or say 9300 switch and there is no congestion, would the packet flow through one of the queues before it gets put into the interface tx ring?"

On a switch, or a "router" that's pretty much like a L3 switch (e.g. the old 6500 "switch" vs. the 7600 "router"), I think it likely that an interface tx-ring, per se, may not exist for all interface types of ports.

On a software router, such as possibly an ISR 4K, which supports all kinds of different interfaces, an interface tx-ring buffer is very likely.

(Why the two prior paragraphs' statements are likely true has much to do with computer hardware architectures; a switch or router is still just a computer, but with a specific purpose.)

On a device with much dedicated hardware, I think it most likely packets are treated alike for processing.

On a device that doesn't have much dedicated hardware, processing shortcuts might be in place depending on what's "happening" at any point in time.  I.e. if the system is using a tx-ring, and it's empty or not full, to speed things along, I could see packets bypassing some unnecessary processing, like "queuing" a packet directly to it rather than queuing the packet to a QoS/software "queue" and then immediately dequeuing from that queue to queue it in the tx-ring.

"also, if you create different classes, does each class go into its own queue or do some classes share a queue?"

That's an "it depends".

On routers, CBWFQ LLQ has one physical queue which all LLQ classes share.  (NB: prior to the later 2-level LLQ feature.)

Also on routers, that support CBWFQ FQ (or the earlier variations of WFQ), there's a set of queues allocated for flows, but multiple flows might map to the same queue (much like flows map to Etherchannel links).

Most other class queues would just use one queue each (which is also how they are usually mapped into hardware-supported egress queues).
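
A sketch of both cases (names hypothetical, and pre-dating the 2-level priority feature): the two priority classes below are policed separately but serviced from the same single LLQ, while class-default's fair-queue hashes many flows into a fixed set of per-flow queues:

class-map match-any VOICE
 match dscp ef
class-map match-any VIDEO
 match dscp af41
!
policy-map SHARED-LLQ
 ! both priority classes feed the one LLQ, each with its own implicit policer
 class VOICE
  priority 1000
 class VIDEO
  priority 5000
 ! flow-based fair queueing: multiple flows can hash to the same per-flow queue
 class class-default
  fair-queue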

As I have not paid much attention to Cisco enterprise products in some time (focused on SP products instead), I lost touch with what is going on with the Catalyst brand from a forwarding and QoS perspective. I took a quick look at the CiscoLive! site to see if there was anything good for the Catalyst 9000 and... bingo!

https://www.ciscolive.com/c/dam/r/ciscolive/global-event/docs/2023/pdf/BRKENS-2096.pdf

I quickly perused the above preso and it appears to have a wealth of info on how the Cat9K and its two families of custom silicon NPUs (UADP and SiOne) handle QoS. I will add listening to the session recording to my to-do list, to try to get more than a superficial understanding of how the Cat9K works (my starting point).

A side note on Silicon One... Dune Networks, a fabless semiconductor startup, was designing fast, innovative NPUs when they were acquired by Broadcom in 2009. The Dune designs became Broadcom's StrataDNX product line, the leading merchant silicon routing NPUs found across the industry (Cisco NCS and Nexus products, whiteboxes from Edgecore and Ufispace, etc). When the golden handcuffs came off the Dune leadership team at Broadcom, they left to form what became Leaba Semiconductor to push the envelope further with a new generation of NPUs. Cisco acquired Leaba in 2016, with their work eventually becoming the Silicon One family of NPUs. Leaba's CEO and co-founder, Eyal Dagan, has since become the EVP of Cisco's Common Hardware Group, which designs hardware across Cisco's portfolio.

I digress into SiOne because it appears to be the future of Cisco NPUs. SiOne first launched with the Cisco 8000 router product line (the successor to the CRS, not the similarly named Cat8000) and has been expanding up and down with L2 & L3 scale since, as well as across product lines and into the Cat9000 and NX9800. Future lower-end SP and enterprise products running on NPUs will likely move to SiOne, as will higher-end ones. The Common Hardware Group is even selling the SiOne NPUs on the open market to whitebox vendors. If you want to see the future of NPU-based forwarding and QoS at Cisco (and maybe even outside of Cisco), keep an eye on SiOne.

Disclaimer: I am long in CSCO

I was aware of UADP ASIC features, impressive in their own right, but unaware of Silicon One ASIC features, even more impressive.  Hardware QoS support continues to get closer and closer to what's found on software based routers.

Interestingly, Silicon One is still limited to 8 egress queues, and doesn't appear to support FQ.  Its additional buffer capability feature, HBM, is interesting.  There are other subtle differences between UADP and Silicon One QoS, not all of which, to me, are clearly "improvements".  (Personally, I also find such subtle differences a bit annoying when QoS operates a bit differently using the SAME configuration statements.  I last saw some of this when Cisco moved to HQF.)

I do agree, hardware acceleration will also be expanded on smaller platforms; possibly still also keeping true to "Moore's Law".

Heck, during my IT career, I started programming on multi-million dollar mainframes that had less RAM than my phone, and probably less CPU power too.
