Solved: Re: 4500-X and how Cisco completely screwed up Auto QoS

derek.small · ‎07-29-2020

It would be just swell if Cisco could do 1 thing consistently..... Every Catalyst platform Cisco makes supports "auto qos", every platform, except for 1. And I get so frustrated reading docs that say stuff like "when you issue the command auto qos trust dscp it does this..." No it doesn't, because you cannot enter that command on a 4500-X, so stop pretending you can.

Everywhere I turn there is another Cisco doc that has 20 pages of crap I already know about QoS, which I have to wade through only to find out the doc offers no help at all.

I have a pair of 4500-Xs, everything that goes into the switch has DSCP markings set correctly, but everything I capture on the 4500-X and everything I capture coming out of the 4500-X shows the DSCP markings are completely screwed. It should be a really simple problem to fix. "Auto qos trust dscp", but NO Cisco doesn't give you that command or if they actually do somehow, there is only some mole at TAC who you will never find that knows what magic command to enter to enable it. MAKE QOS CONSISTENT AND EASY ACROSS ALL YOUR PLATFORMS!!!!!

If anyone can offer some advice on how to solve such a simple freaking problem as running the command "auto qos trust dscp" ON A 4500-X, please help. I am not pasting in page after page of class-maps and policy-maps so TAC can complain about some value that I picked or didn't pick. QoS on Cisco devices is voodoo magic, and NO 2 TAC engineers have ever or will ever give the same advice about configuring it so I'm done trying. At least with "auto qos trust dscp" there was no arguing with the next TAC engineer you had to deal with when you had a problem. CISCO WHY DID YOU COMPLETELY SCREW THIS UP ON THE 4500-X?????

If there is some magic command that will enable "auto qos trust dscp" on a 4500X please let me know. Don't send links to more config guides, or command references, I have read them ALL. I'm not going to waste more hours of my life reading through table-maps, 4-tier, 7-tier or 13-tier queuing models, or leaky bucket, or CAR or MQC, CBWFQ, LLQ or LLQIO or EIEIO.... I don't want to know or evaluate the significance of a 2P3Q2T model versus a 1P2Q2T model (it's hardware anyway, it's not like you gave us a choice). I just want to put a QoS policy in place that doesn't trash my DSCP markings and might have a prayer of working better than FIFO.

Joseph W. Doherty · ‎07-30-2020

(Likely more information than you're looking for "now", but besides, perhaps, addressing your immediate needs, here are some things to consider so you can avoid all the "voodoo" QoS documentation, and 23 flavors of TAC 2 engineers' "recommendations".)

On Cisco switches, ignoring "Auto QoS", QoS usually is supported in one of three ways.

First, on some switches, QoS is disabled by default. When disabled, frame/packet ToS markings are not changed, and generally egress interfaces just have a single egress (FIFO) queue.

Second, on some switches, QoS is always enabled, or, where the default is disabled, QoS has been enabled. In either case the ToS marking is "not trusted" by default, so it is reset to zero unless the switch is configured to "trust" ingress or the marking is otherwise explicitly set by some ingress QoS policy.

Third, on some switches (generally the later ones, or possibly older ones with a later IOS), by default, the switch behaves much like a Cisco router. I.e. ingress markings are "trusted", by default. Again, an ingress policy can do whatever it wants, such as remarking. For egress, the switch may, or may not, have a default egress policy beyond a single egress port (FIFO) queue.
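
To make the three behaviors above concrete, here is a minimal Python sketch of what happens to an ingress DSCP value in each case. The mode names and the function are hypothetical labels for this illustration only; they are not Cisco commands or any real API, and an explicit ingress policy could override any of these defaults.

```python
from enum import Enum


class QosMode(Enum):
    """Hypothetical labels for the three default behaviors described above."""
    DISABLED = "QoS disabled: markings pass through untouched"
    ENABLED_UNTRUSTED = "QoS enabled, port not trusted: markings reset"
    TRUSTED = "router-like: ingress markings trusted"


def dscp_after_ingress(dscp: int, mode: QosMode) -> int:
    """Return the DSCP a packet carries after ingress, absent any explicit policy."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0..63)")
    if mode is QosMode.ENABLED_UNTRUSTED:
        return 0  # untrusted port: marking is reset to zero
    return dscp  # disabled or trusted: marking is left alone


if __name__ == "__main__":
    for mode in QosMode:
        print(f"{mode.name:18} EF (46) -> {dscp_after_ingress(46, mode)}")
```

If captures show markings zeroed on the way through, the second case is the one to suspect.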

Switches supporting QoS, again, often have a default egress configuration, beyond just a single egress port FIFO queue. Regardless, such switches generally have configurable egress QoS features which can widely vary based on the switch model (or sup and/or line cards) because QoS features are so bound to the switch's underlying hardware.

The reason I mention some of the above: it's possible that a very recent IOS version for the 4500-X puts it in the third kind of QoS-supporting switch mentioned above.

If not, or if you need/want to stay on your current IOS, as Pieterh mentioned, you might find using some form of interface trust command should, at least, eliminate the switch resetting your ToS markings. (Of course, it's annoying having to set such a command on most, if not all, interfaces, but the "range" command reduces much of that burden.)

That said, we're still stuck with not wanting an egress FIFO queue, alone. On a 4500-X, I suspect, you'll get some form of four or eight class model, by default. Which, for you, may work as well, or even better, than whatever a particular IOS Auto-QoS du jour provides. Of course, the converse is true too, i.e. Auto-QoS may work better than the switch's default QoS.

The thing is, you believe Auto-QoS will do better than a single FIFO queue. Maybe, perhaps even most often, especially for some packets with a DSCP EF marking. (Probably also true for the switch's built-in default QoS.) However, some QoS configurations can actually be worse than a single FIFO queue, at least for non-PQ/LLQ traffic.

Personally, I recommend against using Auto-QoS, for multiple reasons (which I'm not going to enumerate here). If you're going to provide QoS, define your own QoS policy, that meets your service needs, and then you figure out how to make any device support your model, as well as it can.

A generic "logical" QoS model that I've found handles 99.99% of QoS needs is as follows:

policy-map Generic
 class real-time !e.g. traffic: VoIP bearer, DSCP EF
  priority percent 33 !anywhere from 30 to 50; normally I use about 1/3 to 40%
 class foreground !e.g. traffic: IPPrec 6 and 7, DSCP CS5, VoIP control
  bandwidth remaining percent 81
  fair-queue !may not be needed, as this class should be lightly used, bandwidth percentage is to prioritize its traffic and/or minimize loss
 class background !e.g. traffic: DSCP CS1, FTP
  bandwidth remaining percent 1
  fair-queue !also may not be needed, assuming this class's traffic is all non-critical
 class class-default !e.g. traffic: IPPrec 0,1,2,3,4
  bandwidth remaining percent 9
  fair-queue !ideally, you really, really want FQ, at least for this class
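
As a rough illustration of how traffic would sort into those four classes by marking alone, here is a small Python sketch. It is only a model of the "e.g." comments in the policy above, not a device configuration. The function name is made up for this example, and sending every other marking (including non-EF, non-CS5 precedence 5 values) to class-default is an assumption. Application-based matches such as VoIP control or FTP are left out because they cannot be recognized from DSCP.

```python
# Well-known DSCP code points used by the model above.
EF = 46   # expedited forwarding
CS1 = 8   # class selector 1 (scavenger)
CS5 = 40  # class selector 5


def classify(dscp: int) -> str:
    """Map a DSCP value (0..63) to one of the four logical classes."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0..63)")
    precedence = dscp >> 3  # top three bits of the DSCP are the IP precedence
    if dscp == EF:
        return "real-time"
    if dscp == CS5 or precedence in (6, 7):
        return "foreground"
    if dscp == CS1:
        return "background"
    return "class-default"  # assumption: everything else lands here


if __name__ == "__main__":
    for name, value in [("EF", 46), ("CS6", 48), ("CS5", 40), ("CS1", 8), ("AF11", 10), ("BE", 0)]:
        print(f"{name:5} ({value:2}) -> {classify(value)}")
```

On a real device the same decisions would be expressed with class-maps matching on DSCP or precedence, with the exact syntax depending on the platform.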


Joseph W. Doherty · ‎07-30-2020

BTW, I was just looking at QoS documentation (IOS XE 3.8.0E and IOS 15.2(4)E - latest I've found) for 4500 series. 4500-X is based on a sup7.(?)

ref: https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst4500/XE3-8-0E/15-24E/configuration/guide/xe-380-configuration/qos_mrg.html

Two interesting notes:

Under: MQC-based QoS Configuration
Note The incoming traffic is considered trusted by default. Only when the trusted boundary feature is enabled on an interface can the port enter untrusted mode. In this mode, the switch marks the DSCP value of an IP packet and the CoS value of the VLAN tag on the Ethernet frame as “0”.

Under: Auto-QoS Overview
Note If you have an auto-QoS policy on a port connected to a device that supports CDP, the port is automatically trusted. However, if the device does not support CDP (like legacy Digital Media Player), QoS trust must be applied manually.

So the interface ingress trust mode varies, based on configuration and on CDP hosts (yikes). Also:
Under: Auto-QoS Compact
When you enter an auto-QoS command, the switch goes on to display all the generated commands as if the commands were entered from the CLI. Enable auto-QoS compact if you want to hide auto-QoS generated commands from the running configuration.

It's possible relevant QoS commands are "hidden".

Yeah, Cisco doesn't make QoS easy. To be somewhat (laugh) fair to Cisco, it's difficult to keep everything alike when you're adding new and/or improved QoS features. It's also difficult to keep things alike when they rely so much on underlying hardware (unlike a software-based router).

BTW, besides the on-going QoS changes you mentioned, the one that really blew my mind was when Cisco moved to HQF: the same exact syntax, for some of CBWFQ, worked differently! (This was on software-based routers. In fact, the same exact model. Upgrade IOS, and get different QoS results with exactly the same configuration.)


Joseph W. Doherty · ‎07-30-2020

Well, again to be fair to Cisco, when they are designing hardware, they also design how they think egress QoS should work (also what's actually possible to support - at that time). Cisco's thinking on how QoS should work has "evolved" much over the decades, so has what's practical to implement in hardware.

QoS thinking hasn't just evolved with Cisco, it has with the industry too. You mention the IPPrec vs. DSCP overlap in how ToS's IPPrec bits are used. That's not by accident. Nor is how L2 CoS is much like L3 IPPrec. The latter makes it "easy" to match up L2 CoS to L3 IPPrec, or the converse.

RFC791 defined how to use 6 of the 8 bits of the ToS byte, IPPrec and DTR. RFC1349 defined a use for the 7th bit, cost. Then the RFC defining AF, superseded much of those RFCs, but implicitly left IPPrec 0, 5, 6 and 7 alone, for backward compatibility. (Also, I recall, suggesting IPPrec 1..4 be kept in mind too, for backward compatibility.) Then we have the RFC for scavenger traffic, using CS1, which places it "below" BE, which is contrary to IPPrec 0 vs. IPPrec 1.
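
For anyone who wants to see that bit history side by side, here is a small Python sketch that reads the same ToS byte two ways: the legacy RFC 791 / RFC 1349 fields (precedence, delay, throughput, reliability, cost) and the later 6-bit DSCP. The type and function names are invented for this illustration.

```python
from typing import NamedTuple


class LegacyTos(NamedTuple):
    precedence: int         # top three bits (RFC 791)
    low_delay: bool         # D bit (RFC 791)
    high_throughput: bool   # T bit (RFC 791)
    high_reliability: bool  # R bit (RFC 791)
    low_cost: bool          # cost bit added by RFC 1349


def decode_legacy_tos(tos: int) -> LegacyTos:
    """Interpret a ToS byte the pre-DiffServ way."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("ToS is a single byte (0..255)")
    return LegacyTos(
        precedence=tos >> 5,
        low_delay=bool(tos & 0x10),
        high_throughput=bool(tos & 0x08),
        high_reliability=bool(tos & 0x04),
        low_cost=bool(tos & 0x02),
    )


def dscp_of(tos: int) -> int:
    """Interpret the same byte the DiffServ way: the top six bits are the DSCP."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("ToS is a single byte (0..255)")
    return tos >> 2


if __name__ == "__main__":
    for tos in (0xB8, 0x20, 0x00):  # EF, CS1, best effort
        print(f"0x{tos:02X}: dscp={dscp_of(tos):2} legacy={decode_legacy_tos(tos)}")
```

The precedence field is the same three bits in both readings, which is why the old IPPrec values and the class selector code points line up.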

So, you have all that going on, and where it really gets confusing is how traffic should be treated. When should you, and/or how should you, drop traffic; when should you queue traffic; how should traffic be dequeued; etc. Trying to address such issues, especially with hardware support, is how we get WRR, SRR, DWRR, DBL, FRED, to mention a few. (Oh, I too dislike Nexus, it's sort of like jumping onto another vendor's equipment, but when it comes to "strange" QoS, look at a 4500 sup4's [?].)

So, although I'm in 100% agreement it would be nice if everything worked the same across all their products, with QoS that's especially difficult if you want to stay current. Heck, early Cisco switches that supported egress QoS often only provided two hardware egress queues (and may also not have had a non-head-of-line blocking architecture). (Likewise, at the same time vintage, Cisco routers might only have supported PQ with four "classes".) Later, Cisco switches often supported four hardware queues; now eight is more common.

In other ways, many QoS issues, I believe, stem from how misunderstood the subject is. As you mentioned early on, it seems much like "voodoo". Back when Cisco was pushing their Olympic QoS model, I felt it was insufficient. Likewise, now that Cisco (and RFC) are up to an 11 or 12 class model, I believe it's overly complex. Neither, generally, was well explained in terms of how to actually use it. Even the latest don't deal with how something like web or Microsoft traffic should be handled, since both might have almost any kind of traffic using those protocols. Or, for another case, how do SSH and SCP differ, in protocol "appearance"?

So, again, conceptually I agree with you, but there are real-world issues which make it hard to obtain what we both desire.


pieterh · ‎07-30-2020

"Auto qos trust dscp"

I think you are combining two different commands
- auto qos
- mls qos trust dscp (in interface configuration mode)

derek.small · ‎07-30-2020

Thank you for replying, but here is what I am on about...

On just about any Catalyst switch platform you can do the following:

MIN-C3650-23-1#conf t

Enter configuration commands, one per line. End with CNTL/Z.

MIN-C3650-23-1(config)#int gig1/0/34
MIN-C3650-23-1(config-if)#auto qos trust ?
cos Trust the CoS marking
dscp Trust the DSCP marking
<cr>

So "auto qos trust dscp" is a valid config command. Now let's try the same thing on a 4500X-32, running 15.2(7E) (aka 3.11.00.E)

MIN-C4500X-23-1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
MIN-C4500X-23-1(config)#int ten1/1/2
MIN-C4500X-23-1(config-if)#auto ?
security-port Configure AutoSecurity

MIN-C4500X-23-1(config-if)#auto qos ?
% Unrecognized command
MIN-C4500X-23-1(config-if)#end
MIN-C4500X-23-1#
MIN-C4500X-23-1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
MIN-C4500X-23-1(config)#mls
MIN-C4500X-23-1(config)#mls?
% Unrecognized command
MIN-C4500X-23-1(config)#mls ?
% Unrecognized command
MIN-C4500X-23-1(config)#

So, "auto qos trust dscp" is a valid command on every other Catalyst platform, except for the 4500X. And although "mls qos" is a valid command on the larger 6500/6800 series platforms, that is the only platform I am aware of which supports a global config command starting with "mls ....." The 4500X does not, and if there is some other magic command to get the 4500X to support some of the more common QoS settings, I would sure love to know that.

My point is there is no reason for Cisco to change this. I'm tired of every new platform that Cisco comes out with, and sometimes every new version of the same flavor of IOS, having new QoS config rules. STOP MAKING THEM ALL DIFFERENT! I really don't care, and don't want to have to care, how many priority queues, normal queues or thresholds a switch, blade, module or port supports. I get that switches use hardware-based queuing so it has to be different than the policy you would use on a router, but the config interface really doesn't have to be. (Along that line, why don't routers support something like "auto qos trust dscp"?) I've always liked IOS, and the power that it gives you to dig deep when you need to, but with QoS, I'm tired of always being forced to go deep to get a policy in place that works. It really doesn't have to be this hard.

Policy-based queuing has been around since Cisco's very first routers, and I remember when it was called CBQ, then CBWFQ, and later queuing on switches started to be more configurable, and routers could do policies within policies, and Cisco announced they were moving to their MQC model, which was going to standardize all their queuing configs, but it didn't. Then Nexus hit the scene with a completely new approach to queuing, and a much more limited approach, but more transparent. Somewhere in there we also switched from CBWFQ to LLQ, which is nothing more than CBWFQ with 1 or 2 priority queues (did we need a new name for it?)

Every new platform has some new wrinkle, either something which was added or something which was taken away, but at the end of the day, the policy that you need to implement hasn't really changed that much. I'm tired of poring through pages and pages of config guides to find the 1 or 2 things which changed, but which completely prevent the policy I've always used from working, or from even being entered into the config. Quit changing the CLI for every new platform or version. Find a QoS configuration language that can express most basic QoS config requirements and make that work on whatever hardware and software you run it on. I shouldn't have to read a 50 page config guide on QoS every time Cisco releases a new switch. Give me the highlights that might impact my network, and LEAVE THE INTERFACE ALONE!!!!

derek.small · ‎07-30-2020

Thank you for responding Joseph, and for the detail. I really do appreciate it.

I get that there are a few different platform approaches to QoS, and I know it's shouting at the wind to get Cisco to fix that. I can even live with that without complaint. What I can't tolerate is Cisco's apparent complete lack of concern about making their customers and support engineers continually learn the new syntax of the day or of the moment to build the QoS policy they have pretty much always used.

I also agree with your general QoS policy. I have one myself, not all that different. It is 7-tiered, and I generally implement 7 tiers, even if the network I'm building is only expected to support voice and data, but I've written volumes if not books, on why you should implement a consistent QoS policy across your enterprise, that can accommodate everything you might need, even if you don't want or need it now.

The reason I got on my soapbox is that I'm tired of having to continually relearn a new CLI to do the same job I've always needed done. Nexus is a good example. With Nexus, let's say I want to run OSPF. If I go into config mode and try to type "router ospf", it's not a supported command. I first have to type "feature ospf", then I am permitted to enter the config command "router ospf". WHY? Why not just give me the config command option, and if I type "router ospf" in global config mode, the OS can start the OSPF process as if I had typed "feature ospf", so I don't have to spend hours trying to figure out why I can't run OSPF on this platform. Is it licensing, did I get the wrong model? Everything says I just type "router ospf" to configure OSPF. Why does the platform I'm working on not support that?

Here is a secret. Catalyst platforms years ago went from a complex kernel model that supported and ran everything to a modular model, and if you don't configure OSPF on a Catalyst, then the switch doesn't run an OSPF process. WOW! I didn't have to enable or disable the OSPF process on Catalyst switches, they just figured out if I needed it or not based on my config. What a fantastic idea!!! Thanks Cisco!

It's too bad that kind of thinking can't be applied to QoS configurations. Then the "auto qos trust dscp" command could be supported on EVERY SINGLE CISCO PLATFORM, routers, switches, you name it! Wouldn't that be wonderful? And the command would simply create a basic QoS policy that would ensure at least 80th percentile support for voice, video, and maybe a few classes of data traffic, and would implement that policy in the most effective way on the hardware you ran the command on. This port uses software queuing, yeah, it's configured for your base policy now. This port uses 1P2Q3T, yeah, it's configured for your base policy now. This port uses 2P3Q2T, yeah, it's configured for your base policy now.

Here is another secret. The first 6 bits of the TOS byte in an IP header are the DSCP bits, and the first three of those are called the IP-Precedence. Three bits, hey, that is the same as the COS bits in the Ethernet header of an 802.1q frame. And unless you are a sadist, they should just match! In fact all the Class Selector (CSx) DSCP values have zeros in the last three bits of the DSCP field, so the CSx DSCP values ARE the COS values. So why do we even need the ability to map DSCP fields to COS fields and vice versa? It's like some software engineer at Cisco decided, I'll put a COS to DSCP mapping option in the config, that will let them feel like they are doing something. Why not just give them a loaded gun with no trigger and let them figure out how it works. Again, stop it! If you really think some nut case is going to need to map the three IP-precedence bits to the COS bits differently than just making them the same, then leave it in there but make it hidden unless you depart from what everyone should be doing. The same thing goes for mapping DSCP or COS values to queues and thresholds. "auto qos trust dscp" should not cause my config to double in size. Just give me a QoS config that works for the switch platform and port hardware that I configured it on.
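
The bit arithmetic behind that "they should just match" argument is short enough to sketch in Python. This only shows the identity mapping being argued for (CoS taken straight from the top three DSCP bits); the function names are made up for this example, and it does not describe what any particular platform's mapping tables default to.

```python
def ip_precedence(dscp: int) -> int:
    """The top three bits of a 6-bit DSCP are the old IP precedence."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0..63)")
    return dscp >> 3


def class_selector(n: int) -> int:
    """DSCP value of CSn: precedence n with the low three DSCP bits zero."""
    if not 0 <= n <= 7:
        raise ValueError("class selector index is 0..7")
    return n << 3


def matching_cos(dscp: int) -> int:
    """The 'just make them match' mapping: 802.1Q CoS equals IP precedence."""
    return ip_precedence(dscp)


if __name__ == "__main__":
    for n in range(8):
        dscp = class_selector(n)
        print(f"CS{n}: dscp={dscp:2} ({dscp:06b}) -> cos {matching_cos(dscp)}")
    print("EF   (46) -> cos", matching_cos(46))
    print("AF41 (34) -> cos", matching_cos(34))
```

Going the other way, a 3-bit CoS can only recover the CSx code point, so AF and EF markings lose their low bits if a frame is re-marked from CoS alone.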
