Solved: Re: Assistance understanding live QoS config - example for MPLS

Nathan Farrar · ‎07-24-2019

I'm rusty with QoS. Really haven't had to work with it much since a lot of the networks I work on have bandwidth to spare. I've been working with some MPLS sites that have lower bandwidth connections, around 10Mbps is typical for these sites.

There have been issues, and I believe they are due to the service provider dropping packets when the CIR is exceeded. I'm trying to understand the QoS configuration better. The goal is to make some adjustments so that file transfers over SMB (TCP 445) work better. I am looking for help with how best to implement this, as well as with understanding what is already occurring. The remote site was set up with AutoQoS, but I don't think it is taking the slower upstream bandwidth into account. I also have the service provider's config, which uses IPP (precedence) to do its shaping.

It would be super helpful if someone more knowledgeable could walk me through how the current config works and where best to adjust it to help alleviate the issues.

Here is the switch where all clients are connecting:

class-map match-any AutoQos-4.0-Output-Multimedia-Conf-Queue
 match dscp af41  af42  af43 
 match cos  4 
class-map match-any AutoQos-4.0-Output-Bulk-Data-Queue
 match dscp af11  af12  af13 
 match cos  1 
class-map match-any AutoQos-4.0-Output-Priority-Queue
 match dscp cs4  cs5  ef 
 match cos  5 
class-map match-any AutoQos-4.0-Output-Multimedia-Strm-Queue
 match dscp af31  af32  af33 
class-map match-any AutoQos-4.0-Voip-Data-CiscoPhone-Class
 match cos  5 
class-map match-any AutoQos-4.0-Voip-Signal-CiscoPhone-Class
 match cos  3 
class-map match-any non-client-nrt-class
class-map match-any AutoQos-4.0-Default-Class
 match access-group name AutoQos-4.0-Acl-Default
class-map match-any AutoQos-4.0-Output-Trans-Data-Queue
 match dscp af21  af22  af23 
 match cos  2 
class-map match-any AutoQos-4.0-Output-Scavenger-Queue
 match dscp cs1 
class-map match-any AutoQos-4.0-Output-Control-Mgmt-Queue
 match dscp cs2  cs3  cs6  cs7 
 match cos  3 
!
policy-map port_child_policy
 class non-client-nrt-class
  bandwidth remaining ratio 10
policy-map AutoQos-4.0-Output-Policy
 class AutoQos-4.0-Output-Priority-Queue
  priority level 1 percent 30
 class AutoQos-4.0-Output-Control-Mgmt-Queue
  bandwidth remaining percent 10 
  queue-limit dscp cs2 percent 80
  queue-limit dscp cs3 percent 90
  queue-limit dscp cs6 percent 100
  queue-limit dscp cs7 percent 100
  queue-buffers ratio 10
 class AutoQos-4.0-Output-Multimedia-Conf-Queue
  bandwidth remaining percent 10 
  queue-buffers ratio 10
 class AutoQos-4.0-Output-Trans-Data-Queue
  bandwidth remaining percent 10 
  queue-buffers ratio 10
 class AutoQos-4.0-Output-Bulk-Data-Queue
  bandwidth remaining percent 4 
  queue-buffers ratio 10
 class AutoQos-4.0-Output-Scavenger-Queue
  bandwidth remaining percent 1 
  queue-buffers ratio 10
 class AutoQos-4.0-Output-Multimedia-Strm-Queue
  bandwidth remaining percent 10 
  queue-buffers ratio 10
 class class-default
  bandwidth remaining percent 25 
  queue-buffers ratio 25
policy-map AutoQos-4.0-Trust-Dscp-Input-Policy
 class class-default
  set dscp dscp table AutoQos-4.0-Trust-Dscp-Table
policy-map AutoQos-4.0-CiscoPhone-Input-Policy
 class AutoQos-4.0-Voip-Data-CiscoPhone-Class
  set dscp ef
  police cir 128000 bc 8000
   conform-action transmit 
   exceed-action set-dscp-transmit dscp table policed-dscp
 class AutoQos-4.0-Voip-Signal-CiscoPhone-Class
  set dscp cs3
  police cir 32000 bc 8000
   conform-action transmit 
   exceed-action set-dscp-transmit dscp table policed-dscp
 class AutoQos-4.0-Default-Class
  set dscp default

This is the service policy on the interface facing the MPLS circuit:

 service-policy input AutoQos-4.0-Trust-Dscp-Input-Policy
 service-policy output AutoQos-4.0-Output-Policy

On the service provider's router, here's what's configured there:

class-map match-any best-effort
 match ip precedence 0  1 
class-map match-any network-control
 match ip precedence 4  6  7 
class-map match-any expedited-forwarding
 match ip precedence 5 
class-map match-any assured-forwarding
 match ip precedence 2  3 
!
policy-map Template-8
 class expedited-forwarding
  police cir percent 20
   conform-action transmit 
   exceed-action drop 
  priority percent 20
 class network-control
  bandwidth percent 10 
  random-detect
 class best-effort
  bandwidth percent 69 
  random-detect
policy-map QOS
 class class-default
  shape average 10000000
   service-policy Template-8
policy-map rate-limit-20M-out
 class class-default
  shape average 20000000

And finally, on the SP's router facing into their network:

 service-policy output QOS

Joseph W. Doherty · ‎07-24-2019

What is your device and its running IOS version?

Is your WAN topology p2p or multi-point?

What's the WAN's physical hand-off bandwidth?

Generally speaking, you too would want to shape for your available bandwidth, and then prioritize as desired. (For most traffic, a class-default using FQ works very well, if it's supported on your platform; if not, it can be a struggle. From that you can add classes for SLA traffic, like VoIP, if needed. NB: personally I'm not a fan of auto-QoS, and random-detect is best avoided unless you're a real QoS guru.)

Regarding the current QoS config, yours appears to be based on Cisco's v4 autoQoS, which can probably be better explained reading Cisco documentation. I don't know how well Cisco's QoS model matches your SLA needs, so cannot comment on whether it's of benefit to you or not.

Your SP's QoS appears to have policy shapers for both 10 and 20 Mbps. The former uses a subordinate policy to treat three classes of traffic differently. BTW, their mappings for IPPrec don't really agree with the RFCs. For example, the AF classes should have an IPPrec of 1..4.

As long as you shape your egress to your SP so that it does not exceed their CIR, their QoS should matter little, except perhaps if yours is a multipoint topology that has congestion from the SP to you.

Nathan Farrar · ‎07-25-2019

Thanks for your reply,

Hand off is Ethernet to the SP's router, then Ethernet to local switch. Topology is multi-point mesh. Physical hand off bandwidth is 100Mbps.

I believe the best way to approach this is to:

1. Shape traffic egress toward the MPLS router to 10Mbps

2. Utilize IPP to tag traffic that the MPLS router will then apply its QoS rules to

I'm thinking the best way is to simplify this by getting rid of AutoQoS entirely and setting up some kind of map of DSCP markings to precedence that the MPLS router will watch for.

Still learning here... so what is the best method to shape the overall interface while also mapping DSCP to IPP? I'm sure I'll figure it out once I get through some more documentation, but I like to see what people with experience do; oftentimes it's more logical than what is outlined in the Cisco docs.

Joseph W. Doherty · ‎07-25-2019

Yeah, you'll want to limit your output bandwidth to whatever you've obtained from your provider. When it's an actual physical Ethernet bandwidth, you sometimes have the option to run your interface at that bandwidth. The advantage of that is that QoS works better with a physical bandwidth than with a shaper.

I believe some Cisco shapers don't account for L2 overhead. If yours does not, you can shape slower to make an allowance. I've found about 15% (slower) often accounts for about the average L2 overhead.
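
To put rough numbers on that allowance, here is a small Python sketch. The function names are made up for illustration, and the 18-byte figure (14-byte Ethernet header plus 4-byte FCS, untagged) is an assumption; what a given shaper or provider actually counts may differ.

```python
# Hypothetical helpers; the names and the 18-byte overhead are assumptions.
L2_OVERHEAD_BYTES = 18  # untagged Ethernet: 14-byte header + 4-byte FCS


def l2_overhead_fraction(l3_bytes, overhead=L2_OVERHEAD_BYTES):
    """Share of the L2 frame that a shaper counting only L3 bytes would miss."""
    return overhead / (l3_bytes + overhead)


def shaped_rate(cir_bps, allowance_pct=15):
    """Shaper rate after holding back allowance_pct of the provider's CIR."""
    return cir_bps * (100 - allowance_pct) // 100


if __name__ == "__main__":
    for size in (100, 500, 1500):
        print(size, "byte packet:", round(l2_overhead_fraction(size), 3))
    print("shape average", shaped_rate(10_000_000))
```

Under these assumptions the overhead is around 15% for 100-byte packets and closer to 1% for full-size packets, so how much allowance is needed depends heavily on the traffic mix.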

If your device supports fair-queue, that handles about 90% of QoS needs well. (I wonder if the RFC and Cisco 10+ class models are due to not always being able to have FQ.)

My general purpose QoS model is:

policy-map Sample
 class realtime
  priority percent 30
 class foreground
  bandwidth remaining percent 81
  fair-queue
 class background
  bandwidth remaining percent 1
  fair-queue
 class class-default
  bandwidth remaining percent 9
  fair-queue

For the above, most traffic should go into class class-default. The background class can be used for scavenger type traffic or low-priority bulk traffic. The foreground class can be used for light weight SLA critical traffic. The realtime class would be used for traffic like VoIP bearer.
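
As a rough worked example of what those percentages mean, the Python sketch below assumes a 10 Mbps shaper (the CIR discussed in this thread) and assumes the realtime class is using its full 30%. The function name is hypothetical, and the figures are only the shares under full congestion; when a class is idle, the others can generally use what it leaves.

```python
def class_rates(shaper_bps, priority_pct, remaining_pcts):
    """Split a shaped rate into a priority share and 'remaining percent' shares."""
    priority_bps = shaper_bps * priority_pct // 100
    leftover_bps = shaper_bps - priority_bps  # what the non-priority classes divide
    rates = {"realtime": priority_bps}
    for name, pct in remaining_pcts.items():
        rates[name] = leftover_bps * pct // 100
    return rates


# Assumed inputs: 10 Mbps shaper, percentages from the sample policy.
rates = class_rates(
    10_000_000,
    30,
    {"foreground": 81, "background": 1, "class-default": 9},
)

if __name__ == "__main__":
    for name, bps in rates.items():
        print(f"{name:14s} {bps / 1000:.0f} kbps")
```

With these assumptions realtime gets 3000 kbps, foreground 5670 kbps, background 70 kbps and class-default 630 kbps; the remaining percentages add up to 91, so part of the leftover bandwidth is not explicitly assigned.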

Having QoS for your egress to your provider's "cloud" deals with possible congestion to it.

For possible egress congestion from provider's "cloud" to your locations, you need to depend on the provider's QoS model that comes closest to your needs. Generally, you use ToS markings to "signal" to the provider how your traffic should be mapped into their QoS.

BTW, your egress QoS treatment, except for ToS markings, doesn't need to align with your provider's QoS model.

IPPrec uses the same first 3 bits of the ToS byte that DSCP also uses. This was intentional in the original DSCP RFC, so as to overlap with the prior IPPrec RFC and possibly maintain somewhat similar treatment regardless of whether a router is using IPPrec or DSCP prioritization. The one major later exception is the RFC for scavenger marking: CS1 is a lower priority than BE, but with IPPrec, the corresponding IPPrec 1 is a higher priority than IPPrec 0.
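
The bit overlap can be checked with a few lines of Python. This is only an illustration; the dictionary and function names are made up, and the DSCP values are the standard decimal codepoints.

```python
# Standard DSCP codepoints (decimal) for the markings mentioned in this thread.
DSCP = {"default": 0, "cs1": 8, "af11": 10, "cs3": 24, "af31": 26, "cs5": 40, "ef": 46}


def dscp_to_ipprec(dscp):
    """IP Precedence is the top 3 bits of the 6-bit DSCP value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp >> 3


if __name__ == "__main__":
    for name, value in DSCP.items():
        print(f"{name:8s} dscp {value:2d} -> ipprec {dscp_to_ipprec(value)}")
```

Note how CS1 lands on IPPrec 1, which an IPPrec-only device ranks above IPPrec 0, even though CS1 is meant to be treated as lower than best effort.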

Nathan Farrar · ‎07-26-2019

Okay, that makes sense. And working through this is very helpful.

How does the bandwidth command come into play if it were to be configured on the interface facing the MPLS router? Would that then inform the correct values for the percentages?

So, on the actual marking of traffic. I'll need to mark traffic as it enters the host interfaces or trust the marking already applied. I do want voice to be prioritized and I suspect that will already be marked leaving the device. But we will also need to classify SMB traffic to a specific subnet. I would need a policy on all host facing interfaces in the inbound direction, correct? How does this look?:

class-map match-any SMB-TRAFFIC
 match access-group 100
policy-map ENT-PM-INPUT
 class SMB-TRAFFIC
  set

What I am still a bit fuzzy on is where the shaping would come into play and bringing it all together. Does this look correct?

Goal is to prioritize voice traffic from Cisco phones, then SMB traffic to servers across the MPLS. The rest is best-effort. Do I also need to then ma

access-list 100 permit tcp any 10.20.20.0 0.0.0.255 eq 445
!
class-map match-any SMB-TRAFFIC
 match access-group 100
class-map match-any VOICE-DATA
 match cos 5
 match ip dscp ef
class-map match-any VOICE-SIGNAL
 match cos 3
!
policy-map ENT-PM-INPUT
 class SMB-TRAFFIC
  set dscp cs3
 class VOICE-DATA
  set dscp cs5
 class VOICE-SIGNAL
  set dscp cs7
policy-map ENT-PM-OUTPUT
 class VOICE-DATA
  priority percent 30
 class SMB-TRAFFIC
  bandwidth remaining percent 81
  fair-queue
 class VOICE-SIGNAL
  bandwidth remaining percent 81
  fair-queue
 class background
  bandwidth remaining percent 1
  fair-queue
 class class-default
  bandwidth remaining percent 9
  fair-queue

The service provider is using IPP for the MPLS network, so my idea is that I'm matching that internally. From what I gather, the cs5 would match IP Precedence 5 etc.

I would apply the INBOUND policies to host and phone facing ports and the OUTBOUND policy toward the MPLS router. Question is where do I do the interface shaping? Do I have to nest a policy?

Lastly, for the phones I would have to set the interfaces to trust the existing markings, correct?

Thanks!

Joseph W. Doherty · ‎07-26-2019

Ah, much in your last post. I'll try to address all, but if I miss something, please let me know. Also, my answers may not be in the same sequence as your questions.

When it comes to dealing with setting ToS, on Cisco routers it can be done within an ingress policy or an egress policy. (BTW, classification analysis can often be done in either policy too.)

Whether to trust end host markings is up to you. However, with "special" markings, which you don't want abused, you might want to "trust but verify". For example, on edge ports that have a VoIP phone, you might verify that DSCP EF appears only on UDP traffic and that a flow doesn't exceed about 150 Kbps.

Even though your provider is using IPPrec, again, keep in mind DSCP uses the same first 3 bits of the ToS byte, so use DSCP values yourself, as long as they map into the IPPrec values you need. For example, set VoIP bearer traffic to DSCP EF, which will map into IPPrec 5. (Generally, your service provider shouldn't change your ToS markings.) Don't use any IPPrec 6 or 7 markings.
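
One way to sanity-check a marking plan against the provider's policy is to run each DSCP value through the classes in the Template-8 policy as posted earlier in the thread. The Python below is a sketch under that assumption only; the function name is hypothetical, and the provider may have configuration that was not shown.

```python
# Classes referenced by policy-map Template-8 as posted, keyed to IP Precedence.
# The assured-forwarding class-map (IPPrec 2, 3) is defined in the posted config
# but not referenced in Template-8, so it is deliberately left out here.
SP_TEMPLATE_8 = {
    "expedited-forwarding": {5},
    "network-control": {4, 6, 7},
    "best-effort": {0, 1},
}


def sp_class_for_dscp(dscp):
    """Return the Template-8 class a DSCP marking would match via its IPPrec bits."""
    ipprec = dscp >> 3
    for name, precedences in SP_TEMPLATE_8.items():
        if ipprec in precedences:
            return name
    return "class-default"  # anything the policy does not explicitly reference


if __name__ == "__main__":
    for label, dscp in (("ef", 46), ("af31", 26), ("cs1", 8), ("default", 0)):
        print(f"{label:8s} -> {sp_class_for_dscp(dscp)}")
```

In this sketch EF lands in expedited-forwarding, while AF31 (IPPrec 3) falls through to class-default, which is worth confirming with the provider before relying on AF31 for anything important.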

I see you're matching CoS. Normally that's an L2 tag found only in tagged frames. Generally, avoid it and rely on L3 ToS markings.

An example of an egress policy to your SP, shaping for 10 Mbps:

policy-map sampleParent
 class class-default
  shape average 10000000
   service-policy sampleChild

policy-map sampleChild
 ! include classes for how to treat the shaped traffic
 ! child class bandwidth percentages should be based on the parent policy's shaper bandwidth
 ! sum of percentages cannot exceed 100 (across all classes if not using "remaining", or 100 less the non-"remaining" classes)

interface gig1
 description to SP
 service-policy output sampleParent

I see you renamed my earlier example policy's foreground class to VOICE-SIGNAL. Generally, especially with class FQ, I suggest keeping the class names rather generic, assuming you might want to direct other traffic into that class. Further, with FQ in class-default, voice signaling can likely be left to fall into the class-default class. (BTW, what FQ does is hash flows into FQ queues; often a flow gets its own queue, which is dequeued equally with all its peer queues. Normally, a few heavy-bandwidth flows don't get to monopolize bandwidth from light-bandwidth flows.)
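
A toy Python model shows the effect described above. It is a deliberate simplification: each flow gets its own queue keyed by a flow label, whereas a real implementation hashes flows into a limited number of queues. The function names and the sample traffic are assumptions for illustration.

```python
from collections import OrderedDict, deque


def fifo_dequeue(packets, n):
    """Single FIFO queue: the first n packets to arrive are the first n sent."""
    return packets[:n]


def fq_dequeue(packets, n):
    """Per-flow queues served round-robin, one packet per flow per pass."""
    queues = OrderedDict()
    for flow, seq in packets:
        queues.setdefault(flow, deque()).append((flow, seq))
    sent = []
    while len(sent) < n and queues:
        for flow in list(queues):
            queue = queues[flow]
            sent.append(queue.popleft())
            if not queue:
                del queues[flow]  # flow has nothing left to send
            if len(sent) == n:
                break
    return sent


# Assumed arrivals: a heavy SMB burst queued ahead of a few signaling packets.
arrivals = [("smb", i) for i in range(100)] + [("voice-signal", i) for i in range(5)]

if __name__ == "__main__":
    for label, sent in (("fifo", fifo_dequeue(arrivals, 10)), ("fq", fq_dequeue(arrivals, 10))):
        light = sum(1 for flow, _ in sent if flow == "voice-signal")
        print(f"{label}: {light} of {len(sent)} packets sent belong to the light flow")
```

With a single FIFO the light flow waits behind the whole burst; with per-flow queues it gets every other transmission slot until it has nothing left to send.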

Also, for your SMB traffic, you could mark it differently if you choose, but I suggest you just let it fall into the class-default class as well.

Nathan Farrar · ‎07-28-2019

This is great, I'm building a baseline for further learning!

Here's what I've put together, let me know if you see flaws in the logic please.

I want phones to get the expected priority. On the host-facing interfaces I have an input service policy that looks for the CS5 tag, which is expected from a phone voice stream, and assigns an EF tag. It also looks for signaling traffic with a CS3 tag, which is expected from voice call setup, and assigns AF31. AF31 = IPP 3 and EF = IPP 5.

I am also looking for traffic that is going to a critical server for SMB traffic. You said that this should probably go to the class-default but I'd like to test performance with it in place. It may actually hinder everything.

I then have an output policy facing the SP that shapes the traffic to 10Mbps and has a child policy that does the prioritization.

Let me know if this seems like a good way of doing things and where I could improve.

Also, a few questions:

- Does the interface "bandwidth" command need to be stated in order for the percentages to function correctly?

- AutoQoS utilized an input and an output service-policy on all interfaces. Is this to prioritize data at each interface instead? Should I do the same?

- I put a table-map in place since it is what I saw in AutoQoS. I'm using that as a reference tool, maybe not needed?

- And lastly, should I be using "trust" commands to trust existing markings or is what I have sufficient?

Really appreciate the assistance with this, it is truly clearing things up and I'm sure this thread will be helpful for others.

Best

Joseph W. Doherty · ‎07-29-2019

You might want to double-check how your phones tag their packets. However, you're correct: EF will also be treated as IPPrec 5, and AF31 will also be treated as IPPrec 3.

As to having "critical" SMB traffic, remember SMB flows might also be bandwidth hogs. If you place it into its own class, be careful how you prioritize it relative to BE traffic. Also, since SMB flows can vary a lot in their individual bandwidth demands, ideally that traffic should be treated with FQ.

Yes, class bandwidth percentages will base their calculation on an interface's bandwidth statement (if the classes are in a policy attached directly to an interface).

You cannot prioritize traffic on ingress, although you can tag it at ingress for later priority treatment.

Doubtful you need a table-map, but I would need to see exactly what you're doing to comment further. BTW, I would not go too far in using AutoQoS as an example of "good" QoS.

"Trust" is needed on older switches, with QoS enabled, to keep the switch from clearing the ToS marking by default. Later switches, by default, now act like routers, i.e. they don't change ToS unless you configure them to do so.

Nathan Farrar · ‎07-29-2019

Thanks a lot for your input on this. I'll be putting this into production tomorrow.

Best

Nathan Farrar · ‎07-30-2019

Another question:

This is a 3850 platform, and no fair-queue option exists. I've implemented the config but have to wait for users to get in to see if things are working. Can you take a look at this to see whether it has any major errors? Is there anything else I can add to ensure queueing is occurring correctly? I really want to make sure traffic is shaped and not just dropped. The client's complaint is mainly with file shares (SMB) and Internet traffic. But there is only so much that can be done.

class-map match-any realtime
 match ip dscp ef
class-map match-any foreground
 match ip dscp af31
class-map match-any background
 match ip dscp af11 af12 af13
 
class-map Cisco-VOIP-Data
 match cos 5
class-map Cisco-VOIP-Signal
 match cos 3

policy-map ENT-INPUT-PMAP
 class Cisco-VOIP-Data
  set ip dscp ef
  police cir 128000 bc 8000
   conform-action transmit 
 class Cisco-VOIP-Signal
  set ip dscp af31
  police cir 32000 bc 8000
   conform-action transmit 

policy-map ENT-OUTPUT-PMAP
 class class-default
  shape average 10000000
   service-policy ENT-OUTPUT-TEMPLATE

policy-map ENT-OUTPUT-TEMPLATE
 class realtime
  priority percent 30
 class foreground
  bandwidth remaining percent 81
 class background
  bandwidth remaining percent 1
 class class-default
  bandwidth remaining percent 9

Joseph W. Doherty · ‎07-30-2019

That looks like a fairly good first policy to try. You might try changing your background class to match just CS1.

You should also decide what to do with overrate VoIP traffic. Choices include dropping or marking down. What you have won't do anything to your overrate traffic beyond providing stats that it's happening.
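
To see when traffic counts as overrate, here is a minimal single-rate token-bucket sketch in Python using the cir 128000 / bc 8000 values from your policy. It is a textbook model, not the switch's exact algorithm; the class name and the packet sizes and intervals are assumptions (200-byte packets every 20 ms is roughly one G.711 call).

```python
class Policer:
    """Textbook single-rate token bucket: cir in bits/s, bc in bytes."""

    def __init__(self, cir_bps, bc_bytes):
        self.rate = cir_bps / 8          # token refill in bytes per second
        self.bc = bc_bytes               # bucket depth
        self.tokens = float(bc_bytes)    # bucket starts full
        self.last = 0.0

    def offer(self, t, size):
        """Classify a packet of `size` bytes arriving at time `t` (seconds)."""
        self.tokens = min(self.bc, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if size <= self.tokens:
            self.tokens -= size
            return "conform"
        return "exceed"  # what happens next depends on the configured exceed-action


# Assumed traffic: one voice call versus a bulk flow mis-marked as voice.
voice = Policer(128_000, 8_000)
voice_results = [voice.offer(i * 0.02, 200) for i in range(50)]

bulk = Policer(128_000, 8_000)
bulk_results = [bulk.offer(i * 0.01, 1500) for i in range(50)]

if __name__ == "__main__":
    print("voice exceeded:", voice_results.count("exceed"))
    print("bulk exceeded:", bulk_results.count("exceed"))
```

In this model a single call never exceeds, while a bulk flow burns through the burst allowance within a handful of packets and is then mostly overrate, which is the traffic you need to decide whether to drop or mark down.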

Since you don't have the FQ feature, be prepared to create additional classes for your BE traffic.

You might add to your ingress policy a policer that will mark non-VoIP, high-bandwidth traffic with a marking that separates it from other BE traffic.

Again, without FQ, you may find you need to adjust your egress policy quite a bit. In the meantime, even if it doesn't immediately improve the user experience, with your shaper and classes, you should have some better stats on congestion.

Nathan Farrar · ‎07-31-2019

Okay, I'll have to look into adjustments then. What would you recommend doing to traffic that has exceeded the input policing? Markdown to af31 or similar?

Since I have this running, I'm thinking I could use some help interpreting the results. Can you tell me what you see here? I am seeing 'packet' and expecting that this is the number of packets that have been matched by the class within the policy.. but I'm not seeing what I'd expect. I'm seeing some that have packets matched but zero 'conformed' and some that have values for both. Trying to understand that discrepancy.

Also, for my output policy that is doing the shaping, I'm seeing zero for just about everything except total drops. What do you think?

Here is an example interface with inputs:

 GigabitEthernet1/0/10

  Service-policy input: ENT-INPUT-PMAP

    Class-map: Cisco-VOIP-Data (match-any)
      312994 packets
      Match: cos  5
        0 packets, 0 bytes
        5 minute rate 0 bps
      QoS Set
        ip dscp ef
      police:
          cir 128000 bps, bc 8000 bytes
        conformed 0 bytes; actions:
          transmit
        exceeded 0 bytes; actions:
          drop
        conformed 0000 bps, exceeded 0000 bps

    Class-map: Cisco-VOIP-Signal (match-any)
      193191 packets
      Match: cos  3
        0 packets, 0 bytes
        5 minute rate 0 bps
      QoS Set
        ip dscp af31
      police:
          cir 32000 bps, bc 8000 bytes
        conformed 1282434 bytes; actions:
          transmit
        exceeded 0 bytes; actions:
          drop
        conformed 0000 bps, exceeded 0000 bps

    Class-map: SMB-TRAFFIC (match-any)
      17869 packets
      Match: access-group name SMB-MPLS
        0 packets, 0 bytes
        5 minute rate 0 bps
      QoS Set
        ip dscp af31

    Class-map: class-default (match-any)
      3669649 packets
      Match: any

And here is the output toward the SP router, again not seeing anything happening here.

 GigabitEthernet2/0/1

  Service-policy output: ENT-OUTPUT-PMAP

    Class-map: class-default (match-any)
      0 packets
      Match: any
      Queueing

      (total drops) 2079529
      (bytes output) 768317836
      shape (average) cir 10000000, bc 40000, be 40000
      target shape rate 10000000

      Service-policy : ENT-OUTPUT-TEMPLATE

        queue stats for all priority classes:
          Queueing

          (total drops) 0
          (bytes output) 19482271

        Class-map: realtime (match-any)
          0 packets
          Match: ip dscp ef (46)
            0 packets, 0 bytes
            5 minute rate 0 bps
          Priority: 30% (3000 kbps), burst bytes 75000,


        Class-map: foreground (match-any)
          0 packets
          Match: ip dscp af31 (26)
            0 packets, 0 bytes
            5 minute rate 0 bps
          Queueing

          (total drops) 1518
          (bytes output) 37137977
          bandwidth remaining 81%

        Class-map: background (match-any)
          0 packets
          Match: ip dscp cs1 (8)
            0 packets, 0 bytes
            5 minute rate 0 bps
          Queueing

          (total drops) 0
          (bytes output) 78
          bandwidth remaining 1%

        Class-map: class-default (match-any)
          0 packets
          Match: any
          Queueing

          (total drops) 2078011
          (bytes output) 711697510
          bandwidth remaining 9%

Joseph W. Doherty · ‎07-31-2019

For your VoIP policers, you could drop or mark down; the choice is yours. That traffic shouldn't go overrate, so the question becomes: why would it? If someone is abusing the marking, you would drop to stop them, or mark down to permit the traffic but at a lower priority, such as BE.

Your ingress policy is not "seeing" VoIP CoS tags. You would need to confirm how your VoIP phones are configured. Many/most can be configured to set the ToS field. If they are doing that, and only that, you'll not get CoS matches.

As to why you have drops w/o matches, we need to dig into your IOS documentation (and supplemental Cisco documentation). It could be the switch doesn't register hardware/ASIC drops (a possible feature or bug - not unusual on "low end" Cisco switches - is the switch under Cisco maintenance?).