Accounting for Packet Drops, QoS

Verbatim
Level 1

Hello,

Does the total drops count from the policy map output data for an interface include only drops caused by QoS?

What's prompting this question is the attached chart showing packet drops on the interface. The goal is to determine whether QoS is solely responsible for the packet drops on the interface.

Programmatically, is the dropped output count from the show interfaces command the same data structure as the total drop count in the policy map output show policy-map interface tenGigabitEthernet 0/0/0? Or in other words, is it ever possible to have a situation where output drops occur that are not counted in the QoS policy map output counter but are counted in the show interfaces counter?

Appreciate your help.


12 Replies

Joseph W. Doherty
Hall of Fame

"Or in other words, is it ever possible to have a situation where output drops occur that are not counted in the QoS policy map output counter but are counted in the show interfaces counter?"

Possible?  I believe so, although if they don't match, it also might be due to a bug (software defect) or "feature" (design defect), and might vary per platform and/or IOS version.  (NB: over the years, I've seen quite different counts between interface drop stats and detailed QoS drop stats, generally on switches.)

For instance, most QoS drops, or egress drops of any kind, are due to hitting a logical limit, e.g. the maximum number of packets allowed to be queued, but you can also have drops due to resource limits, like physically running short of buffer space or not being able to allocate a buffer as quickly as needed.
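For what it's worth, the per-class stats in show policy-map interface output report both kinds on one line, e.g. (illustrative values only):

  (queue depth/total drops/no-buffer drops) 0/1234/0

so buffer-allocation failures (no-buffer drops) can be distinguished from drops caused by hitting the configured queue limit.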

Are you "seeing" some large differences between drops in a policy-map's stats vs. interface drop stats?
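If it helps to compare them quickly, something like the following (using the usual IOS/IOS-XE output filters) puts the two counters side by side:

show interfaces TenGigabitEthernet0/0/0 | include output drops
show policy-map interface TenGigabitEthernet0/0/0 | include drops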

BTW, QoS doesn't really cause drops, any more than a FIFO queue causes drops.  Generally, drops are due to congestion.

Looking at your posted graph, if you're wondering how you can have drops when your interface max utilization never exceeds about 75%, that's usually because even "max" utilization is averaged over some time period, often in minutes, vs. drops, which happen in milliseconds.  That said, those who don't truly understand QoS, especially its drop-management aspects, can easily and inadvertently drop more packets than is actually optimal for their QoS goals.
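As a purely hypothetical illustration of that averaging effect: a link that runs at 100% for 3 seconds and at 50% for the remaining 27 seconds of a 30-second polling window reports (3 × 100% + 27 × 50%) / 30 ≈ 55% average utilization, even though the egress queue may have overflowed repeatedly during those 3 saturated seconds.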

No, I'm not seeing differences between the counters. The counters have been cleared, and since that point, the counts have remained identical.

The goal is to provide an explanation for the drops; if the counters differed, it might suggest that QoS is not entirely responsible for the results.

QoS allows traffic to be managed by queues, and prioritization takes place? So in a situation where the pipe is backing up, and drops are taking place, is it simply the fact that the higher priority traffic is processed first that results in failure to process the lesser traffic (drops)?

Yes, that is exactly what we are wondering. Is there any way to reduce the average time period in order to give a more accurate utilization reading? I believe the chart was generated from data taken from the router.

It is possible that more packets are being dropped than is optimal.

It seems the poll time for this interface is 30 seconds, the minimum. SNMP traps might be used to go lower (but software may only update interfaces every 10 sec anyway).

"QoS allows traffic to be managed by queues . . ."

QoS management can be a bit more complex than per queue.

". . . and prioritization takes place?"

Depends on your QoS policy.

"So in a situation where the pipe is backing up, and drops are taking place, is it simply the fact that the higher priority traffic is processed first that results in failure to process the lesser traffic (drops)?

Possibly, possibly not.  Congestion is due to trying to send more traffic than egress can physically send, but how that impacts traffic, vis-à-vis prioritization, depends on the policy being used and the nature of the traffic involved when there's congestion.

"Yes, that is exactly what we are wondering. Is there any way to reduce the average time period in order to give a more accurate utilization reading? I believe the chart was generated from data taken from the router."

Depends on how the stats are being collected.  Often an NMS can set a polling interval, but reducing it, although it may provide more granular analysis, often has adverse side effects.  Again, as drops happen down in the millisecond time range, it's difficult to "see" that level of detail.  (It's so problematic that, for a while, Cisco even incorporated Corvil bandwidth estimation technology in some versions of IOS to analyze the bandwidth traffic needs to meet service goals.)
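As a small, related knob (a sketch, assuming IOS/IOS-XE): the load-interval interface command shortens the window over which show interfaces computes its rate counters, down to its 30-second minimum, though it doesn't change what an NMS polls via SNMP:

interface TenGigabitEthernet0/0/0
 load-interval 30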

"It is possible that more packets are being dropped than is optimal."

That's very often the case.  However, you often don't really need to graph out throughput vs. drops; just looking at the QoS stats provides much information.  If your drops are due to transient bursts (i.e., microbursts), just increasing some queue limits will often dramatically reduce drop stats.

If you're using AutoQoS, personally I believe it's often only suitable for those who want to ensure prioritization for real-time traffic like VoIP or video conferencing; what it does beyond that, IMO, assumes too much about QoS needs and/or creates an overly complex QoS policy.

I think it is totally related to QoS; the router uses QoS to police traffic.

How do I know? You can see that the blue graph (traffic) and the gold graph (discards) are related: the peaks of both appear at the same time. This QoS mechanism kicks in when the traffic (blue) passes a specific point and packets start to drop (gold).

 

"I think it totally relate to QoS"

Yes and no.

"how I know 
you can see the blue graph and GLOD (discard) 
both relate 
the peak of graph is identical appear in same time 
this QoS tech used when the traffic (BLUE) pass specific point the packet start to drop (GOLD)."

Reading the above, you too (inadvertently?) are implying QoS tech is the cause of the drops.

100% agree, drops correlate to traffic load, but what would you expect if you had an "ordinary" (the usual default) FIFO queue?  Might it too show drops during bursts of traffic?  Is a FIFO queue a QoS technology?

Certainly a suboptimal QoS configuration, much like a suboptimal FIFO configuration, might cause higher drop rates, but in both cases it's the configuration, not the technology in and of itself, that is the problem.

Also, an often overlooked aspect of QoS policies is drop management and/or queuing delay.  We may intentionally want to avoid queuing latency, and that may cause more drops.  I.e., an optimal drop rate isn't necessarily the lowest drop rate; it's optimal when it accomplishes our QoS goals, which might be something as "simple" as goodput for bulk TCP traffic flows.

Do you mean the depth of the FIFO queue? If so, look again at his graph: interface utilization never hits 100%, it's only around 50%, and packets still drop.

So it is QoS policing that forces the router not to push more data than a specific value.


This is what happened here, from my view.

"this what happened here from my view"

Since the policy the OP later posted doesn't use a policer, what's your view now?

Verbatim
Level 1
show policy-map interface tenGigabitEthernet 0/0/0
 TenGigabitEthernet0/0/0 

  Service-policy output: QOS-WAN-ASR-10G-3G

    Class-map: class-default (match-any)  
      1280823619 packets, 1253083712903 bytes
      30 second offered rate 731511000 bps, drop rate 0000 bps
      Match: any 
      Queueing
      queue limit 4194 packets
      (queue depth/total drops/no-buffer drops) 0/22746/0
      (pkts output/bytes output) 1280691034/1253040396209
      shape (average) cir 3000000000, bc 12000000, be 12000000
      target shape rate 3000000000

      Service-policy : QOS-WAN-ASR-10G

        queue stats for all priority classes:
          Queueing
          queue limit 512 packets
          (queue depth/total drops/no-buffer drops) 0/0/0
          (pkts output/bytes output) 246585982/59479122185

        Class-map: QOS-PRIORITY (match-any)  
          246585982 packets, 59479122185 bytes
          30 second offered rate 7127000 bps, drop rate 0000 bps
          Match:  dscp cs5 (40) ef (46)
          Priority: 20% (600000 kbps), burst bytes 15000000, b/w exceed drops: 0
          

        Class-map: QOS-REAL-TIME (match-any)  
          146549 packets, 112182862 bytes
          30 second offered rate 0000 bps, drop rate 0000 bps
          Match:  dscp cs4 (32) af41 (34)
          Queueing
          queue limit 125 packets
          (queue depth/total drops/no-buffer drops) 0/0/0
          (pkts output/bytes output) 146549/112182862
          bandwidth remaining 20%
          

        Class-map: QOS-SIGNALING (match-any)  
          7833236 packets, 2667306311 bytes
          30 second offered rate 704000 bps, drop rate 0000 bps
          Match:  dscp cs3 (24) af31 (26) cs6 (48) cs7 (56)
          Queueing
          queue limit 125 packets
          (queue depth/total drops/no-buffer drops) 1/0/0
          (pkts output/bytes output) 7727569/2658903863
          bandwidth remaining 20%
          

        Class-map: QOS-TIME-SENSITIVE (match-any)  
          115389 packets, 52613543 bytes
          30 second offered rate 19000 bps, drop rate 0000 bps
          Match:  dscp cs2 (16) af21 (18)
          Queueing
          queue limit 125 packets
          (queue depth/total drops/no-buffer drops) 0/0/0
          (pkts output/bytes output) 115389/52613543
          bandwidth remaining 10%
          

        Class-map: QOS-BULK (match-any)  
          140125580 packets, 200300811975 bytes
          30 second offered rate 124953000 bps, drop rate 0000 bps
          Match:  dscp 7  af11 (10)
          Queueing
          queue limit 125 packets
          (queue depth/total drops/no-buffer drops) 0/22746/0
          (pkts output/bytes output) 140102835/200267445037
          bandwidth remaining 1%
          

        Class-map: class-default (match-any)  
          886017070 packets, 990471861604 bytes
          30 second offered rate 598693000 bps, drop rate 0000 bps
          Match: any 
          Queueing
          queue limit 40000 packets
          (queue depth/total drops/no-buffer drops) 0/0/0
          (pkts output/bytes output) 886012710/990470128719
          bandwidth remaining 49%
          

The queue limit of 125 packets in the BULK class is most likely the reason packets are dropping; the queue is likely maxed out.

Increasing the size of that queue might address the drops, but that might not actually be ideal. It is a complicated issue with a lot of trade-offs. Thank you, Joseph Doherty, for your posts here and in several other places, which have been informative.

"The queue limit of 125 packets in the BULK class is most likely the reason packets are dropping; the queue is likely maxed out."

Well, I would say that's what IS happening, at least during traffic bursts.

"Increasing the size of that queue might address the drops, but that might not actually be ideal."

Certainly true.  However, unless your QoS policy's purpose is to "penalize" that traffic class, again, a 125-packet limit on a 3 Gbps circuit is very small.  Keep in mind, increasing the queue limit will likely reduce drops and raise your overall average utilization (i.e. taking advantage of otherwise unused bandwidth), but it should not be adverse to your other traffic because of the minimal bandwidth allocation (again, it only takes advantage of otherwise unused bandwidth).

I see your reply came in before my prior reply, so if you're concerned about increasing your QOS-BULK queue limit, you're probably terrified of the policy I suggested, but it will likely do "better" for what you're likely trying to accomplish.

Ah, from your stats, all the drops are in your QOS-BULK class, which isn't surprising as it's only guaranteed 1% of the bandwidth and has a queue limit of 125 packets; the latter is likely very "shallow" for a 3 Gbps WAN connection.
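As a rough, back-of-the-envelope illustration (assuming ~1500-byte packets): 125 packets is roughly 125 × 1500 × 8 ≈ 1.5 Mb of buffering, which even the full 3 Gbps shaped rate drains in about 0.5 ms, so a burst arriving faster than the shaped rate for much longer than half a millisecond will start tail-dropping, which is far too brief to show up in a 30-second utilization average.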

There are two (usual) schools of thought about queue limits.  One school proposes that queue resources should correspond to bandwidth allocations, to support those allocations.  The other holds that as you decrease bandwidth allocations, you need to allocate more queuing resources, since queuing is more likely.  Personally, I allocate queuing resources to accomplish my QoS goals.

If you want to decrease drops in the QOS-BULK class, increase the class's queue limit, perhaps trying about 1,024 for a start.  Then monitor results for a few days.
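A minimal sketch of that change, assuming the existing child policy name from your output (exact syntax can vary per IOS-XE release):

policy-map QOS-WAN-ASR-10G
 class QOS-BULK
  queue-limit 1024 packets

Afterwards, something like show policy-map interface TenGigabitEthernet0/0/0 | include drops makes it easy to watch the per-class drop counters while you monitor.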

Personally, I would suggest a QoS policy-map like:

BTW, your shaper might not account for L2 overhead; if not, reduce the shaped rate by about 15%.

policy-map QOS-WAN-ASR  ! child policy
 class QOS-PRIORITY  ! match ip prec 5
  priority percent 30
 class QOS-BULK  ! match dscp 7, af11
  bandwidth remaining percent 1
  fair-queue
 class QOS-foreground  ! combines/replaces QOS-REAL-TIME, QOS-SIGNALING and QOS-TIME-SENSITIVE
  ! match ip prec 2, 3, 4, 6, 7
  bandwidth remaining percent 81
  fair-queue
 class class-default
  bandwidth remaining percent 9
  fair-queue

Depending on what the ASR generates for queue limits, those might need adjusting.
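For completeness, a sketch of how the child policy might hang off a parent shaper; the parent's name here is hypothetical, and 2550000000 is simply the 3 Gbps rate less roughly 15%, per the L2-overhead caveat above:

policy-map QOS-WAN-ASR-3G-SHAPE  ! hypothetical parent name
 class class-default
  shape average 2550000000  ! ~3 Gbps minus ~15% if L2 overhead isn't accounted for
  service-policy QOS-WAN-ASR

interface TenGigabitEthernet0/0/0
 service-policy output QOS-WAN-ASR-3G-SHAPE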
