3153 Views, 10 Helpful, 10 Replies

WAN link output drops class-default drops

colossus1611
Level 1

Hello,

 

I have a client site that complains of intermittent connectivity issues. Utilisation on the WAN link is 60% or less most of the time, yet I am still seeing output drops on the interface, and the policy-map output confirms the drops are in the default class map. What can I do to fix this?

 

GigabitEthernet0/0/0 is up, line protocol is up
Hardware is ISR4351-3x1GE
Description: (MW) (200M)
Internet address is 10.254.4.62/29
MTU 1578 bytes, BW 200000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 7/255, rxload 9/255
Encapsulation ARPA, loopback not set
Keepalive not supported
Full Duplex, 1000Mbps, link type is auto, media type is RJ45
output flow-control is off, input flow-control is off
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:00, output 00:00:00, output hang never
Last clearing of "show interface" counters never
Input queue: 0/375/0/0 (size/max/drops/flushes); Total output drops: 3439395
Queueing strategy: Class-based queueing
Output queue: 0/40 (size/max)
30 second input rate 7462000 bits/sec, 1570 packets/sec
30 second output rate 5885000 bits/sec, 1272 packets/sec
5763681543 packets input, 4536096085286 bytes, 0 no buffer
Received 8 broadcasts (0 IP multicasts)
0 runts, 0 giants, 0 throttles
1 input errors, 1 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 37417882 multicast, 0 pause input
3945427856 packets output, 1419410120302 bytes, 0 underruns
0 output errors, 0 collisions, 1 interface resets
0 unknown protocol drops
0 babbles, 0 late collision, 0 deferred
4 lost carrier, 0 no carrier, 0 pause output
0 output buffer failures, 0 output buffers swapped out

 

GigabitEthernet0/0/0

Service-policy output: L2-SHAPING

Class-map: class-default (match-any)
3948959413 packets, 1423766322820 bytes
30 second offered rate 6972000 bps, drop rate 14000 bps
Match: any
Queueing
queue limit 768 packets
(queue depth/total drops/no-buffer drops) 0/3439465/0
(pkts output/bytes output) 3940704125/1418911301780
shape (average) cir 194000000, bc 1940000, be 0
target shape rate 194000000

 

Service-policy : L1-QUEUING

queue stats for all priority classes:
Queueing
priority level 1
queue limit 512 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 27451150/6055021456

 

Class-map: class-default (match-any)
3683591534 packets, 1359287605115 bytes
30 second offered rate 6932000 bps, drop rate 9000 bps
Match: any

queue limit 768 packets
(queue depth/total drops/no-buffer drops) 0/3439465/0
(pkts output/bytes output) 3679939925/1354906022986
Exp-weight-constant: 9 (1/512)
Mean queue depth: 0 packets
class Transmitted Random drop Tail drop Minimum Maximum Mark
pkts/bytes pkts/bytes pkts/bytes thresh thresh prob

0 3679939925/1354906022986 291901/356055094 3125250/3909507433 192 384 1/10
1 0/0 0/0 0/0 216 384 1/10
2 0/0 0/0 0/0 240 384 1/10
3 0/0 0/0 0/0 264 384 1/10
4 0/0 0/0 0/0 288 384 1/10
5 0/0 0/0 0/0 312 384 1/10
6 0/0 0/0 0/0 336 384 1/10
7 0/0 0/0 0/0 360 384 1/10

 

 

10 Replies

Hello,

 

increase the output queue on the interface:

 

4331#conf t

4331(config)#interface GigabitEthernet0/0/0

4331(config-if)#hold-queue 375 out

colossus1611
Level 1

Hi George,

 

Thanks for the suggestion. A few things:

 

- What's the default value for this hold-queue?

- Will this help improve performance?

- Any changes that I can make to QoS policy for this?

 

 

Hello,

 

the default is 40. 

 

You already have a QoS policy in place, so adding another one doesn't make sense. The output drops are likely due to microbursts. Increasing the output hold queue might or might not reduce the drops. You can start by doubling the value (to 80) and check whether the drops decrease. The impact should be none.
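For example, something like this (a sketch only; the interface name is taken from your posted output):

configure terminal
 interface GigabitEthernet0/0/0
  hold-queue 80 out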

Thanks George. I will give that a try. The end objective of course is to improve user experience with reduced latency and drop rates.

You can also tune the buffers. Post the output of:

 

show buffers

Hi George,

 

Here's the output for show buffers:

 

Buffer elements:
865 in free list
358546412 hits, 0 misses, 1019 created

Public buffer pools:
Small buffers, 104 bytes (total 1200, permanent 1200):
1197 in free list (200 min, 2500 max allowed)
250254383 hits, 0 misses, 0 trims, 0 created
0 failures (0 no memory)
Middle buffers, 600 bytes (total 900, permanent 900):
899 in free list (100 min, 2000 max allowed)
192870836 hits, 0 misses, 0 trims, 0 created
0 failures (0 no memory)
Big buffers, 1536 bytes (total 900, permanent 900, peak 922 @ 22w6d):
900 in free list (50 min, 1800 max allowed)
44388094 hits, 28 misses, 22 trims, 22 created
0 failures (0 no memory)
VeryBig buffers, 4520 bytes (total 100, permanent 100, peak 112 @ 22w6d):
100 in free list (0 min, 300 max allowed)
35613457 hits, 364 misses, 12 trims, 12 created
364 failures (0 no memory)
Large buffers, 5024 bytes (total 100, permanent 100, peak 111 @ 22w6d):
100 in free list (0 min, 300 max allowed)
479751 hits, 184 misses, 11 trims, 11 created
184 failures (0 no memory)
VeryLarge buffers, 8264 bytes (total 100, permanent 100, peak 111 @ 22w6d):
100 in free list (0 min, 300 max allowed)
3334116 hits, 74 misses, 11 trims, 11 created
74 failures (0 no memory)
Huge buffers, 18024 bytes (total 20, permanent 20, peak 32 @ 22w6d):
20 in free list (0 min, 33 max allowed)
10319929 hits, 43 misses, 12 trims, 12 created
43 failures (0 no memory)

Interface buffer pools:
CF Small buffers, 104 bytes (total 101, permanent 100, peak 101 @ 22w6d):
101 in free list (100 min, 200 max allowed)
0 hits, 0 misses, 1433 trims, 1434 created
0 failures (0 no memory)
Generic ED Pool buffers, 512 bytes (total 101, permanent 100, peak 101 @ 22w6d):
101 in free list (100 min, 100 max allowed)
0 hits, 0 misses
CF Middle buffers, 600 bytes (total 101, permanent 100, peak 101 @ 22w6d):
101 in free list (100 min, 200 max allowed)
0 hits, 0 misses, 1433 trims, 1434 created
0 failures (0 no memory)
Syslog ED Pool buffers, 600 bytes (total 1057, permanent 1056, peak 1057 @ 22w6d):
1025 in free list (1056 min, 1056 max allowed)
7497 hits, 0 misses
Cellular NIM IPC buffers, 800 bytes (total 1025, permanent 1025):
0 in free list (0 min, 1025 max allowed)
1025 hits, 0 misses
1025 max cache size, 1025 in cache
243215 hits in cache, 0 misses in cache
ATM0/1/0 buffers, 1490 bytes (total 33, permanent 32, peak 33 @ 22w6d):
33 in free list (32 min, 96 max allowed)
0 hits, 0 fallbacks, 1433 trims, 1434 created
0 failures (0 no memory)
EOBC0 buffers, 1524 bytes (total 256, permanent 256):
256 in free list (0 min, 256 max allowed)
0 hits, 0 fallbacks
CF Big buffers, 1536 bytes (total 26, permanent 25, peak 26 @ 22w6d):
26 in free list (25 min, 50 max allowed)
0 hits, 0 misses, 1433 trims, 1434 created
0 failures (0 no memory)
IPC buffers, 4096 bytes (total 378, permanent 378):
377 in free list (126 min, 1260 max allowed)
1 hits, 0 fallbacks, 0 trims, 0 created
0 failures (0 no memory)
CF VeryBig buffers, 4520 bytes (total 3, permanent 2, peak 3 @ 22w6d):
3 in free list (2 min, 4 max allowed)
0 hits, 0 misses, 1433 trims, 1434 created
0 failures (0 no memory)
CF Large buffers, 5024 bytes (total 2, permanent 1, peak 2 @ 22w6d):
2 in free list (1 min, 2 max allowed)
0 hits, 0 misses, 1433 trims, 1434 created
0 failures (0 no memory)
IPC Medium buffers, 16384 bytes (total 2, permanent 2):
2 in free list (1 min, 8 max allowed)
0 hits, 0 fallbacks, 0 trims, 0 created
0 failures (0 no memory)
Private Huge IPC buffers, 18024 bytes (total 1, permanent 0, peak 1 @ 22w6d):
1 in free list (0 min, 4 max allowed)
0 hits, 0 misses, 1433 trims, 1434 created
0 failures (0 no memory)
Private Huge buffers, 65280 bytes (total 1, permanent 0, peak 1 @ 22w6d):
1 in free list (0 min, 4 max allowed)
0 hits, 0 misses, 1433 trims, 1434 created
0 failures (0 no memory)
IPC Large buffers, 65535 bytes (total 17, permanent 16, peak 17 @ 22w6d):
17 in free list (16 min, 16 max allowed)
0 hits, 0 misses, 231665 trims, 231666 created
0 failures (0 no memory)

Header pools:
Header buffers, 0 bytes (total 266, permanent 256, peak 266 @ 22w6d):
10 in free list (10 min, 512 max allowed)
253 hits, 3 misses, 0 trims, 10 created
0 failures (0 no memory)
256 max cache size, 256 in cache
147149103 hits in cache, 0 misses in cache

Particle Clones:
1024 clones, 0 hits, 0 misses

Public particle pools:
F/S buffers, 256 bytes (total 384, permanent 384):
128 in free list (128 min, 1024 max allowed)
256 hits, 0 misses, 0 trims, 0 created
0 failures (0 no memory)
256 max cache size, 256 in cache
0 hits in cache, 0 misses in cache
Normal buffers, 512 bytes (total 512, permanent 512):
384 in free list (128 min, 1024 max allowed)
128 hits, 0 misses, 0 trims, 0 created
0 failures (0 no memory)
128 max cache size, 128 in cache
0 hits in cache, 0 misses in cache

Private particle pools:
lsmpi_rx buffers, 416 bytes (total 8194, permanent 8194):
0 in free list (0 min, 8194 max allowed)
8194 hits, 0 misses
8194 max cache size, 0 in cache
361499836 hits in cache, 0 misses in cache
lsmpi_tx buffers, 416 bytes (total 4098, permanent 4098):
0 in free list (0 min, 4098 max allowed)
4098 hits, 0 misses
4098 max cache size, 4097 in cache
208311340 hits in cache, 0 misses in cache

Hello,

 

as stated by Joseph, you can configure auto buffer tuning:

 

Router(config)#buffers tune automatic

Joseph W. Doherty
Hall of Fame

60%, or less, utilization can mean very little when it comes to drops.  A common reason is microbursts (as also mentioned by Georg).  If you're unfamiliar with the term, you might research it.

Additionally, I see you're using WRED.  Generally, I advise anyone who is not a QoS expert not to use WRED.  If you wonder why, research all the follow-on variants of RED that "fix/improve" RED.  Take particular note of the number of variants that "fix/improve" RED; then consider why that might be so.

So, I would advise removing WRED and replacing it with fair-queue.  Let FQ run a while, see how it does.  It might need some tuning, but we want some usage stats before we attempt that.
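If you decide to try that, the change would look something like this (just a sketch; I'm assuming the fair-queue goes in class-default of the child policy, shown as L1-QUEUING in your output - verify the actual policy and class names on the router before applying):

policy-map L1-QUEUING
 class class-default
  no random-detect
  fair-queue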

Hopefully, you may find FQ decreases latency for many of your client's flows, along with possibly decreasing the overall drop rate.  Even if the overall drop rate isn't decreased, the drops should then be more "targeted" against "bandwidth hogs", unlike your current WRED.  (NB: On a few [one?] platforms, Cisco has its own RED variant, FRED, which also tries to target "bandwidth hogs".)

BTW, regarding some of Georg's comments and suggestions:

"the default is 40."

This is usually true, and appears true in this case because the interface stats show "Output queue: 0/40 (size/max)", but I've seen a few Cisco devices that don't default to 40.

That aside, Georg suggests increasing it.  I believe (?) it's overridden by your applied service policy.  Also, increasing queue depths can lead to increased latency, something you mention you wish to minimize, and in some cases too large a queue depth can actually increase drop rates and cause other issues.  (For TCP, I believe you don't want to exceed about half the bandwidth-delay product (BDP), which would be handy to know for any additional queue depth tuning.)
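As an illustration only (assuming a 20 ms RTT to the far end, which you'd want to actually measure): BDP = 194 Mbps x 0.020 s = 3,880,000 bits, roughly 485 KB, or about 320 full-size 1500-byte packets, so half the BDP would be around 160 packets.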

As another note, Georg mentions checking buffer stats, which is fine, but if they need tuning, your IOS might have a buffer auto tuning option.

Thanks Joseph, for the detailed answer.

 

I thought WRED and WFQ are separate and can in fact both be applied at the same time, and, most importantly, that queuing (FQ/WFQ) does not kick in until there is congestion - which in my case doesn't exist (no congestion).

 

Borrowing on from another thread:

 

- Queueing is used when there is congestion on an interface. This is usually detected when the Transmit Ring (TX-Ring) is full. This means that the interface is busy sending packets.

- Weighted Random Early Detection (WRED) is a congestion avoidance mechanism. WRED measures the size of the queues depending on the Precedence value and starts dropping packets when the queue is between the minimum threshold and the maximum threshold. The configuration decides that 1 in every N packets is dropped. WRED helps to prevent TCP synchronization and TCP starvation.

 

So I also note from the above that WRED is congestion avoidance and only helps with TCP traffic, not UDP traffic such as voice.

"I thought WRED and WFQ are separate and in fact can both be applied at the same time . . ."

Yes, that's correct, but mixing them is generally not a good idea, again, unless you're a real QoS expert.  Also, again, I strongly recommend non-QoS experts don't use WRED.

". . . which in my case here doesn't exist (no congestion)."

Your posted stats say otherwise!  Why do you think you're seeing drops?

Basically, there's congestion whenever a packet is waiting to be transmitted.  (BTW, congestion isn't always "bad".)  Drops happen when queues overflow or, in the case of RED, when an average queue depth/length exceeds some value.  (Interestingly, RED can drop packets when there are no packets in a queue, and conversely packets can be dropped because the physical queue overflows even though RED doesn't "see" any congestion.)

"Queueing is used when there is congestion on an interface. This is usually detected through that the Transmit Ring (TX-Ring) is full. This means that the interface is busy sending packets."

BTW, the tx-ring is an interface's hardware, FIFO-only, transmission queue.  I.e. packets queue in it first.  When it "overflows", packets are "software" queued.  It's the latter where we have options to "manage" the queued packets.  For optimal QoS, you may need to decrease the size of the tx-ring, to avoid typical global FIFO queue issues (e.g. bulk traffic being ahead of VoIP packets).
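If you do experiment with that, on platforms that support it the tx-ring is adjusted with something like the following (a sketch only; not every ISR4000 interface/driver exposes this command and the allowed range varies, so treat it as an assumption to verify on your box first):

interface GigabitEthernet0/0/0
 tx-ring-limit 10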

"Weighted Random Early Detection (WRED) is a congestion avoidance mechanism. WRED measures the size of the queues depending on the Precedence value and starts dropping packets when the queue is between the minimum threshold and the maximum threshold. Configuration will decide that 1 in every N packets are dropped. WRED helps to prevent TCP synchronization and TCP starvation."

The prior, I believe, isn't completely correct.  With ordinary WRED, there's only one queue (average) depth/length being measured.  What IPPrec (or optionally DSCP) does is provide different threshold and drop percentage values for ToS markings.  (BTW, your stats only show IPPrec zero traffic, so you're effectively doing RED, not WRED.)
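For reference, the per-precedence thresholds in your stats correspond to MQC configuration of roughly this form (shown only to illustrate how the precedence 0 line in your output - min threshold 192, max threshold 384, 1/10 mark probability - maps to config; not a recommendation to keep WRED):

policy-map L1-QUEUING
 class class-default
  random-detect
  random-detect precedence 0 192 384 10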

Yup, one purpose it can be used for is avoiding global TCP synchronization, although Cisco's defaults all use the same max threshold value, i.e. if that happens, it can defeat the goal.  Also, at least on Cisco, you can still have physical queue limit drops too.

[edit - addition =>] Not sure what's intended with "TCP starvation".  The typical "TCP starvation" issue, that comes to my mind, is TCP vs. UDP (because TCP should slow when drops are detected but UDP often does not). [<= edit - addition]

Again, getting RED to work optimally, is often, surprisingly, very difficult.

Dr. Floyd was also trying to find something "lightweight" (i.e. minimal router resource consumption) that worked better than FIFO.  Of course, not too long after publishing how RED should work, she published a RED revision that "corrected" some issues.  As noted earlier, there have been a lot of other people suggesting revisions to make RED work "better", or perhaps I should say, to work the ideal way it was intended to all along.  Also BTW, Cisco's default parameters, I believe, are effectively their own "better" approach.

Lastly, when RED was published, TCP wasn't the same TCP often used today.  Back then, we didn't have TCP stacks that additionally work with RTT to detect congestion.  I.e. back then, TCP relied on dropped packets alone to detect congestion.

"So I also note from above that WRED is congestion avoidance and only helps with TCP traffic, not UDP traffic say Voice."

Again, today's TCP traffic isn't as suitable for RED's approach as when RED was first published.  Further, router hardware has improved such that we can do things today that would have been too resource intensive then.  Further, although TCP is the "poster child" for something like RED, some UDP traffic will also slow when it detects drops.  This isn't due to UDP itself, but due to the application using UDP having some form of its own flow control, which may slow the sender when drops are detected.  Even so, once again, I would leave RED to QoS experts.