01-30-2023 04:27 AM
There are drops in Queue 3 threshold 3 on multiple ports connected to WAPs. I am not able to reduce the drops, as the 2960XR has a 4 MB buffer. The port is connected to a WAP, and the user count in the office is increasing, so it may cause bursty traffic on the port.
!
interface GigabitEthernet4/0/48
switchport
srr-queue bandwidth share 1 50 45 5
srr-queue bandwidth shape 10 0 0 0
priority-queue out
mls qos trust dscp
spanning-tree portfast edge
!
2960-XR#sh mls qos interface Gig 4/0/48 statistics
GigabitEthernet4/0/48 (All statistics are in packets)
dscp: incoming
-------------------------------
0 - 4 : 28038660 0 0 0 1099
5 - 9 : 1 0 0 624 0
10 - 14 : 0 0 0 0 0
15 - 19 : 0 0 0 10571062 0
20 - 24 : 182822 0 0 0 2
25 - 29 : 0 2792 0 0 0
30 - 34 : 0 0 147 0 58417
35 - 39 : 0 0 0 0 0
40 - 44 : 606 0 0 0 0
45 - 49 : 0 356257 0 721459 0
50 - 54 : 0 0 0 0 0
55 - 59 : 0 0 0 0 0
60 - 64 : 0 0 0 0
dscp: outgoing
-------------------------------
0 - 4 : 86673378 0 0 0 0
5 - 9 : 0 0 0 0 0
10 - 14 : 0 0 0 0 0
15 - 19 : 0 0 0 0 0
20 - 24 : 0 0 0 0 0
25 - 29 : 0 0 0 0 0
30 - 34 : 0 0 0 0 0
35 - 39 : 0 0 0 0 0
40 - 44 : 0 0 0 0 0
45 - 49 : 0 0 0 180631 0
50 - 54 : 0 0 0 0 0
55 - 59 : 0 0 0 0 0
60 - 64 : 0 0 0 0
cos: incoming
-------------------------------
0 - 4 : 40043392 0 0 0 0
5 - 7 : 0 0 0
cos: outgoing
-------------------------------
0 - 4 : 86736437 0 0 0 0
5 - 7 : 0 180631 142351
output queues enqueued:
queue: threshold1 threshold2 threshold3
-----------------------------------------------
queue 0: 0 0 0
queue 1: 1764 4949237 323161
queue 2: 0 0 86671612
queue 3: 0 0 0
output queues dropped:
queue: threshold1 threshold2 threshold3
-----------------------------------------------
queue 0: 0 0 0
queue 1: 0 0 0
queue 2: 0 0 144517
queue 3: 0 0 0
Policer: Inprofile: 0 OutofProfile: 0
!
mls qos map cos-dscp 0 8 16 24 32 46 48 56
mls qos srr-queue output cos-map queue 1 threshold 3 5
mls qos srr-queue output cos-map queue 2 threshold 1 2 4
mls qos srr-queue output cos-map queue 2 threshold 2 3
mls qos srr-queue output cos-map queue 2 threshold 3 6 7
mls qos srr-queue output cos-map queue 3 threshold 3 0
mls qos srr-queue output cos-map queue 4 threshold 3 1
mls qos srr-queue output dscp-map queue 1 threshold 3 40 46
mls qos srr-queue output dscp-map queue 2 threshold 1 25 32 34 36 38
mls qos srr-queue output dscp-map queue 2 threshold 2 24 26
mls qos srr-queue output dscp-map queue 2 threshold 3 48 56
mls qos srr-queue output dscp-map queue 3 threshold 3 0
mls qos srr-queue output dscp-map queue 4 threshold 1 8
!
GigabitEthernet1/0/48 is up, line protocol is up (connected)
Hardware is Gigabit Ethernet, address is c064.e487.7eb0 (bia c064.e487.7eb0)
MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
input flow-control is off, output flow-control is unsupported
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:23, output 00:00:07, output hang never
Last clearing of "show interface" counters 04:42:06
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 4636
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 365000 bits/sec, 125 packets/sec
5 minute output rate 692000 bits/sec, 208 packets/sec
5704804 packets input, 2444460333 bytes, 0 no buffer
!
I tried to increase the queue-set 2 buffers and thresholds as in the example below, but I still observed the drops, so I reverted the port config back to the above. (Troubleshooting steps taken and reverted.)
mls qos queue-set output 2 threshold 3 3100 3100 100 3200
mls qos queue-set output 2 buffers 20 20 40 20
!
int x/x
queue-set 2
!
For some ports I changed the shape values as well, as below, but the drops did not reduce.
srr-queue bandwidth shape 10 0 4 0
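For reference, the effective queue-set values and the port's buffer allocation can be checked with show commands like these (a sketch, using the queue-set and interface from above):
show mls qos queue-set 2
show mls qos interface GigabitEthernet4/0/48 buffers
show mls qos interface GigabitEthernet4/0/48 queueing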
01-30-2023 11:02 AM
Try:
mls qos queue-set output 2 buffers 5 40 40 5 !may need further adjustment
mls qos queue-set output 2 threshold 1 3200 3200 1 3200
mls qos queue-set output 2 threshold 2 3200 3200 1 3200
mls qos queue-set output 2 threshold 3 3200 3200 1 3200
mls qos queue-set output 2 threshold 4 3200 3200 1 3200
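As in the OP's earlier test, these queue-set 2 values only apply to ports assigned to queue-set 2, e.g. (using the interface from the OP's output):
interface GigabitEthernet4/0/48
 queue-set 2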
01-30-2023 02:18 PM
Oh, you also might need to add something like:
mls qos queue-set output 1 threshold 1 3200 3200 25 3200
mls qos queue-set output 1 threshold 2 3200 3200 25 3200
mls qos queue-set output 1 threshold 3 3200 3200 25 3200
mls qos queue-set output 1 threshold 4 3200 3200 25 3200
This, and the prior suggested changes, might need further tuning.
BTW, what we're trying to accomplish, besides raising the logical tail-drop limits, is also to reduce interface "reserved" buffers, i.e. let them go into the common pool, and borrow from the enlarged common pool as buffers are needed (this is also why tail-drop limits are increased: to take full advantage of the [hopefully] enlarged common pool buffers).
In my experience, borrowing from an enlarged common pool works very, very well for bursty traffic across your interfaces. The risk, though, is an interface (or interfaces) with sustained oversubscription will drain the common pool, and since we reduced interface "reserved" buffers, other interfaces might start to show drops they don't show now.
However, the case of sustained oversubscription can be addressed, if such ports can be identified. In this case, we can use queue-set 1 for "bursty" ports, and queue-set 2 for sustained oversubscription ports (NB: the suggested changes would need amendment for both queue-sets).
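A sketch of that split (the second port number is just a hypothetical example):
! bursty ports stay in the default queue-set 1
interface GigabitEthernet4/0/48
 queue-set 1
! identified sustained-oversubscription ports move to queue-set 2
interface GigabitEthernet4/0/2
 queue-set 2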
If you make QoS changes, like what I've suggested, monitor all your ports for changes in drop rate.
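A quick way to do that, per port of interest, is something like (a sketch, reusing the interface from the OP's output):
clear mls qos interface GigabitEthernet4/0/48 statistics
show mls qos interface GigabitEthernet4/0/48 statistics | begin output queues dropped
show interfaces GigabitEthernet4/0/48 | include Total output drops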
BTW, years ago I had a case of two 3750G ports dropping multiple packets per second. These two ports were connected to IP SAN servers. Moving all the ports to almost no reserved buffers and increased drop limits, while using an enlarged common pool, resulted in no increase in drops for the non-SAN ports, and drops on the two SAN ports decreased to just a few per day! (NB: The 3750G, I believe, has more physical RAM buffer resources than the 2960 series [I recall 4 MB per 24 copper ports and/or for the uplink ports], but the QoS architectures, I also believe, are alike.)
01-30-2023 02:29 PM
This is the Cisco-recommended config:
mls qos queue-set output 1 buffers 15 30 35 20
mls qos queue-set output 1 threshold 1 100 100 100 100
mls qos queue-set output 1 threshold 2 80 90 100 400
mls qos queue-set output 1 threshold 3 100 100 100 3200
mls qos queue-set output 1 threshold 4 60 80 100 400
01-30-2023 03:11 PM - edited 01-30-2023 05:06 PM
OP mentions he tried:
mls qos queue-set output 2 threshold 3 3100 3100 100 3200
mls qos queue-set output 2 buffers 20 20 40 20
As the OP's settings provide even more buffers for Q#2/3rd-Q (40% vs. 35%), use the same overall queue-limit (3200), and the OP describes that drops still occurred, I wouldn't expect the Cisco recommendation to mitigate the drops any better than what the OP already tried. (NB: Of course, I'm not always right; why, there was the one time I thought I made a mistake, but I was mistaken - laugh.)
01-30-2023 03:35 PM - edited 01-30-2023 03:40 PM
@Joseph W. Doherty you're right
@Abhisheka15
srr-queue bandwidth share 1 30 35 5 <<- add this under interface
srr-queue bandwidth shape 10 0 0 0 <<- remove this
mls qos trust cos <<- since you use the CoS-to-DSCP map, you need to trust CoS, not DSCP
01-30-2023 07:55 PM
Some commentary . . .
"srr-queue bandwidth share 1 30 35 5 <<- add this under interface"
As the OP has "srr-queue bandwidth share 1 50 45 5" (and "priority-queue out"), and as both Q#1 and Q#2 are already close to a 1:1 ratio (50:45 [1.11:1] vs. 30:35 [1:1.17]), and those two queues are the ones showing enqueued packets, I doubt this would make much difference beyond "moving" where the drops actually occur. (I.e. the fact there's any queuing at all demonstrates the egress interface is being oversubscribed, at least occasionally.)
If you do want to "shift" which queue is dropping packets, changing bandwidth ratios is a good way to do it. If you want to mitigate (i.e. reduce/eliminate) the drops altogether, buffer tuning is probably more appropriate.
"srr-queue bandwidth shape 10 0 0 0 <<- remove this"
Personally, I've never bothered to configure an individual egress queue shaper, but, in principle, it's a safeguard to preclude Q#0, enabled as PQ, from totally bandwidth-starving the other 3 queues. I.e. I see no harm in keeping it, and it can serve an important function.
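For context, as I understand the shaped-mode weights on these switches, the first value is an inverse weight, so the existing line
srr-queue bandwidth shape 10 0 0 0
caps queue 1 (the PQ, shown as Q#0 in the statistics) at roughly 1/10 of the port bandwidth, while a weight of 0 leaves that queue unshaped (it uses the shared weights instead).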
"mls qos trust cos <<- since you use Cos-Dscp then you need trust Cos not trust Dscp"
As the OP's interface config has "mls qos trust dscp", it's unclear why this should be changed to trust CoS.
Perhaps you're looking at the CoS incoming/outgoing stats, which show a variety of CoS settings coming from other ports, but all incoming CoS is zero on this port (likely because this port is untagged). Even in that case, this switch (I believe) can "map" CoS to/from DSCP, and the converse, and you can use CoS and DSCP together, concurrently.
In general, given a choice between CoS and DSCP, DSCP is the "better" choice: it offers more granularity (6 bits vs. 3 bits) and, as part of the IP packet, can travel end-to-end, whereas CoS is part of a tagged VLAN frame, which, of course, is unavailable with untagged frames and needs to be reset at every L3 hop. (BTW, many of Cisco's smart/enhanced L2 switches [like their 2960 series] can work directly with the L3 ToS [i.e. DSCP]. I.e. the only reason I see to use CoS at all is when you're supporting QoS at L2 and that's all that's supported.)
Trusting CoS might be "correct" for some of this switch's other ports, but for this port, it appears, trusting DSCP is the correct option.
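Note that trust is configured per port, so both can be used on the same switch where appropriate (a sketch; the second interface is just a hypothetical trunk/tagged port):
interface GigabitEthernet4/0/48
 mls qos trust dscp
interface GigabitEthernet1/0/1
 mls qos trust cos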
01-30-2023 08:42 PM
BTW, an (old but) excellent 3750 QoS document (applies to the 2960 too, I believe) is: C3750 Switch Family Egress QOS Explained
You may find this document has to be read and re-read to make sense of it, but it explains buffer settings I've not seen explained so well elsewhere.
To recap, my approach for mitigating burst traffic is to increase queue limits to their logical maximums (i.e. packets will be dropped when logical maximums are exceeded), and then to reduce "reserved" interface buffers, providing them to the shared common pool (i.e. packets will also be dropped if there's no physical buffer available).
I also push the two WTD drop thresholds to their maximum values, unless there's really a need to drop some traffic, within the same egress queue, before other traffic.
So, we might start with:
mls qos queue-set output 1 threshold 1 3200 3200 50 3200
mls qos queue-set output 1 threshold 2 3200 3200 50 3200
mls qos queue-set output 1 threshold 3 3200 3200 50 3200
mls qos queue-set output 1 threshold 4 3200 3200 50 3200
I.e. all queue-set 1 queue limits are set to their maximum logical values.
Reducing the "reserved" buffers, yielding those free to the common pool, is done by changing the "50" value above.
You might try, after 50 (the default?), first 25, then 15, 10, 5, 1.
Monitor impact of QoS changes!!!
Doing the above for just queue-set 1 (the default queue-set for all interfaces) might alone be enough to deal well with bursty ports.
As noted in an earlier post, the danger of doing the above is that a port with sustained oversubscription may deplete/use all the common pool's buffers, which will lead to other ports possibly encountering additional drops due to lack of buffers.
Lastly, many often think QoS can be done one-size-fits-all. Well, in my experience, just as clothing comes in multiple sizes (on-the-rack) because one-size-fits-all really doesn't fit everyone, many of us still need tailoring for the best-fitting clothing.
QoS is often similar: a "one-size-fits-all" policy only works well if you are that "size". Better is a more-or-less generic QoS policy close to your "size". Best is a specific QoS policy exactly tailored to your "size".