07-29-2021 08:46 AM
I am tagging all user ingress traffic at their ports via input policy-maps and queueing egress traffic to trunks and user ports with an output policy. I'm seeing a bunch of dropped traffic for certain types of tagged traffic even when there is bandwidth still available on the port.
My output policy-map is this (on our 3650s):
policy-map OUR-QOS-POLICY-OUTPUT
 class OUR-QOS-OUTPUT-PRIORITY-QUEUE-1
  priority level 1
  police rate percent 15
 class OUR-QOS-OUTPUT-PRIORITY-QUEUE-2
  priority level 2
  police rate percent 5
 class OUR-QOS-OUTPUT-QUEUE-6
  !Network/Signaling/AD
  bandwidth remaining percent 25
  queue-buffers ratio 10
 class OUR-QOS-OUTPUT-QUEUE-5
  !Teams/WebEX/Jabber
  bandwidth remaining percent 5
  queue-buffers ratio 10
  queue-limit dscp af43 percent 80
  queue-limit dscp af42 percent 90
  queue-limit dscp af41 percent 100
 class OUR-QOS-OUTPUT-QUEUE-4
  !Mission Critical/SQL/Finance Apps
  bandwidth remaining percent 10
  queue-buffers ratio 10
  queue-limit dscp af33 percent 80
  queue-limit dscp af32 percent 90
  queue-limit dscp af31 percent 100
 class OUR-QOS-OUTPUT-QUEUE-3
  !Net Management
  bandwidth remaining percent 10
  queue-buffers ratio 10
  queue-limit dscp af23 percent 80
  queue-limit dscp af22 percent 90
  queue-limit dscp af21 percent 100
 class OUR-QOS-OUTPUT-QUEUE-2
  !Bulk/Scavenger
  bandwidth remaining percent 5
  queue-buffers ratio 10
  queue-limit dscp values af13 cs1 percent 80
  queue-limit dscp values af12 percent 90
  queue-limit dscp values af11 percent 100
 class class-default
  bandwidth remaining percent 25
  queue-buffers ratio 25
I'm seeing drops in OUR-QOS-OUTPUT-QUEUE-2 for AF13 traffic on a 1Gb user port when the volume of that type gets high (but not 800Mb high), and I see drops on OUR-QOS-OUTPUT-QUEUE-6 when the volume is nowhere near 1Gb.
My question is: if the bandwidth is there, why are those queues dropping traffic? Does the queue-buffers ratio not follow the same logic as the bandwidth, where a queue can use other queues' buffer space when those queues are empty?
I have tried it with "qos queue-softmax-multiplier 1200" both on and off; same thing, dropped traffic.
The other question is about Netflow: if I tag ingress traffic on a 4400 router via an input policy-map, why do the Netflow exports from that same router show the original ToS value of 0 (or whatever it may have been before) rather than the value the policy-map sets? I can only assume I'm stuck with it working that way because of the order in which the router processes the policy-maps and Netflow (hopefully by design for performance reasons rather than by oversight). Is there something I can set that will make Netflow show the traffic as processed by the policy-map?
I confirmed on a device downstream of the router that the traffic does in fact have the correct DSCP tags, but my Netflow analyzer still shows the majority of that traffic as CS0.
Any Ideas?
07-29-2021 11:11 AM - edited 07-29-2021 11:11 AM
"My question is why if the bandwidth is there, are those queues dropping traffic?"
Because, likely, the bandwidth is NOT there.
Drops happen down at the millisecond time scale, while bandwidth usage is often an average taken over many seconds to multiple minutes. Search the Internet on the subject of "micro bursts".
"Is there something I can set that will make Netflow show the traffic processed by the policy-map?"
Possibly. Netflow, I recall, works with the ingress side, which may account for the lack of markings you note. Later Netflow implementations, I also recall, support working with egress, but you need to enable that on the egress interface. I don't recall what the Netflow egress interface configuration command is, but check your IOS manuals for your version.
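If the 4400 is running Flexible NetFlow, I believe the egress attachment looks roughly like the sketch below (the monitor/exporter names, collector address, and interface are made up for illustration; verify the record and command syntax against your IOS-XE documentation). A monitor applied in the output direction should, as I understand it, see the DSCP after your input policy-map has remarked it, but verify that on your platform:

! names and addresses below are only examples
flow exporter FNF-EXPORTER
 destination 192.0.2.10
 transport udp 2055
!
flow monitor FNF-MON-OUT
 record netflow ipv4 original-output
 exporter FNF-EXPORTER
!
! attach in the OUTPUT direction on the egress interface
interface GigabitEthernet0/0/1
 ip flow monitor FNF-MON-OUT output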
08-02-2021 12:28 PM
You would think that if it was a bandwidth issue it would show in other queues, but it looks like it shows up primarily in the queue that is set the smallest; that is why it seems like it's not able to access the buffers of the other queues. Is it possible that the queue-limit goes into effect before the queue tries to use the other queues' available space? Something like: once the queue reaches 90% it uses available space elsewhere, but by then it has already started dropping the traffic above the lowest threshold, which is 80%?
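To illustrate what I mean with made-up numbers (I don't know the real per-queue buffer count): if QUEUE-2's allocation worked out to, say, 120 buffer units, then "queue-limit dscp values af13 cs1 percent 80" would put the AF13/CS1 drop threshold at roughly 0.8 x 120 = 96 units, so that traffic could start tail-dropping as soon as the queue depth passed about 96 units, even if the other queues still had plenty of unused buffer space.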
Service-policy output: OUR-QOS-POLICY-OUTPUT

  queue stats for all priority classes:
    Queueing
    priority level 1
    (total drops) 0
    (bytes output) 737146819

  queue stats for all priority classes:
    Queueing
    priority level 2
    (total drops) 0
    (bytes output) 3583348

  Class-map: OUR-QOS-OUTPUT-PRIORITY-QUEUE-1 (match-any)
    2433842 packets
    Match: dscp ef (46)
    Priority: Strict, Priority Level: 1
    police:
        rate 15 %
        rate 150000000 bps, burst 4687500 bytes
      conformed 430662602 bytes; actions: transmit
      exceeded 0 bytes; actions: drop
      conformed 0000 bps, exceeded 0000 bps

  Class-map: OUR-QOS-OUTPUT-PRIORITY-QUEUE-2 (match-any)
    44552 packets
    Match: dscp cs4 (32) cs5 (40)
    Priority: Strict, Priority Level: 2
    police:
        rate 5 %
        rate 50000000 bps, burst 1562500 bytes
      conformed 3583356 bytes; actions: transmit
      exceeded 0 bytes; actions: drop
      conformed 0000 bps, exceeded 0000 bps

  Class-map: OUR-QOS-OUTPUT-QUEUE-6 (match-any)
    5314615 packets
    Match: dscp cs2 (16) cs3 (24) cs6 (48) cs7 (56)
    Queueing
    (total drops) 0
    (bytes output) 1273500537
    bandwidth remaining 25%
    queue-buffers ratio 10

  Class-map: OUR-QOS-OUTPUT-QUEUE-5 (match-any)
    3156776 packets
    Match: dscp af41 (34) af42 (36) af43 (38)
    Queueing
    queue-limit dscp 34 percent 100
    queue-limit dscp 36 percent 90
    queue-limit dscp 38 percent 80
    (total drops) 0
    (bytes output) 1530405029
    bandwidth remaining 5%
    queue-buffers ratio 10

  Class-map: OUR-QOS-OUTPUT-QUEUE-4 (match-any)
    296755 packets
    Match: dscp af31 (26) af32 (28) af33 (30)
    Queueing
    queue-limit dscp 26 percent 100
    queue-limit dscp 28 percent 90
    queue-limit dscp 30 percent 80
    (total drops) 0
    (bytes output) 162607860
    bandwidth remaining 10%
    queue-buffers ratio 10

  Class-map: OUR-QOS-OUTPUT-QUEUE-3 (match-any)
    18696983 packets
    Match: dscp af21 (18) af22 (20) af23 (22)
    Queueing
    queue-limit dscp 18 percent 100
    queue-limit dscp 20 percent 90
    queue-limit dscp 22 percent 80
    (total drops) 0
    (bytes output) 5279799586
    bandwidth remaining 10%
    queue-buffers ratio 10

  Class-map: OUR-QOS-OUTPUT-QUEUE-2 (match-any)
    82035493 packets
    Match: dscp cs1 (8) af11 (10) af12 (12) af13 (14)
    Queueing
    queue-limit dscp 10 percent 100
    queue-limit dscp 12 percent 90
    queue-limit dscp 14 percent 80
    (total drops) 13995670
    (bytes output) 70412835892
    bandwidth remaining 5%
    queue-buffers ratio 10

  Class-map: class-default (match-any)
    101790019 packets
    Match: any
    Queueing
    (total drops) 0
    (bytes output) 20328511602
    bandwidth remaining 25%
    queue-buffers ratio 25
It's only dropping 1 in 5000 packets in that queue, but none of the other queues are dropping anything.
Any ideas? How would I be able to see if it is in fact bursty traffic?
Is there any documentation on when the queue-limits kick in relative to other queues' space becoming available?
Can queues share unused ratio space from other queues?
If it is the queue-limit percentage kicking in too early, should I try giving this one queue a higher ratio? And will a queue I take that ratio from still be able to use the space if it needs it while the busy queue happens to be idle? Something along the lines of the sketch below is what I'm considering.
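For example (the numbers are just a first guess; this shrinks class-default's ratio to free up buffers for the dropping queue, keeping the total ratio the same):

policy-map OUR-QOS-POLICY-OUTPUT
 class OUR-QOS-OUTPUT-QUEUE-2
  queue-buffers ratio 20
 class class-default
  queue-buffers ratio 15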
08-02-2021 05:10 PM - edited 08-02-2021 05:11 PM
"You would think if it was a bandwidth issue that it would show in other queues . . ."
Well, actually, often not, because one point of QoS is that you prioritize some traffic over other traffic. Basically, when you run out of bandwidth, traffic will queue; it's then a matter of what gets dropped first, which depends on the traffic and your QoS configuration.
". . . it seems like it's not able to access the buffers of the queues."
That may be the case. I'm unsure about the 3650/3850 series, as I've never used them, but the prior 3560/3750 series "reserved" buffers per interface and/or per interface queue. (NB: these reservations are often why, on the 3560/3750 series, overall drops would increase with QoS enabled [at defaults] versus with QoS disabled.)
"How would I be able to see if it is in fact bursty traffic?"
Generally, you need something like a packet sniffer to actually see bursts.
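If you don't have a SPAN session and an external sniffer handy, and if your IOS-XE release supports it, the switch's Embedded Packet Capture can grab traffic on the congested port. Roughly (the capture name, interface, and buffer size are only examples; check the exact syntax for your release):

monitor capture BURST interface GigabitEthernet1/0/1 out
monitor capture BURST match any
monitor capture BURST buffer size 10
monitor capture BURST start
monitor capture BURST stop
monitor capture BURST export flash:burst.pcap

Then open the exported pcap in Wireshark and use an I/O graph at a millisecond interval; low average utilization punctuated by millisecond spikes near line rate is the classic micro-burst signature.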
"Is there any documentation on where the queue-limits kick in when other queue space might be available?"
Might not be exactly what you're looking for, but have you seen this? https://www.cisco.com/c/en/us/support/docs/switches/catalyst-3850-series-switches/200594-Catalyst-3850-Troubleshooting-Output-dr.html
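That document leans on the platform queue commands to show each queue's hard/soft buffer allocation and drop counters. If I recall correctly, the syntax differs by release, roughly along these lines (the interface is just an example):

show platform qos queue config gigabitEthernet 1/0/1
show platform qos queue stats gigabitEthernet 1/0/1

or, on later releases, something like:

show platform hardware fed switch active qos queue config interface gigabitEthernet 1/0/1
show platform hardware fed switch active qos queue stats interface gigabitEthernet 1/0/1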
"If it is the queue-limit percentage kicking in too early should I try giving this one queue a higher ratio? Will one of the queues I take it from be able to use it again if needed and the busy queue doesn't happen to be busy at that time?"
The prior troubleshooting document might help you with these questions. BTW, your 3650 is, more or less, a "less powerful" 3850, but tech notes and such are often published for the "premier" platform of a family of like/similar platforms. (The same used to be true with the 3560 and 3750; for info on the 3560, research the 3750.)