
With QOS, what constitutes congestion -- buffers or bandwidth?

brettp
Level 1

I realize this question has been asked a number of times, but I haven't found a satisfying answer, or one that I understand. I've read so much, but I'm still very confused. My question is: what constitutes congestion on a link, such that QoS comes into play? Is it link utilization (i.e. exhausted bandwidth, say a 100 Mb link with 100 Mb of traffic going through it), or is it buffer exhaustion (as in one or more buffers filled to their maximum)? For instance: if 10% of the buffer is allocated to queue 1 and its max is 200, but it's using 200% of that allocation (because the other buffers aren't filled, so the common pool has buffer space to offer), would QoS start dropping those packets even though the actual bandwidth may not be saturated? Thank you for your input!


7 Replies

brettp
Level 1

Anyone have any insight? The reason I ask is that I have 4 x 1 gig interfaces bundled into an EtherChannel, obviously with a bandwidth of 4 gig. Each interface / the switch is using the default auto qos voip trust settings. I see drops on the interfaces, but there is no way all 4 gigs are getting used up (PRTG shows no high utilization, though honestly I don't know if it would even report bursts of just a few milliseconds -- but again, I can't see a burst getting up to 4 gig). This leads me to believe congestion means queues reaching their maximum, not the link bandwidth being used up (which confuses me, because I thought it was the other way around -- I wouldn't think there would be any queuing if there is available bandwidth). Any insight is appreciated. Thanks!

Joseph W. Doherty
Hall of Fame
QoS generally considers there's "congestion" when the queue(s) it manages have one or more packets.

Drops happen when a queue limit is busted, either logically (a threshold) or physically (it cannot allocate another element).

More generally, you have "congestion" whenever a packet needs to be queued, even just one. However, congestion isn't always or often adverse to applications running across a network.
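To make the two definitions above concrete, here's a minimal, hypothetical sketch in plain Python (not any vendor's implementation): "congestion" is simply a non-empty queue, and drops happen only when an arriving packet finds the queue at its limit.

```python
from collections import deque

# Hypothetical tail-drop egress queue with a hard limit of 5 packets.
QUEUE_LIMIT = 5

queue = deque()
dropped = 0

def enqueue(packet):
    """Enqueue a packet, tail-dropping if the queue limit is reached."""
    global dropped
    if len(queue) >= QUEUE_LIMIT:
        dropped += 1        # queue limit "busted": packet is dropped
        return False
    queue.append(packet)
    return True

def congested():
    """Per the definition above: congestion = one or more queued packets."""
    return len(queue) > 0

# Seven packets arrive before the interface can transmit any of them:
results = [enqueue(f"pkt{i}") for i in range(7)]
# The first 5 fit; the last 2 are tail-dropped.
```

Note the asymmetry the reply describes: the queue was "congested" from the very first packet, but drops only began once the limit was exceeded.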

Thank you for the reply! This leads me to a second question... we've all read "QoS kicks in when there is congestion" -- in my mind, this creates a paradox. QoS kicks in when congestion occurs, but congestion occurs when QoS is queuing frames/packets in its various configured queues? When something hits the egress interface without bandwidth being capped, is it still placed in the queue its marking matches? I thought it essentially just passed through without hitting a queue -- or does it in fact get queued into the proper queue (it just doesn't wait long)? If that's the case, I think that clears up some of my confusion.

". . . but congestion occurs when QOS is queuing frames/packets in its various configured queues?"

Although QoS might cause queuing (e.g. using a shaper), congestion, and queues, usually happen without "QoS".

QoS often encompasses some form of active queue management.
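As a toy illustration of what "active queue management" means, here is a simplified RED-style sketch (the thresholds and drop probability are made-up numbers for illustration, not any platform's defaults): rather than waiting for the hard queue limit, packets are dropped with increasing probability as the average queue depth grows.

```python
import random

# Made-up RED-style parameters: start dropping at depth 5,
# drop everything at depth 15, max early-drop probability 10%.
MIN_TH, MAX_TH, MAX_P = 5, 15, 0.1

def drop_probability(avg_depth):
    """Early-drop probability as a function of average queue depth."""
    if avg_depth < MIN_TH:
        return 0.0                  # no congestion signal yet
    if avg_depth >= MAX_TH:
        return 1.0                  # behaves like tail drop
    # linear ramp between the two thresholds
    return MAX_P * (avg_depth - MIN_TH) / (MAX_TH - MIN_TH)

def admit(avg_depth, rng=random.random):
    """Admit a packet unless the random early-drop lottery rejects it."""
    return rng() >= drop_probability(avg_depth)
```

The point of the early, probabilistic drops is to signal senders (e.g. TCP) to slow down before the queue overflows outright.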

"When something hits the egress interface without bandwidth being capped, is it still placed in the queue its marking matches? I thought it essentially just passes through without hitting a queue?"

Generally, for your questions, such packets are not placed in a QoS managed queue although they may be subject to other class matched QoS functions, such as marking and/or policing.

Thank you again for the reply... please bear with me, I think my brain just cannot comprehend this, but I truly appreciate you taking the time to help me out! "Generally, for your questions, such packets are not placed in a QoS managed queue although they may be subject to other class matched QoS functions, such as marking and/or policing." Taking all QoS functions out of play except queuing: from what I gather, if bandwidth is available, traffic passes through the egress interface without being placed into any queue. So at what point is traffic placed into a QoS managed queue? I would imagine that is the point of "congestion," and I would imagine that is when bandwidth is exhausted (because now not all traffic can pass through, so some of it needs to get queued up). All of these questions stem from the fact that I'm confused as to why QoS would detect congestion and drop packets on a 4 gig uplink when there's no way 4 gig is being exhausted on one of my production switches.

Egress architectures vary, but there's often a hardware interface FIFO transmission queue. When it "overflows", packets may then be queued in QoS managed queues.

As to queues filling and overflowing, that's due to two causes that are not mutually exclusive. First, you may have an ingress interface with higher bandwidth than the egress interface -- for example, 10 gig ingress and 1 gig egress.

Second, you may have multiple ingress interfaces whose aggregate bandwidth exceeds the egress bandwidth -- for example, eleven 1 gig ingress interfaces feeding one 10 gig egress.

The two foregoing can create sustained congestion, but also consider nine 1 gig ingress interfaces feeding one 10 gig egress. What if packets arrived on all 9 ingress interfaces at exactly the same time? The 10 gig interface can only transmit one packet at a time; the other 8 will need to be queued.

Lastly, when you're dealing with bandwidth usage, what does 30% usage really mean? It means some percentage of usage over a time interval, but it doesn't directly indicate how the bandwidth was consumed. Returning to my last example: if those nine interfaces' packets all nicely arrived one (gig) packet time apart, there would be no queuing and your utilization would be 90%. Conversely, if all nine packets always arrived at exactly the same time and you could only queue 5, you would drop 4, and your utilization would show 50% usage with a 4/9 (roughly 44%) drop rate. The latter would have you wondering how such stats can exist: the link is only half used, yet it's dropping almost half the packets.
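The arithmetic in that last example can be sketched numerically. The assumptions (idealized for illustration) are that the 10 gig egress can serialize ten gig-sized packets per interval, the burst arrives instantaneously, and only 5 packets can be buffered at the burst instant.

```python
LINK_CAPACITY = 10   # packets the 10 gig egress could send per interval
BURST_SIZE = 9       # packets arriving from the nine ingress interfaces
QUEUE_DEPTH = 5      # packets that can be buffered at the instant of the burst

# Spaced arrivals: one packet per packet time, so nothing ever queues.
spaced_utilization = BURST_SIZE / LINK_CAPACITY      # 0.9 -> 90% usage, no drops

# Simultaneous arrivals: only QUEUE_DEPTH packets survive the burst.
sent = min(BURST_SIZE, QUEUE_DEPTH)
dropped = BURST_SIZE - sent                          # 4 packets tail-dropped
burst_utilization = sent / LINK_CAPACITY             # 0.5 -> 50% usage
drop_rate = dropped / BURST_SIZE                     # 4/9 -> ~44% of packets lost
```

Same offered load in both cases; only the arrival pattern differs, which is why averaged utilization counters can look healthy while the interface drops heavily.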

Thank you for the explanation... You've mentioned some scenarios that I would have never thought of. I think all of your responses have given me a better understanding of what might be going on. Thanks again!