ASR920 Output drops - QoS

nani_gvd
Level 1

Recently stumbled upon the well-known issue with the ASR 920, where output drops occur when traffic ingresses a 10G interface but has to egress a 1G interface, as described here:
https://www.cisco.com/c/en/us/support/docs/routers/asr-920-router/218310-how-to-handle-microbursts-in-asr-920-di.html

Fixed it with:

  queue-limit percent 100
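
For reference, roughly where that line sits (a sketch only; the policy, class, VLAN, and interface names here are illustrative, not the actual config):

  policy-map EGRESS-1G
   class class-default
    queue-limit percent 100
  !
  ! policy applied on egress of the 1G EFP that was dropping
  interface GigabitEthernet0/0/1
   service instance 10 ethernet
    encapsulation dot1q 100
    service-policy output EGRESS-1G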

Question now: imagine having multiple 1G egress interfaces, also suffering output drops. Is it safe to apply queue-limit percent 100 to each interface's service instance?

Accepted Solution

Joseph W. Doherty
Hall of Fame

"question now, imaging having multiple 1G egressing interfaces, also suffering output drops, is it safe to apply the queue-limit percent 100 to each interface's service instance?"

Yes and no, which is why you'll find the following in your reference:

[image attachment: JosephWDoherty_0-1728598984257.png — excerpt from the referenced Cisco document]

The above may negate your fix:

[image attachment: JosephWDoherty_1-1728599061688.png]

BTW, I initially almost laughed when I read:


@nani_gvd wrote:

Recently stumbled upon the well-known issue with the ASR 920, where output drops occur when traffic ingresses a 10G interface but has to egress a 1G interface, as described here:

because any time ingress rate is faster than egress rate, drops are possible.

Actually, I found the reference very interesting, because I was unaware that the ASR 920 manages its egress buffering much like many lower-end Catalyst switches do, i.e., by default, each interface is limited in how much buffer space it can acquire from a common pool.

The purpose of this approach is to ensure one interface doesn't consume all the buffer space, starving the other egress interfaces of buffers.

For example, consider one 10G ingress interface providing data to five 1G egress interfaces.  Further suppose four of those five egress interfaces each average a gig or less of traffic from the 10G interface, but all the remaining 10G traffic is directed at the one remaining 1G interface.  If that single gig interface can obtain all the buffer space, the other four gig interfaces may drop traffic for lack of any buffers (and the oversubscribed gig interface will eventually drop traffic too, once it needs another buffer while all are in use).

So, actually, just as the reference cautions, allowing a 100% allocation on any single interface can possibly disrupt all the other gig interfaces.

That said, if you're really dealing with microbursts, where the larger allocation is only temporarily needed, you can allocate 100% as long as all buffers are never actually consumed at once; but I would suggest not pushing it to that extent (100%).
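
For example, with five 1G egress interfaces sharing the pool, giving each queue-limit percent 20 guarantees no interface can starve the others, while something like the following sketch (the 50 percent value, and all the names, are purely illustrative, not a recommendation) oversubscribes the pool somewhat and relies on the bursts not all coinciding:

  policy-map EGRESS-1G-SHARED
   class class-default
    queue-limit percent 50
  !
  ! the same policy attached to each oversubscribed 1G EFP, so no
  ! single port can monopolize the shared buffer pool
  interface GigabitEthernet0/0/1
   service instance 10 ethernet
    encapsulation dot1q 100
    service-policy output EGRESS-1G-SHARED
  interface GigabitEthernet0/0/2
   service instance 10 ethernet
    encapsulation dot1q 200
    service-policy output EGRESS-1G-SHARED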

If you want to know the ideal percentage allocation, that really depends on your traffic mix.

Ideally, you don't want to deny an egress interface buffers it needs when they are available; but conversely, you don't want egress interface(s) that are within their bandwidth capacity to be impacted by egress interface(s) that are oversubscribed.

Possibly there's some mathematical way to compute "ideal" allocations, but in my experience, I make changes and then do intensive monitoring of the impact for the first 24 hours (of a normal, or known high-usage, business day).

With any QoS, I believe ongoing monitoring is needed to confirm it continues to meet its service-level goals.  (Basically, not much different from ordinary bandwidth and/or drop monitoring, just with more detailed correlations against service levels.)
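
For the drop side of that monitoring, periodically sampling the policy and interface counters is usually enough, e.g. (interface and policy names assumed from the sketches above, not from your actual config):

  show policy-map interface GigabitEthernet0/0/1
  show interfaces GigabitEthernet0/0/1 | include output drops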

If the above doesn't make sense, please post follow-up replies.
