07-12-2023 01:57 PM
I am slowly decreasing my "Total output drops" on the interfaces on my stacked Cat 9300s with the global command "qos queue-softmax-multiplier xxx". I started out with 300 and have been jumping up by 100; I am currently at 900. Only a few interfaces occasionally record drops now. What are the dangers of issuing the max of 1200? I understand the buffers will borrow from the global memory pool when there is a burst, and I was wondering if anyone has experience running the max of 1200 in a prod environment?
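For reference, here is roughly the sequence I repeat at each step (the interface name is just an example from my stack):

configure terminal
 qos queue-softmax-multiplier 900
 end
show interfaces GigabitEthernet1/0/1 | include output drops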
07-12-2023 03:33 PM
"What are the dangers of issuing the max of1200?"
As you already seem to understand, one port, doing extensive queuing, could deplete the shared memory pool causing other (possibly all other) ports to drop packets (as they would be unable to acquire any additional buffers).
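If you want to watch how the buffers are actually carved while you experiment, I believe the 9K has a command along these lines (the switch and interface numbers are placeholders; verify the exact syntax on your IOS version):

show platform hardware fed switch 1 qos queue config interface GigabitEthernet1/0/1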
I don't have first-hand experience on the Catalyst 9Ks, but going back to the 3750s, I haven't seen maxing out the shared buffers cause any issues. I've also suggested going to max, to others on these community forums, for the 3650/3850 and 9K series, and they have only reported drop reduction (like you've already seen with your increases) without any new problems.
One factoid from the 9K QoS white paper: defining only the minimum number of queues actually being used per port will increase the shared buffers.
● Use the command qos queue-softmax-multiplier <100-1200>. This command was discussed in the DTS section. To increase the PBC’s ability to absorb micro-bursts, use a value close to 1200.
● Reduce the total number of queues per port. For example, use three queues instead of eight. Every queue has dedicated buffer from the shared pool, which reduces the total size of the global pool. With fewer queues, the shared pool will have a larger size to share.
(Above found below "Figure 33". A rough configuration sketch of the fewer-queues idea follows.)
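To illustrate the second bullet, a minimal sketch of a three-queue egress policy; the class-map names, DSCP matches, and percentages are all made-up placeholders, not recommendations:

class-map match-any VOICE
 match dscp ef
class-map match-any CRITICAL
 match dscp af31
!
policy-map 3Q-EGRESS
 class VOICE
  priority level 1
 class CRITICAL
  bandwidth remaining percent 30
 class class-default
  bandwidth remaining percent 70
!
interface GigabitEthernet1/0/1
 service-policy output 3Q-EGRESS

With only three classes in the output policy, the port carves three queues instead of eight, leaving more of the pool to be shared.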
07-24-2023 09:08 AM
Thanks Joseph for the information. I inched my way up to the max of "qos queue-softmax-multiplier 1200" on a production 5-switch stack of 9300s running 17.6.5 and have not experienced any issues with performance or lack of free memory. While the software setting has helped reduce the number of drops, I still see drops on a few interfaces when there is a burst but the overall count is down.
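For anyone following along, I have been checking which queues are actually dropping with the command below (the switch and interface numbers are just examples from my stack):

show platform hardware fed switch 1 qos queue stats interface TenGigabitEthernet1/0/48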
07-24-2023 09:59 AM
Go ahead and use multiplier 1200;
the C9K supports this value.
The buffers borrow from the global pool, but this does not affect the hard-coded (dedicated) buffer allocations, i.e., they borrow only free memory, never memory already allocated.
07-24-2023 11:18 AM
". . . I still see drops on a few interfaces when there is a burst but the overall count is down."
Glad to hear that. Hopefully, drop counts are way down.
Lack of problems due to making this change appears to be the norm.
Regardless, it may also be possible on a 9K, with other tuning, to provide even more buffers for queues under (hopefully only) microburst stress.
The 9K white paper I referenced in my prior post describes these techniques (some are only available with specific IOS versions and/or specific 9K models).
I used some of the described techniques to do similar tuning on the 3750 series, with good results. However, you do need to take even more care to make incremental changes and monitor the actual results, as in the sketch below.
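As one example of that kind of tuning (a sketch only; the queue-buffers ratio value here is made-up, and whether the command is available depends on your platform and IOS version, so verify against the white paper first):

policy-map TUNED-EGRESS
 class class-default
  queue-buffers ratio 100
!
interface GigabitEthernet1/0/1
 service-policy output TUNED-EGRESS

That gives a single-queue port's default queue the maximum buffer ratio; apply it, then watch the drop counters before touching anything else.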
Something I rarely describe: for decades, we've been somewhat misled by lower drop counts than might actually be expected whenever there's any bandwidth oversubscription. This is because most traffic has traditionally used TCP, and many TCP stacks' RWIN allocations were not large enough to really overdrive the actual available bandwidth. (If you're curious, do an Internet search on BDP [bandwidth delay product] and then look into the usual default size for a TCP stack's RWIN. Many years ago, I noticed the "classical" Cisco FIFO queue size of 40 appeared to be about "just right" for supporting a typical Ethernet 10 Mbps LAN or a typical 1.5 Mbps WAN [BDP]. Coincidental? I've wondered.)
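To put some rough numbers on that (my assumptions: a 1,500-byte MTU and the classic 64 KB default RWIN):

BDP = bandwidth x RTT
1.5 Mbps WAN at 100 ms RTT: 1,500,000 b/s x 0.1 s = 150,000 bits, or 18,750 bytes
classic 40-packet FIFO queue: 40 x 1,500 bytes = 60,000 bytes, i.e., about one 64 KB RWIN

So a single classic TCP sender could never have much more than one RWIN in flight, and a 40-packet queue could absorb just about all of it.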