02-21-2020 12:38 AM
Hi Everyone,
recently I was asked to troubleshoot a networking issue that got me stumped. A customer I work with received complaints about slow file access, so we began digging. Drilling down, the problem seems to be related to QoS. The setup is as follows:
The first thing we found out was that the slow access is only measurable on access switches connected via 10G uplinks. So we went a little deeper and set up a testing scenario.
So starting from the exact same base we saw:
This was very confusing, but we continued searching and started suspecting QoS. For the next test we disabled QoS on our test switch and, poof, a flat 1G no matter which uplink was used. (But we cannot disable QoS on the infrastructure, as this would lead to other negative effects in the VoIP setup.)
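(For reference, the global QoS toggle on this switch family is no mls qos, re-enabled with mls qos; that is what "disabling QoS" means here.)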
So the digging continued, with my limited Know-How on QoS:
"show mls qos int gi1/0/x statistics" as well as "show interface 1/0/x | i drops" and a wireshark setup were my tools. They showed me that outputs were dropped on the output queues (queue 2), on the interface (exact same number of output packet drops as queue drops) and wireshark told me about tcp-retransmissions, out of order, dup-ack and more.
My detective senses told me this made sense: output drops on the interface correlate with the client's read speed. With TCP retransmissions and window-size adjustments, of course we see reduced speed while reading.
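(As a rough sanity check, nothing specific to our setup: the well-known loss-limited TCP estimate is throughput <= (MSS / RTT) * (1 / sqrt(p)), where p is the drop probability, so even a small fraction of dropped packets caps a single flow well below line rate.)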
My solution for the time being:
Okay, if packets in queue x are dropped, I should increase the buffers and thresholds for this queue. So I experimented a little by adjusting the queue-set values (the exact commands are listed a bit further down).
Or in numbers, for the same read/write process (these counters were taken on the client's access interface; the uplink did not show drops in any situation):
Original QoS values:
output queues dropped:
 queue:     threshold1   threshold2   threshold3
 -----------------------------------------------
 queue 0:            0            0            0
 queue 1:            0            0            0
 queue 2:            0            0        21516
 queue 3:            0            0            0
adjusted:
output queues dropped:
 queue:     threshold1   threshold2   threshold3
 -----------------------------------------------
 queue 0:            0            0            0
 queue 1:            0            0            0
 queue 2:            0            0          284
 queue 3:            0            0            0
So almost a factor of 100 fewer queue drops on our data queue, which of course resulted in much better read rates (pretty much a flat 1G).
For reference, these are the exact adjustments:

mls qos queue-set output 1 buffers 15 25 40 20
   changed to:
mls qos queue-set output 1 buffers 10 15 65 10

mls qos queue-set output 1 threshold 3 100 100 100 400
   changed to:
mls qos queue-set output 1 threshold 3 800 800 100 1200
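In case my reading of the documentation is off, this is how I understand the two knobs (please correct me if I am wrong):

! distribute the output buffer pool across queues 1-4, in percent (must total 100)
mls qos queue-set output 1 buffers 10 15 65 10
! per queue: drop-threshold1, drop-threshold2, reserved, maximum - all in percent of that queue's allocation
mls qos queue-set output 1 threshold 3 800 800 100 1200

Also note that the drop statistics count queues 0-3 while the configuration counts them 1-4, so "queue 2" in the counters above is queue 3 here.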
Now I am stumped with a few questions. The 10G vs. 1G uplink behaviour puzzles me the most, but I am happy for any feedback I can get. Hopefully the information above is relayed in an understandable way; if not, I will happily elaborate.
Kind regards,
Jochen
02-21-2020 09:20 AM
02-25-2020 10:10 PM
Hi Joseph,
thank you for the additional input and suggestions.
From my understanding of our port configuration (srr-queue bandwidth share 1 30 35 5), the weights are not hard reservations as long as there is only traffic in one queue. I also read in the documentation that in shared mode the bandwidth gets shared (duh) amongst the queues, so as long as only one queue is active, the full bandwidth should be available.
Your conclusion, that the higher bandwidth allows a quicker build-up in the queues, is the one I suggested to the customer earlier this week. I asked him to test this by using the 1G uplink (which showed no issue) and forcing 100M on the access port. That way we should have the same 10:1 ratio and maybe can replicate the drops.
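For the record, forcing the access port down to 100M for that test should just be (gi1/0/x again standing in for the client-facing port):

interface gi1/0/x
 speed 100
 duplex full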
One other question:
Do you have any effective ways of measuring/verifying whether other network operations are affected by increasing the queue thresholds?
Thanks in any case for the feedback, I will go back to labbing this as soon as possible :)
Kind regards,
Jochen
02-26-2020 09:02 AM
02-27-2020 11:51 PM
Okay, looks like I still have a lot to learn about the intricacies of QoS.
For now I'll be content with testing out the theory of 10:1 scaling in buffer behaviour.
Mind sharing some more input on my question regarding monitoring?
So far I have only found the statistics output that displays the drops in each queue (but no reasons or logging messages).
Debugging the QoS features didn't show any results for me.
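The only workaround I can think of so far is polling the counters on a timer, e.g. with an EEM applet along these lines (untested sketch; the interface, file name and 300-second interval are just placeholders):

event manager applet QOS-DROP-POLL
 event timer watchdog time 300
 action 1.0 cli command "enable"
 action 2.0 cli command "show clock | append flash:qos-drops.txt"
 action 3.0 cli command "show mls qos interface gi1/0/x statistics | append flash:qos-drops.txt"

The counters are cumulative, so it is the deltas between samples that matter (or clear them between samples, if the image supports clear mls qos interface statistics).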
Thanks for your input so far, have a great day ahead.