Solved: Re: To QoS or not to QoS

mallywally · ‎01-21-2013

Hello, I hope that somebody might be able to help me with a question I have regarding QoS on Catalyst switches (Cat 6509 core switches and Cat 3560 access switches).

My company has rolled out VMware virtual desktops to the majority of our users, and as part of an ongoing piece of work to improve the user experience on the virtual desktops we have engaged an external consultant to assess our entire environment and check for any obvious issues. One of the recommendations the consultant has made is for us to implement Auto-QoS on our entire LAN and then to mark the VDI protocols (PCoIP and RDP) going from the ESX servers TOWARDS the thin client (users') terminals as DSCP AF21, whilst leaving all other traffic "as is" (his recommendation was not to bother marking the traffic on the way back from the thin clients to the ESX servers).

The consultant explained that his reasoning is as follows:

VDI traffic will be placed into higher priority queues than non-VDI traffic on our network devices. This, he said, will ensure a smoother flow of VDI packets, as any short spikes in traffic will be "smoothed" off, and VDI packets will be processed more quickly due to the higher priority queue.

I have taken a look at the traffic paths through our network and can see no points on our LAN where traffic is congesting. I am measuring traffic going through all interfaces at 1 second intervals using SNMP, and cannot see any utilisation on the links going above 30% on any of them.

The question I have is this: if interfaces are not experiencing congestion, will QoS still provide benefit? I always assumed that if the amount of traffic coming in or going out of an interface is less than what it can handle, then its buffers will never start to fill, and therefore the QoS queues would never get used. Is this incorrect?

I suppose another way of looking at it is this: without QoS, when an interface never has to process more than its stated link speed, does it still buffer the packets for a short time before sending them on, or will the packets get passed through at "line rate" speed?

To give a basic idea of the network topology, the ESX servers connect directly to the Cat 6509s, which then connect directly to the Cat 3560s. Each access switch has up to 48 thin client terminals, each connected to an individual FastEthernet port. To reiterate, none of the links along the way from server to thin client get very heavily used:

ESX -> 6509, max I have seen is about a 280Mb/s spike

6509 -> 3560, max I have seen is about a 180Mb/s spike

3560 -> Thin Client, max I have seen is about a 15Mb/s spike

Any information, or even just general advice regarding this would be greatly appreciated.

Thanks

Malcolm

Joseph W. Doherty · ‎01-21-2013

Disclaimer

The Author of this posting offers the information contained within this posting without consideration and with the reader's understanding that there's no implied or expressed suitability or fitness for any purpose. Information provided is for informational purposes only and should not be construed as rendering professional advice of any kind. Usage of this posting's information is solely at reader's own risk.

Liability Disclaimer

In no event shall Author be liable for any damages whatsoever (including, without limitation, damages for loss of use, data or profit) arising out of the use or inability to use the posting's information even if Author has been advised of the possibility of such damage.

Posting

Insufficient information to say whether your network actually requires QoS, but as VDI is delay sensitive, QoS is often worth doing to assure treatment of specific traffic types. (Basically a similar situation to running VoIP across a LAN; i.e. in any specific instance you might not actually need QoS, but it assures specific traffic treatment.)

That said, I think the recommendation for VDI is actually close to VoIP-like treatment for PCoIP; i.e. AF21 seems a bit low. Also, I'm unsure why the consultant recommends that packets to the server not be marked for preferential treatment. I would expect lower bandwidth demand in that direction, but as client-to-server traffic can be stuff like mouse movements, you want to avoid additional network latency for that traffic too.

BTW, yes, you're correct: if there's no congestion on an interface, QoS never engages. However, congestion is possible any time there's oversubscription. Even then, the real question is whether it's significant enough to be adverse. Insufficient information in your post to say, but much can happen during 1 second (i.e. even 1-second utilization doesn't always reveal the true situation).
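To illustrate how much can hide inside a 1-second average, here is a minimal Python sketch of a single FIFO egress port. It is only a toy model under stated assumptions: 1 ms time slots, a 100 Mb/s FastEthernet egress port as in the original post, a 1 Gb/s ingress link (the uplink speed is not given in the thread, so that figure is an assumption), and an unlimited buffer. Both traffic patterns carry the same 30,000,000 bits in one second, so both would poll as 30% utilization.

```python
# Toy model of one egress port. All figures are illustrative assumptions:
# 1 ms slots, 100 Mb/s egress (FastEthernet), 1 Gb/s ingress (assumed uplink).
SLOTS = 1000                        # 1000 x 1 ms = one 1-second polling interval
EGRESS_BITS_PER_SLOT = 100_000      # what 100 Mb/s can send in 1 ms
INGRESS_BITS_PER_SLOT = 1_000_000   # what 1 Gb/s can deliver in 1 ms


def simulate(arrivals, drain_per_slot):
    """Run a FIFO egress queue with an unlimited buffer.

    Returns (utilization in percent over the whole run,
             peak queue depth in bits).
    """
    queue = 0
    peak = 0
    sent = 0
    for arriving in arrivals:
        queue += arriving                  # traffic arriving in this slot
        out = min(queue, drain_per_slot)   # port sends at most line rate
        queue -= out
        sent += out
        peak = max(peak, queue)            # backlog left after sending
    return 100 * sent // (drain_per_slot * len(arrivals)), peak


# The same volume of traffic, spread evenly or delivered as one 30 ms burst.
smooth = [30_000] * SLOTS
bursty = [INGRESS_BITS_PER_SLOT] * 30 + [0] * (SLOTS - 30)

for name, arrivals in (("smooth", smooth), ("bursty", bursty)):
    util, peak = simulate(arrivals, EGRESS_BITS_PER_SLOT)
    delay_ms = peak // EGRESS_BITS_PER_SLOT   # time to drain the peak backlog
    print(f"{name}: {util}% utilization, peak queue {peak} bits, "
          f"about {delay_ms} ms of queuing delay")
```

In this model the smooth pattern never queues at all, while the burst builds a backlog that takes hundreds of milliseconds to drain, yet the 1-second counter reads the same for both. A real port has a finite buffer, so a burst like this would show up as drops rather than as a long queue.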

mallywally · ‎01-22-2013

Thanks for your response Joseph – it is very much appreciated.

I understand that in order to be able to give definitive recommendations regarding QoS on a network one would probably require a lot more information, so please rest assured that I certainly don’t expect to achieve that via this forum. The many questions I have surrounding QoS and why/when it is required are slowly being answered through reading forums such as this along with other material, and your posting has certainly gone a long way in helping me with some of my questions.

Regarding the classification of VDI traffic, I have read a Teradici document which recommends AF41 or AF31 for PCoIP, so that pretty much ties in with what you have said.

FYI, a link to the doc can be found here: http://communities.vmware.com/docs/DOC-16186

I agree with you on the "1 way" QoS point - it's one of the reasons why I feel I need to know a lot more about this all before I agree to implement.

On your last point again I agree that a 1 second interval is a very long time when we consider things like latency being measured in milliseconds, or stuff like (from what I have read) 6500 QoS policing / token bucket operation with the token removal interval being 0.00025 seconds.
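To put the two time scales side by side, here is a rough calculation in Python. It assumes the 0.00025 second interval quoted above and a 100 Mb/s FastEthernet port as in the original post; the function name is just illustrative.

```python
# Integer arithmetic in microseconds to avoid floating-point rounding.
TOKEN_INTERVAL_US = 250        # 0.00025 s, the interval quoted above
POLL_INTERVAL_US = 1_000_000   # the 1-second SNMP polling interval
FAST_ETHERNET_BPS = 100_000_000


def bytes_in_window(rate_bps, window_us):
    """Bytes a link running at rate_bps can carry in window_us microseconds."""
    return rate_bps * window_us // 1_000_000 // 8


INTERVALS_PER_POLL = POLL_INTERVAL_US // TOKEN_INTERVAL_US

print("token intervals per poll:", INTERVALS_PER_POLL)
print("bytes per token interval at 100 Mb/s:",
      bytes_in_window(FAST_ETHERNET_BPS, TOKEN_INTERVAL_US))
print("bytes per poll interval at 100 Mb/s:",
      bytes_in_window(FAST_ETHERNET_BPS, POLL_INTERVAL_US))
```

So one SNMP sample spans 4000 of those intervals, and each interval is only long enough for about two full-size Ethernet frames at FastEthernet speed.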

One of the things I am struggling with is that I have seen numerous opinions stating that although it is considered beneficial in general, if there is no congestion in a network, then there is no definite need for QoS.

To your knowledge, is there any way I can try to measure for congestion more frequently than 1 second, or is there some other manner in which this kind of information can be ascertained?

Alternatively, do you know of any documentation on how I could try to establish whether congestion is occurring in any way on my network, either on the links themselves, or in hardware on the devices?

Once again, thanks for your time and help.

Joseph W. Doherty · ‎01-22-2013

"One of the things I am struggling with is that I have seen numerous opinions stating that although it is considered beneficial in general, if there is no congestion in a network, then there is no definite need for QoS."

Yes, that's correct: if there's never any congestion, QoS isn't needed. However, most network designs don't preclude congestion. The real variables are how often congestion occurs and how severe it is. Basically, when it does happen, is it adverse to the applications running over the network? Different applications have different sensitivities; mixing application traffic is when the need for QoS is usually recognized (well, sometimes recognized; often there's the "we need bandwidth" approach too). For example, usually no one worries about QoS if all the traffic is VoIP or all of it is FTP, but mix the two and watch VoIP suffer.

"To your knowledge, is there any way I can try to measure for congestion more frequently than 1 second, or is there some other manner in which this kind of information can be ascertained?"

Well, for starters you do want to avoid "We've found why our device CPUs are running high and link bandwidth utilization is high, it's from frequent SNMP polling."

In general, one of the first things I look for is whether any drops are happening. Drops tell me not only that there's congestion, but that it's occasionally bad enough to overflow queue/buffer resources.
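One way to watch for drops without heavy polling is to capture `show interfaces` output at two points in time and compare the "Total output drops" counters. The Python sketch below assumes that counter line looks like the samples embedded in it; the sample text, interface names and numbers are made up for illustration, and the exact layout can vary by platform and software version.

```python
import re

# Made-up excerpts in the general shape of "show interfaces" output.
SAMPLE_BEFORE = """\
FastEthernet0/1 is up, line protocol is up (connected)
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 1200
FastEthernet0/2 is up, line protocol is up (connected)
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
"""
SAMPLE_AFTER = SAMPLE_BEFORE.replace("drops: 1200", "drops: 1287")

IFACE_RE = re.compile(r"^(\S+) is .*, line protocol is")
DROPS_RE = re.compile(r"Total output drops: (\d+)")


def output_drops(show_interfaces_text):
    """Map interface name -> cumulative output drop counter."""
    drops = {}
    current = None
    for line in show_interfaces_text.splitlines():
        header = IFACE_RE.match(line)
        if header:
            current = header.group(1)      # start of a new interface section
            continue
        counter = DROPS_RE.search(line)
        if counter and current is not None:
            drops[current] = int(counter.group(1))
    return drops


def new_drops(before, after):
    """Interfaces whose drop counter increased between two snapshots."""
    return {
        name: count - before.get(name, 0)
        for name, count in after.items()
        if count > before.get(name, 0)
    }


print(new_drops(output_drops(SAMPLE_BEFORE), output_drops(SAMPLE_AFTER)))
```

Comparing two snapshots matters because the counters are cumulative: a large number may be left over from an old event, whereas an increase between captures shows drops happening now.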

For polling, if you can find an OID that provides the current queue depth, that's worth tracking. Much, much more useful than the bandwidth utilization. (Why is queue depth more useful? Well, given the series of numbers [5,5,5,5], [0,10,10,0] and [1,9,6,4], bandwidth utilization is like the average, which is 5 for all 3 sets, but that tells me little about the distribution in each set.)
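The point about averages can be checked directly with the three series from the post. This small Python sketch uses arbitrary labels for the series and reports the mean alongside the peak and the range, which is one simple way to see the distribution the mean hides.

```python
from statistics import mean

# The three series from the post; the labels are just for readability.
SERIES = {
    "steady": [5, 5, 5, 5],
    "bursty": [0, 10, 10, 0],
    "mixed": [1, 9, 6, 4],
}


def summarise(values):
    """Return (mean, peak, range) for one series of samples."""
    return mean(values), max(values), max(values) - min(values)


for name, values in SERIES.items():
    average, peak, spread = summarise(values)
    print(f"{name}: mean={average} peak={peak} range={spread}")
```

All three series have the same mean, but the peaks and ranges differ, and it is the peaks that decide whether a queue fills.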

Later IOS versions support scripting, where the device can "poll" locally. This avoids sending the polling traffic across the same wire.

I don't believe it was ever implemented on any Catalyst switches, but an interesting IOS feature was Corvil Bandwidth technology, which was intended to better answer questions about bandwidth requirements. Just reading about what this technology did might help your understanding of QoS. (BTW, it's 3rd-party technology, so don't limit yourself to just Cisco docs.) You might start here: http://www.cisco.com/en/US/docs/ios/12_3t/12_3t14/feature/guide/gtcbandw.html

mallywally · ‎01-23-2013

Thanks very much Joseph, your replies have helped me greatly and also given me a valuable steer regarding several of the questions I have.

Point taken with regards to making sure that we don’t become too blinkered in our approach and making sure we factor in all aspects when investigating things. Just so you know, I don’t have the 1 second SNMP polling running full time – I’ve only ever done that 1 interface at a time and only when trying to get a more granular picture for a set period of time.

My feeling is that we probably will end up implementing QoS on our entire network at some stage, but I think for now I will be pushing back on the “just mark VDI from the Servers at AF21 and run Auto-QoS…that’s all you need to do” recommendation. I feel that we definitely need to have a deeper understanding on how things will work, we need to be able to test properly, and we also need to be able to understand how we can monitor and tune things when QoS has been rolled out.

Many thanks