QoS System Class and QoS Policies specific to Microsoft Hyper-V 2016

MSL
Level 1
Hi
 
I posted a similar question before and no one answered it. I have been looking for clarity on QoS for a long time and hope someone can clarify it this time.

 

Our environment consists of Microsoft Hyper-V 2016 clustered nodes. 6248UP FIs are connected to N9K switches (4 x 10 Gb ports), and 2 FC ports are connected to MDS 9148S switches, which in turn connect to Dell Compellent and EMC VNX SAN storage. Each B200 M4 server is equipped with a VIC 1340 card. The N9K ports connected to the FIs are configured with MTU 9216.

We configured the QoS System Classes as follows:

Gold with CoS 1, Weight 2 (20%) and MTU 9216 -- for LM (Live Migration) & CSV (Cluster Shared Volume) traffic

Best Effort (BE) with Weight 5 (50%) -- for the Hyper-V management NIC and VM traffic

FC with CoS 3 and Weight 3 (30%) -- for FC traffic.

=> Our first concern: since the FIs are connected to the MDS switches and the MDS switches connect to the SAN storage using FC, how does the FC QoS System Class above come into the picture? Our understanding is that it applies to FCoE traffic between the nodes, while the traffic between a node and the SAN storage is real FC using the virtual HBA card. Can we get clarification on this point?

=> Another question: how can the weights we configured in the QoS System Classes be analysed to confirm they meet our requirements? All the Hyper-V hosts boot from SAN and all VMs are on SAN. We configured the weights based on our understanding that the VIC 1340 has 10 Gbps to each IOM (each connected to one FI) and that this 10 Gbps will be divided into three parts (2 Gbps for Gold, 5 Gbps for BE and 3 Gbps for FCoE traffic). Is this the right configuration? (A small sketch of this arithmetic follows.)
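A minimal sketch of the arithmetic behind that assumption, using only the weights quoted above (the class labels are illustrative). One point worth keeping in mind: in UCS, weight-based shares behave as bandwidth guarantees under congestion rather than hard partitions, so a class that is idle leaves its share available to the other classes.

```python
# Minimal sketch: how one 10 Gbps VIC 1340 path to an IOM/FI is split by the
# configured QoS System Class weights. These shares act as guarantees under
# congestion, not hard caps: an idle class leaves its share to the others.

LINK_GBPS = 10  # one VIC 1340 path to one 2204XP IOM / FI

classes = {                      # class label: configured weight
    "Gold (LM/CSV)":         2,
    "Best Effort (Mgmt/VM)": 5,
    "FC (FCoE)":             3,
}

total_weight = sum(classes.values())

for name, weight in classes.items():
    share = weight / total_weight
    print(f"{name:<24} {share:4.0%}  ~{share * LINK_GBPS:.1f} Gbps guaranteed")

# Prints (roughly):
#   Gold (LM/CSV)            20%  ~2.0 Gbps guaranteed
#   Best Effort (Mgmt/VM)    50%  ~5.0 Gbps guaranteed
#   FC (FCoE)                30%  ~3.0 Gbps guaranteed
```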

Three QoS policies are configured as listed below (a small sketch of these settings follows the list).

1) Policy for LM & CSV

Priority: Gold

Burst: 10240 (the associated NIC will be 10 Gbps)

Rate: line-rate

 

2) Policy for HV Management NIC

Priority: Best Effort

Burst: 1024 (the associated NIC will be 1 Gbps)

Rate: 2000000 -- to limit the NIC to a maximum of 2 Gbps

3) Policy for VM traffic

Priority: Best Effort

Burst: 10240 (the associated NIC will be 10 Gbps)

Rate: line-rate
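For readability, here are the same three policies expressed as data in a small sketch. It assumes the usual UCSM units (the rate-limit field in Kbps, so 2000000 corresponds to about 2 Gbps, and burst as a byte count rather than a link speed); please verify those units against your UCSM release.

```python
# Illustrative sketch of the three QoS policies above as data. Assumed units
# (worth verifying in your UCSM release): "rate" is in Kbps, so 2000000 is
# about 2 Gbps; "burst" is a byte count, not a link speed.

qos_policies = {
    "LM-CSV":  {"priority": "Gold",        "burst": 10240, "rate": "line-rate"},
    "HV-Mgmt": {"priority": "Best Effort", "burst": 1024,  "rate": 2_000_000},  # Kbps
    "VM":      {"priority": "Best Effort", "burst": 10240, "rate": "line-rate"},
}

def rate_cap_gbps(rate):
    """Convert a rate-limit value (assumed Kbps) to Gbps; 'line-rate' means no cap."""
    return None if rate == "line-rate" else rate / 1_000_000

for name, policy in qos_policies.items():
    cap = rate_cap_gbps(policy["rate"])
    cap_text = "no per-vNIC cap (line-rate)" if cap is None else f"capped at {cap:g} Gbps"
    print(f"{name:<8} priority={policy['priority']:<12} {cap_text}")
```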

 

On the Management NIC template we applied the 2nd QoS policy (Policy for HV Management NIC) to limit the speed, with MTU 1500. On the host we can see the NIC at 1 Gbps.

 

On the LM & CSV NIC templates we applied the 1st QoS policy (Policy for LM & CSV), with MTU 9000. On the host we can see the NIC at 10 Gbps.

 

On the VM NIC template we applied the 3rd QoS policy (Policy for VM traffic), with MTU 1500. On the host we can see the NIC at 10 Gbps; on this NIC we created the Hyper-V virtual switch, and all VMs use NICs from this virtual switch.
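One way to read those three templates together is to contrast each vNIC's ceiling (the rate limit from its QoS policy) with the floor implied by its system class weight on a single 10 Gbps VIC-to-IOM path. A minimal sketch with illustrative vNIC names; note that the floor is shared by every vNIC placed in the same class:

```python
# Sketch only: per-vNIC ceiling (QoS policy rate limit) versus per-class floor
# (system-class weight share of one 10 Gbps path). The floor is shared by all
# vNICs in the class, so the Mgmt and VM vNICs both sit under the 5 Gbps
# Best Effort guarantee; vNIC names are illustrative.

LINK_GBPS = 10
class_floor_gbps = {"Gold": 2, "Best Effort": 5}   # from weights 2 and 5 out of 10

vnics = {
    "LM-CSV-vNIC": {"qos_class": "Gold",        "rate_cap_gbps": None},  # line-rate
    "Mgmt-vNIC":   {"qos_class": "Best Effort", "rate_cap_gbps": 2},     # 2000000 Kbps
    "VM-vNIC":     {"qos_class": "Best Effort", "rate_cap_gbps": None},  # line-rate
}

for name, v in vnics.items():
    ceiling = v["rate_cap_gbps"] or LINK_GBPS
    floor = class_floor_gbps[v["qos_class"]]
    print(f"{name:<12} can burst to ~{ceiling} Gbps, guaranteed ~{floor} Gbps "
          f"shared within the {v['qos_class']} class")
```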

=> Can we get confirmation that what we did is correct and based on best practice?

Thanks in advance

3 Replies

MSL
Level 1

Please comment / share your suggestions.

Evan Mickel
Cisco Employee

I will start by saying this is really not the right place to collect validation for a design; the question you're asking is an end-to-end consultative question with multiple layers. That being said, I want to do my best to help narrow this down. Perhaps we can check a few items off and boil this down further for you. The most important point I can make here is that, with respect to QoS, there is no way to give you a blanket-statement answer: there is no "right" or "wrong" configuration based just on what you've deployed; it has more to do with what traffic is actually being sent.

 

If you are generating an extreme amount of VM traffic for whatever reason, the percentage allocated may not be adequate. Simply stating that you have X allocated for Y effectively means nothing until you have an idea of what traffic your hosts are actually generating. If you spun up a full host of DB servers scheduled to replicate every 5 minutes, you're going to run into problems; if you host a single Exchange server, you're probably overthinking the design and don't need to split hairs at the QoS level. I'm just making the point that this could vary wildly in either direction.

 

To answer your direct questions:

1) Traffic to the SAN is standard FC. No QoS policy would come into effect.

2) The VIC 1340 data sheet is posted below. This should be a big help in answering your questions there.

 

VIC 1340 Data Sheet:

https://www.cisco.com/c/en/us/products/collateral/interfaces-modules/ucs-virtual-interface-card-1340/datasheet-c78-732517.html

 

In summary, what you're doing sounds generally fine, though it is difficult to say for sure.  The best way to tailor a QoS policy is to truly understand what traffic is being generated by the hosts, understanding if and when bursts occur and to what degree, and applying a configuration that accommodates all of this.  If you can, come back with specific follow-up questions and I will do my best to sort out answers for you.  I hope we're at least closer here and that this serves as a reasonable jumping-off point.

 

 

 

 

Thanks!

Thanks a lot, Evan. We have worked only with Dell blades and are now migrating to UCS, so it will take some time to get familiar with it.

 

Our VMs are not that heavily utilized / loaded (all the heavily utilized VMs have already been moved out to the cloud). We have Exchange servers, SQL and other application servers. As of now, all of them are running with 1 Gb NICs on Dell blades and everything looks fine.

 

One concern / point of confusion here is how the QoS System Classes apply: do they mean that each 10 Gb connection from the VIC 1340 to each IOM / FI (the VIC 1340 has 10 Gb to each 2204XP IOM -- pg. 65, https://www.cisco.com/c/dam/en/us/products/collateral/servers-unified-computing/ucs-b-series-blade-servers/b200m4-specsheet.pdf) is divided according to the weights we configured? Is that correct? Keeping this in mind, we configured the LM / CSV and VM traffic to use FI A for odd-numbered blades and FI B for even-numbered blades, and configured failover at the UCS level (instead of configuring teaming or load balancing at the OS / server level). We hope this design is the right approach; a small sketch of this placement follows.
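Purely to restate the placement just described, here is a minimal sketch (the 8-blade chassis size and the vNIC grouping are assumptions) of pinning LM/CSV and VM traffic to FI A on odd-numbered blades and FI B on even-numbered blades, with UCS fabric failover providing the alternate path:

```python
# Sketch of the blade-to-fabric placement described above: odd-numbered blades
# pin LM/CSV and VM vNICs to Fabric A, even-numbered blades to Fabric B, and
# UCS fabric failover moves traffic to the other FI only if the primary fails.

def primary_fabric(blade_number: int) -> str:
    """Primary FI for a blade's LM/CSV and VM vNICs."""
    return "A" if blade_number % 2 == 1 else "B"

def failover_fabric(blade_number: int) -> str:
    """Fabric used only when the primary path is down (UCS fabric failover)."""
    return "B" if primary_fabric(blade_number) == "A" else "A"

for blade in range(1, 9):  # assuming one 8-blade chassis
    print(f"Blade {blade}: primary FI {primary_fabric(blade)}, "
          f"failover FI {failover_fabric(blade)}")
```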

 

Another question: how can we monitor performance / bottlenecks at the UCS level?

 

What exactly does the FCoE traffic carry? Initially we thought it would be used for the FC connection to the MDS / storage, and only later realized that traffic is pure FC. We dedicated 30% to FCoE, so in our case, if it's not being utilized, we can reduce this value and increase the value for VM traffic.

 

One other question that you didn't notice: the MTU configured at the N9K port level is 9216, while on the NIC templates we set 9000 for LM / CSV and 1500 on the VM NIC template. Is this the right configuration?
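As a rough consistency check of those MTU values, a minimal sketch. It assumes the Best Effort class MTU is the UCS default of 1500 (the post only sets an explicit MTU on the Gold class) and uses the simple rule that every hop's MTU should be at least the MTU configured on the host interface:

```python
# Sketch: the MTU configured on each host/vNIC should be no larger than the MTU
# at every hop it crosses (QoS system class, FI/N9K uplinks). The 9216 on the
# N9K and Gold class leaves headroom above the hosts' 9000; the Best Effort
# class MTU of 1500 is assumed to be the UCS default and should be verified.

paths = {
    "LM/CSV (jumbo)": {"host_mtu": 9000, "hops": {"Gold QoS class": 9216, "N9K uplink": 9216}},
    "VM / Mgmt":      {"host_mtu": 1500, "hops": {"BE QoS class": 1500,   "N9K uplink": 9216}},
}

for name, path in paths.items():
    smallest_hop = min(path["hops"].values())
    verdict = "OK" if path["host_mtu"] <= smallest_hop else "host MTU exceeds a hop MTU"
    print(f"{name}: host MTU {path['host_mtu']}, smallest hop MTU {smallest_hop} -> {verdict}")
```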

 

Thank You  
