With FlexPod QoS, Start with Best Effort for Best Performance

jogeorg2 · 10-18-2017

The Quality of Service (QoS) setup in FlexPod Datacenter has always been designed with simplicity in mind, as a base for more advanced configuration when necessary. Because the FlexPod IP data network uses Cisco UCS Fabric Interconnects (FIs) and Cisco Nexus switches, QoS has three main components: Class of Service (CoS) marking, QoS queuing, and the data packet Maximum Transmission Unit (MTU) for Jumbo Frame support. The base FlexPod setup uses the Best Effort CoS (value 0) for all IP-based data traffic, with additional traffic classes available as needed. The FlexPod UCS QoS configuration is shown below.

From an IP traffic perspective, only the Best Effort class (priority) is enabled here; all other classes are disabled. The Fibre Channel class covers FC/FCoE, but is considered a separate network. The Fibre Channel class cannot be disabled, but if FC/FCoE will never be implemented in this UCS system, you can set its weight to "none" to effectively disable it. In this setup, even if a data packet enters the UCS with a CoS value other than 0, it is treated as Best Effort within the UCS fabric; only Best Effort queuing takes place. CoS marking in UCS takes place at the virtual Network Interface Card (vNIC), by assigning a QoS policy to the vNIC. If no QoS policy is selected, the vNIC is assigned a CoS value of 0 (Best Effort) by default. This marking applies at vNIC granularity: if VLAN-tagged trunks are used and multiple VLANs come into a vNIC, UCS marks all packets on that vNIC with the same CoS, regardless of VLAN. Cisco UCS does provide a mechanism for passing CoS-marked packets from a virtual switch or Operating System (OS) into the UCS fabric without re-marking them; this mechanism is described below.

The only difference between the FlexPod base setup and the default UCS QoS setup is that the MTU for the Best Effort class has been changed from normal (1500) to 9216 (Jumbo). In FlexPod, Jumbo Frame (MTU 9000) packets are normally used for IP-based storage protocols (NFS, iSCSI, and SMB when VM connectivity to the SMB share is not required), while management packets and any packets that leave the pod normally use an MTU of 1500. In UCS, MTU is configured per QoS system class, so MTU and QoS are tied together. For MTU and Jumbo Frames, the default FlexPod setup configures the network (both the UCS FIs and the Nexus switches) to carry packets from endpoints configured with an MTU of up to 9000. The network devices are set to 9216 rather than 9000 to leave room for encapsulation, where certain protocols add header bytes and increase the size of the packet.

Each endpoint's MTU is set independently rather than negotiated end-to-end, so it is critical that all endpoints within a subnet have the same MTU setting. This requirement is often explained as a way to avoid fragmentation, but the actual consequence of a mismatch here is worse. The following is quoted from the Cisco Nexus 9000 Series Interfaces Configuration Guide: "For transmissions to occur between two ports, you must configure the same MTU size for both ports. A port drops any frames that exceed its MTU size. By default, each port has an MTU of 1500 bytes, which is the IEEE 802.3 standard for Ethernet frames." In FlexPod, an MTU mismatch causes packet drops, not fragmentation. Fragmentation only occurs on L3 interfaces.
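
The drop-versus-fragment behavior and the 9216/9000 headroom can be illustrated with a small Python model. It is a simplification, assuming that an L2 port compares each frame's size against its configured MTU and drops anything larger (the behavior quoted above from the Nexus guide). `L2Port`, `encapsulation_headroom`, and the 50-byte `HYPOTHETICAL_OVERHEAD` are illustrative names and values, not figures for any specific protocol or device.

```python
from dataclasses import dataclass

NETWORK_MTU = 9216          # FI Best Effort class and Nexus port MTU in the FlexPod base setup
ENDPOINT_JUMBO_MTU = 9000   # MTU configured on jumbo-enabled endpoints
ENDPOINT_STANDARD_MTU = 1500


@dataclass
class L2Port:
    name: str
    mtu: int

    def forward(self, frame_size: int) -> bool:
        """An L2 port forwards frames up to its MTU and drops larger ones; it never fragments."""
        return frame_size <= self.mtu


def encapsulation_headroom(network_mtu: int, endpoint_mtu: int) -> int:
    """Bytes left for headers added by encapsulation on top of an endpoint-sized packet."""
    return network_mtu - endpoint_mtu


nexus_port = L2Port("storage-facing port, MTU 9216", NETWORK_MTU)
mismatched_port = L2Port("port left at the 1500 default", ENDPOINT_STANDARD_MTU)

print(nexus_port.forward(ENDPOINT_JUMBO_MTU))        # True
print(mismatched_port.forward(ENDPOINT_JUMBO_MTU))   # False: dropped, not fragmented
print(encapsulation_headroom(NETWORK_MTU, ENDPOINT_JUMBO_MTU))  # 216

# HYPOTHETICAL_OVERHEAD is an illustrative value, not a figure for any specific protocol.
HYPOTHETICAL_OVERHEAD = 50
print(nexus_port.forward(ENDPOINT_JUMBO_MTU + HYPOTHETICAL_OVERHEAD))  # True
```

The point of the model is the second line of output: a 9000-byte packet reaching a 1500-byte port is simply discarded, which is why a mismatched endpoint in the subnet breaks Jumbo Frames rather than slowing them down.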

In the Nexus 9000 series switches used in FlexPod, MTU is set per port at the L2 interface level. Port MTU is set to 9216 on ports connecting to both the storage controllers and the UCS FIs, so endpoints connected to these ports can use an MTU of either 1500 or 9000. The Nexus 9000 queuing configuration in FlexPod is left at the defaults, and all traffic passes through the Best Effort queue. Also, by default the Nexus 9000 series does no CoS marking and passes any existing CoS markings through unchanged.

If network congestion in a FlexPod is an issue and priority traffic must be protected from packet drops, additional classes can be enabled in UCS along with QoS policies at the vNIC level to mark packets with the appropriate CoS value. It is also critical to add the corresponding CoS marking and queuing configuration to the Nexus switches to avoid potential issues. Because Cisco UCS performs both input and output queuing, the Nexus switch output queuing should be set to match the UCS queuing to avoid packet drops when traffic enters the UCS.
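
One way to keep the UCS and Nexus settings aligned is to compare them before making a change. The following is a minimal sketch under stated assumptions: each side is represented as a hypothetical dictionary mapping an enabled class name to the CoS value it matches. It does not read configuration from either platform, and the class names are illustrative only.

```python
from typing import Dict, List

# Hypothetical model: enabled class name -> CoS value that the class matches.
ClassMap = Dict[str, int]


def queuing_mismatches(ucs: ClassMap, switch: ClassMap) -> List[str]:
    """List classes that are enabled on only one side or that match different CoS values."""
    problems: List[str] = []
    for name in sorted(set(ucs) | set(switch)):
        if name not in switch:
            problems.append(f"{name}: queued in UCS but not on the Nexus switch")
        elif name not in ucs:
            problems.append(f"{name}: queued on the Nexus switch but not in UCS")
        elif ucs[name] != switch[name]:
            problems.append(f"{name}: CoS {ucs[name]} in UCS vs CoS {switch[name]} on the switch")
    return problems


ucs_classes = {"best-effort": 0, "gold": 4}
switch_classes = {"best-effort": 0}  # switch still at its default queuing

for problem in queuing_mismatches(ucs_classes, switch_classes):
    print(problem)
```

Here the Gold class has been enabled in UCS only, which is exactly the one-sided configuration the paragraph warns against; an empty result would mean both sides queue the same CoS values.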

QoS is often implemented by giving different VLANs different priorities and CoS values, but with most hypervisors, VLAN trunks are used and each uplink vNIC carries multiple VLANs with different desired priorities, as mentioned above. Since UCS QoS marking is normally done at the vNIC level, a UCS QoS policy can be set up with the "Host Control" parameter set to Full and assigned to the vNIC. This policy allows CoS values to be set in the server OS or virtual switch and passed into and through the UCS fabric. In the example screenshot, CoS values that are marked are passed through, but traffic without a CoS value is marked as Best Effort (CoS 0). We have found that when the vSphere 6.5 vDS is used, the current nenic VIC driver does not correctly pass through CoS marking, but an upcoming nenic driver will resolve this issue.
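
The difference between a vNIC QoS policy with and without Host Control set to Full can be sketched as a per-frame rule. This is a simplified model, not a UCS API: `policy_cos` stands for the priority of the QoS policy on the vNIC (None meaning no policy), and `os_cos` for a CoS value set by the OS or virtual switch (None meaning unmarked). It assumes, as in the example above where the policy priority is Best Effort, that unmarked traffic on a Host Control Full vNIC falls back to the policy's priority. The VLAN labels and CoS values are hypothetical.

```python
from typing import List, Optional

BEST_EFFORT_COS = 0


def vnic_egress_cos(policy_cos: Optional[int], host_control_full: bool,
                    os_cos: Optional[int]) -> int:
    """Simplified model of the CoS a frame carries from a vNIC into the UCS fabric."""
    if policy_cos is None:
        # No QoS policy on the vNIC: everything is marked CoS 0 (Best Effort).
        return BEST_EFFORT_COS
    if host_control_full and os_cos is not None:
        # Host Control = Full: a CoS set by the OS or virtual switch passes through.
        return os_cos
    # Otherwise the policy's CoS is applied to every frame, whatever its VLAN.
    return policy_cos


# Hypothetical trunk: (VLAN label, CoS set by the virtual switch, or None if unmarked).
trunk_frames = [("VLAN A", 4), ("VLAN B", 2), ("VLAN C", None)]


def mark_trunk(policy_cos: Optional[int], host_control_full: bool) -> List[int]:
    return [vnic_egress_cos(policy_cos, host_control_full, cos) for _, cos in trunk_frames]


print(mark_trunk(policy_cos=1, host_control_full=False))     # [1, 1, 1]
print(mark_trunk(policy_cos=0, host_control_full=True))      # [4, 2, 0]
print(mark_trunk(policy_cos=None, host_control_full=False))  # [0, 0, 0]
```

Only the Host Control Full case preserves distinct per-VLAN priorities on a single trunked vNIC; the other two cases flatten every VLAN to one CoS.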

A potential issue has been identified in the interaction between Cisco UCS and NetApp ONTAP-based storage systems in FlexPod. It has been verified that NetApp ONTAP trunked (tagged) VLAN network interfaces set a CoS value of 4 in the packet's 802.1Q VLAN tag. This means that all IP-based storage (NFS, SMB, iSCSI) and management packets on a tagged VLAN interface carry CoS 4. The Nexus switches in a base FlexPod configuration simply pass this CoS value through without re-marking it, and treat the traffic as Best Effort from a queuing perspective. These packets then enter the UCS Fabric Interconnect (FI) with CoS 4, but in the base FlexPod setup they are treated as Best Effort, since that is the only class enabled in UCS. This setting can be verified using port mirroring that does not strip off the VLAN tag, together with a packet sniffer such as Cisco vNAM. The following screenshot, from a mirror of an outgoing (TX) stream to an FI from a Nexus 93180YC-EX switch, demonstrates this setting. The source IP of the packet shown is from a NetApp AFF A300 running ONTAP 9.1.
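
The CoS value a sniffer reports is the 3-bit Priority Code Point (PCP) at the top of the 802.1Q Tag Control Information (TCI) field, which follows the two MAC addresses and the 0x8100 tag type. The sketch below builds and decodes such a header with Python's `struct` module to show where the CoS 4 sits. The MAC addresses and VLAN ID 3050 are hypothetical.

```python
import struct
from typing import Optional, Tuple

TPID_8021Q = 0x8100
ETHERTYPE_IPV4 = 0x0800


def build_tagged_header(dst: bytes, src: bytes, cos: int, vlan_id: int,
                        ethertype: int = ETHERTYPE_IPV4) -> bytes:
    """Build an 18-byte Ethernet header with an 802.1Q tag (DEI bit left at 0)."""
    tci = ((cos & 0x7) << 13) | (vlan_id & 0x0FFF)  # PCP (CoS) is the top 3 bits of the TCI
    return dst + src + struct.pack("!HHH", TPID_8021Q, tci, ethertype)


def read_cos(frame: bytes) -> Optional[Tuple[int, int]]:
    """Return (CoS, VLAN ID) for an 802.1Q-tagged frame, or None if it is untagged."""
    tpid, tci = struct.unpack_from("!HH", frame, 12)  # skip 6-byte dst and 6-byte src MAC
    if tpid != TPID_8021Q:
        return None
    return tci >> 13, tci & 0x0FFF


# Hypothetical, locally administered MAC addresses and VLAN ID.
dst = bytes.fromhex("020000000001")
src = bytes.fromhex("020000000002")

tagged = build_tagged_header(dst, src, cos=4, vlan_id=3050)
untagged = dst + src + struct.pack("!H", ETHERTYPE_IPV4) + bytes(20)

print(read_cos(tagged))    # (4, 3050)
print(read_cos(untagged))  # None
```

The untagged case also shows why the mirror session must not strip the VLAN tag: without the tag there is no PCP field, so the CoS marking is invisible in the capture.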

In this setup, problems can arise if the Gold class (CoS 4) is enabled in UCS and its MTU is left at normal (1500). All storage packets coming into the FIs from the switches would have CoS 4 and would be assigned to the Gold class. With the Gold class MTU at 1500, every storage packet larger than 1500 bytes (for example, a 9000-byte Jumbo Frame accepted by the 9216-byte Nexus port) would be dropped, and Jumbo Frames would be broken for this traffic. This problem can be resolved either by setting the Gold class MTU in UCS to 9216 or by implementing the workaround below.
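
The three scenarios above (base FlexPod, Gold at MTU 1500, Gold at MTU 9216) can be compared with a small model of FI ingress. It is a sketch under simplifying assumptions: each enabled UCS system class is a hypothetical table entry of CoS value to (class name, class MTU), a CoS with no enabled class falls back to Best Effort, and a frame is dropped if it exceeds its class MTU.

```python
from typing import Dict, Tuple

# Hypothetical model of enabled UCS system classes: CoS value -> (class name, class MTU).
ClassTable = Dict[int, Tuple[str, int]]

BASE_FLEXPOD: ClassTable = {0: ("Best Effort", 9216)}
GOLD_DEFAULT_MTU: ClassTable = {0: ("Best Effort", 9216), 4: ("Gold", 1500)}
GOLD_JUMBO_MTU: ClassTable = {0: ("Best Effort", 9216), 4: ("Gold", 9216)}


def fi_ingress(classes: ClassTable, cos: int, frame_size: int) -> Tuple[str, bool]:
    """Return (class used, forwarded?) for a frame entering the FI from the Nexus switch."""
    # A CoS value with no enabled class is handled as Best Effort.
    name, mtu = classes.get(cos, classes[0])
    return name, frame_size <= mtu


storage_jumbo_frame = 9000  # carries CoS 4, as set by the ONTAP tagged VLAN interface
for label, table in [("base", BASE_FLEXPOD), ("gold@1500", GOLD_DEFAULT_MTU),
                     ("gold@9216", GOLD_JUMBO_MTU)]:
    print(label, fi_ingress(table, cos=4, frame_size=storage_jumbo_frame))
```

Only the middle scenario drops the storage Jumbo Frame: the CoS 4 marking that is harmless in the base setup becomes a problem as soon as a 1500-MTU class claims it.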

This issue is just one example of the potential implications of using QoS in a Converged Infrastructure pod. Because of this, my recommendation is to always implement QoS in a holistic, end-to-end fashion. The switch QoS setup must be considered along with the setup of any virtual switches in a virtual environment. It is important that any QoS setup done in the UCS (marking or queuing) also be mirrored in the switches. UCS performs QoS marking only at the vNIC level, not on the uplink side. If QoS policies are enabled only in the UCS and not in the switches, network traffic can be treated differently in each direction. For example, with the NetApp storage traffic shown here, traffic coming from the storage controller would be treated as Gold traffic in the UCS when the Gold class is enabled. If the same traffic is marked as Bronze in the vNIC template, it would be treated as Bronze in the direction toward the storage controller. And if no QoS queuing configuration is added to the switches, this traffic would be treated as Best Effort in the switches. Such a configuration is unlikely to provide the desired performance under congestion.

A potential workaround for the issue that arises when the Gold class is enabled is to configure the Nexus 9K switches to re-mark the packets coming from the NetApp storage controllers into the Best Effort (CoS 0) class. By default, the 9K has port-based QoS in place. To do this re-marking with port-based QoS, a QoS class-map must be set up to classify the packets:

class-map type qos match-any NetApp-CoS-4
  match cos 4

Then a policy-map must be put in place to re-mark the packets:

policy-map type qos Mark-NetApp-Best-Effort
  class NetApp-CoS-4
    set cos 0

Finally, this policy-map needs to be assigned to the interface ports attached to the storage controllers:

interface port-channel xxx, port-channel yyy
  service-policy type qos input Mark-NetApp-Best-Effort

Note that in this case, the QoS policy is applied only to the interfaces connected to the NetApp storage controllers and has no effect on any other interfaces, including the interfaces connected to the UCS FIs. Once this policy is implemented, a packet capture of packets transmitted to the FI shows a CoS value of 0 (Best Effort) on storage packets.
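
The scoping of the re-marking policy can be modeled in a few lines of Python. This sketch only mirrors the logic of the class-map and policy-map above (match CoS 4, set CoS 0, ingress only on the interfaces the service-policy is attached to); `Frame`, `ingress_remark`, and the interface name "fi-uplink" are hypothetical, and "port-channel xxx"/"port-channel yyy" are the same placeholders used in the configuration.

```python
from dataclasses import dataclass, replace
from typing import Set

MATCH_COS = 4  # class-map type qos match-any NetApp-CoS-4: match cos 4
SET_COS = 0    # policy-map type qos Mark-NetApp-Best-Effort: set cos 0


@dataclass(frozen=True)
class Frame:
    vlan: int
    cos: int


def ingress_remark(interface: str, frame: Frame, policy_interfaces: Set[str]) -> Frame:
    """Re-mark CoS 4 to CoS 0, but only on interfaces where the service-policy is attached."""
    if interface in policy_interfaces and frame.cos == MATCH_COS:
        return replace(frame, cos=SET_COS)
    # Every other interface (for example, the FI-facing port-channels) passes CoS through.
    return frame


storage_ports = {"port-channel xxx", "port-channel yyy"}  # placeholders, as in the config
from_storage = Frame(vlan=3050, cos=4)                    # hypothetical VLAN ID

print(ingress_remark("port-channel xxx", from_storage, storage_ports))  # Frame(vlan=3050, cos=0)
print(ingress_remark("fi-uplink", from_storage, storage_ports))         # Frame(vlan=3050, cos=4)
```

Keeping the match narrow (CoS 4 only, storage-facing ports only) means any deliberate markings arriving on other interfaces are left alone, which matters once the workaround is extended into a fuller end-to-end QoS design.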

Note that this workaround should be part of a comprehensive, end-to-end QoS configuration, and that extensions to this workaround would be needed in many cases. In FlexPod, with any QoS deployment beyond the base Best Effort setup, testing should be done to show that the QoS implementation provides the desired results during network congestion.
