QoS issues

tohoken
Level 1

All,

I have set up our QoS system following Brad Hedlund's excellent tutorials.  However, it doesn't appear to be working for me.  I have defined the Gold class with a weight of 4 and an MTU of 9216.  I then created a QoS policy using the Gold class with a burst of 10240 and a rate of line-rate.  Host Control is set to None.  I then apply the policy to my service console and set my MTU to 9000.  I experience problems with both ESXi and Windows 2008 R2.  I can ping another host with a standard frame size with no problems, but if I ping with a packet size of 8000 I get a "Request timed out" error.  However, if I change the MTU of the best-effort class to 9216, the jumbo ping works just fine; if I change best-effort back to normal, jumbo pings time out again.  It appears that the policy is not being applied, has an error, or the system is defaulting to the best-effort class and not using the policy I created.  Any ideas where to begin my troubleshooting?
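
For reference, the tests I am running look roughly like this (the target address is just a placeholder; with a 9000-byte interface MTU the largest unfragmented ICMP payload is 8972 bytes once the 20-byte IP and 8-byte ICMP headers are counted):

From Windows 2008 R2, with the don't-fragment bit set:

    ping -f -l 8000 10.10.10.20
    ping -f -l 8972 10.10.10.20

From the ESXi 4.1 console, against the vmkernel interface:

    vmkping -d -s 8000 10.10.10.20
    vmkping -d -s 8972 10.10.10.20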

Thanks,

Ken


work
Level 1

Ken,

I noticed the same thing with my iSCSI QoS assignment: i.e. until the best-effort class supports jumbo frames, the others won't either. [This caused me some major headaches trying to connect my ESX hosts to the EMC storage!]

Having said that (I haven't had a chance to try this yet), maybe you can try the following:

Create a service profile with gold and best-effort vNICs sharing a single port on a single fabric. Set gold to have a MUCH higher weight.

Then use iperf or some other traffic-generation tool to blast both vNICs; that should confirm whether the QoS weights are being applied.
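
For example, something like this (IP addresses are placeholders; exact iperf options can vary a bit by version):

    # on the receiving blade
    iperf -s

    # on the sending blade, one run per vNIC, binding to that vNIC's source IP
    iperf -c 10.10.10.20 -B 10.10.10.11 -P 4 -t 60
    iperf -c 10.10.10.20 -B 10.10.10.12 -P 4 -t 60

If both vNICs are saturated at the same time and the gold vNIC ends up with roughly its weighted share of the 10G port, the weights are being applied.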


Ken

You will have to specify which adapter is in question here, as the way QoS behaves is largely dependent on that.

Host Control (Full or None) *only* applies to the VIC (Palo). This setting has no effect on any other adapter, like Menlo.

The behavior that you mentioned points towards it being a Menlo. Confirmation would be good.

If you have a vSwitch with Menlo, the behavior that you are seeing is expected: Menlo will not mark traffic if a packet received from the host already has a dot1q header (typical of a vSwitch vmnic with multiple port groups/VLANs defined). It passes the packet as it came in, with the dot1q header and whatever CoS value it carries (0 in this case). That's why the traffic falls into the best-effort class, which matches CoS 0, and why you need to set that class to MTU 9000.

If you had a Nexus 1000v, you could have marked the traffic at the veth (port group) level with CoS 5, for example; Menlo would have passed it through, and the system class, if enabled, would have given it the treatment (the MTU, in this case).

If a packet comes to Menlo *without* a dot1q tag (e.g. an OS like Windows/Linux running on bare metal), Menlo *will* set the CoS to whatever the QoS policy dictates within the dot1q header (which is mandatory within UCS).

You could try this since you mentioned Windows (assuming it's on bare metal) - keep the soft switch out of the picture for the length of a test.

Say you have 2 blades with Menlo and Windows or Linux running natively on them:

a) Modify the QoS system classes to enable CoS 5 (platinum) with MTU 9000. Keep best-effort at the normal 1500.

b) Create a QoS policy and specify the priority as platinum (if it's a Menlo, rate/burst/host control don't apply).

c) Create a vNIC in the service profile and reference the above policy.

d) Install the OS and change the MTU on the interface to 9000 (example commands below).

I am pretty sure a ping with a packet size of 9000 will work between them irrespective of what MTU your best-effort class has, as the packets will hit the fabric with CoS 5.
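
For step (d) on bare-metal Windows, something like the following sets and checks the MTU (the interface name and target IP are just examples):

    netsh interface ipv4 set subinterface "Local Area Connection" mtu=9000 store=persistent
    netsh interface ipv4 show subinterfaces
    ping -f -l 8972 10.10.10.21

The -f/-l flags send an 8972-byte payload with the don't-fragment bit set, which is the largest that fits in a 9000-byte MTU.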

Palo behaves differently than above.

Thanks

--Manish

tohoken
Level 1

Manish,

Thank you for the excellent explanation. However, I am using a Palo card and have presented a single NIC to Server 2008 R2. The OS is installed directly on the blade. I have the same problem with ESXi 4.1. I figured it would be easier to troubleshoot with Windows and a single NIC.

Ken


Ken

With Palo it should work and it should do the CoS marking.

You mentioned that when you change the best-effort (class-default) MTU to 9000 it works.

If it does, then the problem doesn't point to the OS itself (i.e. the MTU not being set correctly there).
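
To double-check the MTU at the OS level, the interfaces can be listed directly (interface tools below are for Windows and classic ESX/ESXi):

    Windows:    netsh interface ipv4 show subinterfaces
    ESX/ESXi:   esxcfg-vmknic -l      (vmkernel NICs and their MTU)
                esxcfg-vswitch -l     (vSwitch MTU)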

Is this being tested between 2 Windows Servers running natively, or is one side ESX and one Windows?

I could screenshot the settings for you for bare metal (Windows) if that helps.

--Manish

Manish,

Please provide the screenshots you're talking about, as I think I have all the settings correct.

Ken

Ken

Are you sure you have "enabled" the priority in the QoS System Class section?

Screenshots attached.

Thanks

--Manish

Manish,

I verified all my settings per your attachment.  I am attaching my screenshots in a Word document for you to look at.

Ken

What are you trying to ping?

Where is the 2nd server? What policy etc does it have?

--Manish

I am pinging another physical Windows server that is on the same VLAN. I get the same results if I initiate the ping from the 2nd server.

Ken


Ken

I believe we will need to look at it via a WebEx or something.

I could ask you for some show commands, but I don't want show techs etc. attached to threads here.

Can you open a TAC case?

The screenshots I sent you are from a working setup with 2 blades running Windows/Linux, so the configuration itself works.

It's just a question of seeing what is different in your setup.

Thanks

--Manish

Manish,

I will open a TAC case tomorrow.  Thanks for your help.  I have done a little more troubleshooting since our last exchange.  With the policy settings I can ping another blade server within the UCS chassis.  I cannot ping a server outside of the UCS chassis with jumbo frames unless I modify the best-effort class.  I run two Fabric Interconnects that are connected to two Nexus 5548 switches using vPC, and the server is connected to a FEX that is connected to the Nexus switches.  The servers are in the same VLAN, so the traffic should be switched by the Nexus and not have to travel farther to an upstream switch.  Jumbo frames are enabled on all switches.  I know the Nexus switches are set up correctly, or else they wouldn't pass the jumbo pings when I change the best-effort class to allow jumbo.
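
For reference, the jumbo configuration on the 5548s is along these lines (the policy name is arbitrary), and show queuing confirms the 9216 MTU per class:

    policy-map type network-qos jumbo
      class type network-qos class-default
        mtu 9216
    system qos
      service-policy type network-qos jumbo

    show queuing interface ethernet 1/1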

Ken

Ken

If it works within UCS and not beyond, that's explainable, which is why I was asking you about the 2nd server.

In UCS we follow CoS-based QoS, i.e. treatment is applied on the basis of the CoS value.

When you have 2 blades within UCS, both have the same QoS policy defined, i.e. jumbo frames are allowed on CoS 4 (as an example).

When you go beyond UCS, the QoS policy is not there.

When you initiate a ping with a packet size of, let's say, 9000, the ICMP echo goes out with a data size of 9000.

The return (echo reply) is also 9000 bytes, and the return needs to work for the ping to be successful.

When you ping the external server, I am sure the packet reaches it just fine, as the QoS policy marks it with CoS 4 and jumbo frames are enabled for that class.

The return comes back with CoS 0 (i.e. unset), which maps to best-effort; that class is still at 1500 on the UCS side, so the jumbo reply will be dropped.

That is why you need to turn on jumbo for the best-effort class, i.e. to allow the return packet in and give it that treatment within UCS.

We do not allow marking in UCS on ingress from the uplinks, but if you were to mark the traffic upstream with CoS 4, this would work.
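
If you want to see this on the fabric interconnect itself, one way (from the UCS Manager CLI, something like the following) is to check the per-class MTU the FI is enforcing on an uplink port; 1/1 here is just an example:

    connect nxos a
    show queuing interface ethernet 1/1

The output lists each qos-group with its MTU; best-effort (qos-group 0) will show 1500 until you change the class, which is why the CoS 0 jumbo replies are dropped.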

You can open a TAC case too, but I am sure the explanation will be the same.

Thanks

--Manish

Manish,

Thank you for all your help.  I understand what is happening now.  We will be using the Cisco 1000v switch in our ESXi servers so do you recommend we mark the ingress traffic on the 1000v or just change the best effort class?  Are there any potential issues with changing the best effort class to "jumbo"?

Ken

Ken

I'm not sure what you mean by marking traffic on the Nexus 1000v on ingress.

Is this Nexus 1000v on UCS too? If yes, marking on ingress isn't buying you anything, as it's at the host level, which is receiving the packet.

If it's on a host outside UCS, then you need to mark the traffic on Nexus 1000v egress so that when UCS receives it on the uplinks, the packet is already marked with the correct CoS.

For the Nexus 1000v, it is recommended to mark traffic at the veth (port group) level on egress, i.e. do the marking there and set Host Control to Full on the VIC (Palo) so UCS honors it.

This gives you a centralized place to mark across all platforms where the Nexus 1000v is running, instead of doing it on the adapters (which would mean multiple adapters if there are multiple port groups that need to be marked differently).
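
As a rough sketch of the Nexus 1000v side (the names are examples, and the exact service-policy syntax/direction should be checked against the 1000v configuration guide for your release):

    policy-map type qos mark-vm-traffic
      class class-default
        set cos 4
    port-profile type vethernet VM-Data
      service-policy type qos output mark-vm-traffic

With Host Control set to Full on the VIC, UCS will then honor the CoS 4 set at the port profile, and the system class enabled for CoS 4 gives the traffic its treatment.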

Setting best-effort to jumbo has no performance implications, so no issues; it just specifies what packet size is allowed in the class.

Thanks

--Manish
