TCP segmentation offload (TSO) and vmxnet3/1000v - bug?

sdavids5670 · ‎07-26-2013

NOTE: My knowledge of ESX and the 1000v does not run deep so I do not have a thorough understanding of the relationship / integration between those two components. If anything I'm saying here is out-of-line I apologize in advance.

Yesterday a report came in that an IIS application on a staging server in our test environment was not working (in Internet Explorer it returned "Page cannot be displayed"). The IIS server sits behind an F5 load balancer. Both the F5 and the IIS server are VM guests on a VMware ESX host. Both the IIS server and the F5 had recently been moved to a new environment in which the version of 1000v changed and the vnic driver changed (from E1000 to vmxnet3) and this appeared to be the first time that anybody noticed an issue.

After some digging we noticed something peculiar. The problem only manifested when the IIS server and the F5 were on the same physical host. If they were not co-resident everything worked just fine. After reviewing packet captures we figured out that the "Page cannot be displayed" was not because the content wasn't making it from the IIS server to the client but rather because the content was gzip compressed and something happened in transit between the IIS server and the client to corrupt the payload, thereby making the gzip stream impossible to decompress. As a matter of fact, at no time was IP connectivity ever an issue. We could RDP to the IIS server and SSH/HTTP into the F5 without any issues.
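To illustrate why a small amount of in-transit damage shows up as a completely unreadable page rather than a slightly garbled one, here is a minimal Python sketch. The page content is a hypothetical stand-in, not data from the actual captures, and the single flipped byte is only an assumed form of corruption.

```python
import gzip
import zlib


def try_decompress(blob: bytes) -> str:
    """Return 'ok' if blob is a valid gzip stream, else the error class name."""
    try:
        gzip.decompress(blob)
        return "ok"
    except (OSError, EOFError, zlib.error) as exc:
        return type(exc).__name__


# Hypothetical stand-in for the IIS response body.
page = b"".join(b"<p>row %d of the staging report</p>\n" % i for i in range(2000))
compressed = gzip.compress(page)

# Flip the bits of a single byte in the middle of the compressed stream,
# leaving the length (and therefore the framing) untouched.
damaged = bytearray(compressed)
damaged[len(damaged) // 2] ^= 0xFF
damaged = bytes(damaged)

print(try_decompress(compressed))  # the intact stream decompresses
print(try_decompress(damaged))     # the damaged one raises instead
```

The connection itself is unaffected in this model, which matches what we saw: RDP and SSH kept working while the compressed HTTP body could not be decoded.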

I started digging in a little deeper with Wireshark (details of which are included as an attached PDF). It turns out that a bug??? involving TCP segmentation offload (TSO) was causing the payload of the communication to become corrupted. What I'm still trying to figure out is who is responsible for the bug? Is it VMware or Cisco? I'm leaning towards Cisco and the 1000v and this is why.

Referring to the attached PDF, TEST #2 (hosts co-resident) and TEST #3 (hosts not co-resident) show packet captures taken from the IIS box, the 1000v and the F5. Figure 6 shows that the contents of the gzip'd payload can be deciphered by Wireshark as it leaves the IIS box. Figure 8 shows capture data from the 1000v's perspective (spanning rx/tx on the F5 veth port). It's still good at this point. However, Figure 10 shows capture data taken on the F5. Somewhere between leaving the egress port on the 1000v and entering the F5 the payload becomes corrupt and can no longer be decompressed. There is no mention that the TCP checksum failed. In my mind the only way that the data could be corrupt without a TCP checksum failure is if the corruption occurred during the segmentation of the packet. However, if it was due to the guest OS-level vnic driver then why did it still look good to the 1000v egress towards the F5?
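The reasoning about "corrupt data but no checksum failure" can be sketched with a toy model in Python. This is not the actual 1000v or vmxnet3 code path; the off-by-4 copy error is an invented fault, and the checksum covers only the payload (real TCP also covers a pseudo-header and the TCP header). The point is only that if segmentation goes wrong before the checksum is computed, every segment still verifies.

```python
from dataclasses import dataclass
from typing import List, Tuple

MSS = 1460  # typical TCP payload size on a 1500-byte MTU link


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum (RFC 1071 style), payload only."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


@dataclass
class Segment:
    seq: int
    payload: bytes
    checksum: int


def segment(big_send: bytes, buggy: bool = False) -> List[Segment]:
    """Split one oversized send into MSS-sized segments.

    With buggy=True, every segment after the first copies its payload from
    the wrong offset (hypothetical fault), but the checksum is computed over
    whatever bytes ended up in the segment.
    """
    out = []
    for seq in range(0, len(big_send), MSS):
        length = min(MSS, len(big_send) - seq)
        start = seq - 4 if (buggy and seq > 0) else seq
        payload = big_send[start:start + length]
        out.append(Segment(seq, payload, internet_checksum(payload)))
    return out


def receive(segments: List[Segment]) -> Tuple[bool, bytes]:
    """Verify each checksum, then reassemble the stream by sequence number."""
    checksums_ok = all(internet_checksum(s.payload) == s.checksum for s in segments)
    ordered = sorted(segments, key=lambda s: s.seq)
    return checksums_ok, b"".join(s.payload for s in ordered)


data = bytes(range(256)) * 40  # 10240 bytes, larger than one MSS
good_ok, good_stream = receive(segment(data))
bad_ok, bad_stream = receive(segment(data, buggy=True))
print(good_ok, good_stream == data)
print(bad_ok, bad_stream == data)
```

In the buggy case the receiver sees valid checksums and a stream of the right length whose contents differ from what was sent, which is the pattern the F5 capture appears to show.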

The most curious aspect of this whole thing is the behavior I described earlier related to onbox vs. offbox. This problem only occurs when the traffic is switched in memory. Refer to figures 11 - 16 for capture data that shows the very same test when the F5 and IIS are not co-resident. Is the 1000v (or vnic) savvy enough to skip TSO in software and allow the physical NIC to do TSO if it knows that the traffic is going to have to leave the host and go onto the physical wire? That's the only way I can make sense of this difference in behavior.

In any case, here are all of the guest OS-level settings related to offload of any type (along with the defaults) and the one we had to change (IPv4 TSO Offload, from Enabled to Disabled) to get this to work with the vmxnet3 NIC:

IPv4 Checksum Offload: Rx & Tx Enabled
IPv4 TSO Offload: From Enabled to Disabled
Large Send Offload V2 (IPv4): Enabled
Offload IP Options: Enabled
Offload TCP Options: Enabled
TCP Checksum Offload (IPv4): Rx & Tx Enabled
UDP Checksum Offload (IPv4): Rx & Tx Enabled

acampbell · ‎07-26-2013

Hi,

Looks like you are hitting this :-

CSCuc64239 Bug Details

Regards,
Alex.
Please rate useful posts.


sdavids5670 · ‎07-27-2013

I don't know. We don't get the "Purple Screen of Death". At no point in time do any of the systems involved (either guests or hosts) crash.

David Grocke · ‎11-12-2013

Hi sdavids5670

Did you ever find a proper fix to your issue? Was updating the N1Kv a solution?

I have exactly the same symptoms with a N1Kv [4.2(1)SV2(1.1a)], F5 (ASM), vmxnet3 guests, however I'm failing all RDP (win2k8r2) and SSH large packets (redhat); the rest of my traffic appears fine. This only occurs when the F5 resides on the same VEM or VMware host as you have seen. My packet captures are similar.

My workaround is twofold. Firstly, create rules to isolate the F5 onto hosts where guests are not utilising it and secondly, disable TCP offloading (I use IPv4 only). Neither of these is a real solution.

I have not tried a non-F5 trunk (i.e., perhaps a CSR1000v) to replicate this without the F5.

I suspected that the onbox / offbox issue was something specific about the logic of the VEM installed on the host (that's how I justified it to myself) rather than VEM->VEM traffic. It appears that only vEth -> vEth traffic on the same VEM is the issue. Also, I can only replicate this when one of the vEth ports is a trunk. I have yet to be able to replicate this if both are access port-groups (vEths).

I have yet to log a TAC as I wanted to perform testing to exclude the F5.

Thought that I would just ask....

Cheers

David

sdavids5670 · ‎11-13-2013

David,

I wish I could say that we found a permanent fix to the "bug" but once we implemented our workaround (disabling TSO offload) the non-network guys looked at this issue as ultra-low priority. I had to cut the TAC guy loose and close the case because I couldn't pull together the people I needed to set up a proper test environment to delve deeper into it in a timely manner.

Regards,

Steven

David Grocke · ‎11-13-2013

Thanks for the reply, I'll tackle it a little further and post back.

I don't know how to ask politely, but do you mind providing any TAC information (PM me if you wish) so my TAC call doesn't have to tread the same water?

I hope it's not an F5 issue.

Thanks

David

sdavids5670 · ‎11-14-2013

David,

I sent you the TAC SR number in a private message. As far as trying to determine whether or not the issue is in the F5 or the 1000v, I would suggest that you set up a test transmission using segment sizes that will force TSO and then capture the traffic at various points along the path and look at where the data becomes corrupted. I don't know if you got a chance to look at the PDF I attached to the OP but I have a pretty detailed set of captures that seem to point the finger at the 1000v. Have you collected packet captures at multiple points in the path yet?
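One way to build such a test transmission is a small sender/receiver pair that pushes a large, predictable buffer in a single send and reports the first byte offset that arrives wrong. The Python sketch below is only an illustration: the function names, the 1 MiB size and the word-index pattern are all assumptions, and it runs both ends over loopback in one process. For a real test the receiver and sender would run on the two co-resident guests, with captures taken along the path at the same time.

```python
import socket
import threading


def make_payload(size: int) -> bytes:
    """Each 4-byte word holds its own index, so a shifted copy is easy to spot."""
    return b"".join(i.to_bytes(4, "big") for i in range(size // 4))


def first_mismatch(expected: bytes, actual: bytes) -> int:
    """Return the first differing byte offset, or -1 if the buffers match."""
    if expected == actual:
        return -1
    for i, (a, b) in enumerate(zip(expected, actual)):
        if a != b:
            return i
    return min(len(expected), len(actual))  # one buffer is a prefix of the other


def serve_once(listener: socket.socket, result: dict) -> None:
    """Accept one connection and read until the sender closes it."""
    conn, _ = listener.accept()
    chunks = []
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    result["data"] = b"".join(chunks)


def run_test(host: str = "127.0.0.1", size: int = 1 << 20) -> int:
    """Send `size` bytes in one sendall() and return the first bad offset (-1 if none)."""
    payload = make_payload(size)
    result = {}
    with socket.create_server((host, 0)) as listener:
        port = listener.getsockname()[1]
        receiver = threading.Thread(target=serve_once, args=(listener, result))
        receiver.start()
        with socket.create_connection((host, port)) as sender:
            # A single large write gives the stack the chance to hand the
            # NIC (or virtual NIC) a buffer bigger than one MSS.
            sender.sendall(payload)
        receiver.join()
    return first_mismatch(payload, result["data"])


if __name__ == "__main__":
    offset = run_test()
    print("stream intact" if offset == -1 else f"first bad byte at offset {offset}")
```

Because every 4-byte word encodes its own position, the bytes found at the first bad offset also hint at where the wrong data was copied from, which can be compared against the segment boundaries seen in Wireshark.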

Regards,

Steven

intilop91 · ‎02-21-2014

This is true: the calculation of the IP header "total length" field for the encapsulated packet to the VSG is performed incorrectly.

I also agree with this...
