05-02-2018 02:18 PM - edited 03-01-2019 01:32 PM
Several UCS C220 M4 and C240 M5XS servers.
VMware ESXi 6.0 U3 installed from the Cisco-specific ISO. No updates applied.
CAT 6 cabling
Cisco Nexus 3172TQ (two switches - primary and secondary)
Making use of both the on-board 10Gb adapters and added PCIe adapters
SR open
I have never had so much trouble simply getting the basic network to work.
All the 10Gb connections are connecting at 1Gb. I can see this from the amber light on the server's port. The NetAdmin has confirmed this from his end on the Nexus 3172TQ and I can see the speed in ESXi.
One very strange thing, though: by unplugging and replugging the network cables on the back of the servers, sometimes a couple of times in a row, I can get a 10Gb connection. I got the majority to connect at 10Gb, but not all of them, and it is not consistent. If I restart the server, the links go back to 1Gb.
I also tested using a connection from another rack's Nexus 3172TQ to another ESXi server, and the result was just as inconsistent. The first time I plugged in that cable it linked at 10Gb; I unplugged and replugged it and got 1Gb. Returning the cable to the original ESXi host brought it back to 10Gb without any problems.
I have an SR open, but if anyone has any ideas, I would very much appreciate it.
Thanks
Chris H.
05-02-2018 06:56 PM - edited 05-02-2018 07:23 PM
Hi Chris.
That sounds like either a driver or Intel NIC firmware issue.
If you hardcode the speed (vswitch, network adapters, vmnic properties) to 10Gb, does it consistently stay at 10Gb if you do a series of cable reseats?
You might want to try a couple of Linux (or other OS) live-on-a-stick ISOs that have a recent ixgbe driver embedded, to see whether you get different results with a different OS/driver combination.
In Linux or ESXi try running the following:
ethtool -s ethxyz advertise 0x1000
to see if restricting the advertised speed capabilities has a positive impact.
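For anyone following along, the advertise value is a bitmask of ethtool link modes; bit 12 (1 << 12 = 0x1000) corresponds to 10000baseT/Full, so 0x1000 advertises 10GBASE-T full duplex only. A quick sketch (ethX is a placeholder for the actual interface name):

```shell
# 0x1000 is bit 12 of the ethtool link-mode bitmask,
# i.e. 10000baseT/Full -- 10GBASE-T, full duplex only.
printf 'advertise mask: %#x\n' $((1 << 12))

# ethX below is a placeholder -- substitute your real interface.
# Show supported/advertised modes and current link speed:
#   ethtool ethX
# Advertise 10GBASE-T full duplex only:
#   ethtool -s ethX advertise 0x1000
```

If the link reliably comes up at 10Gb with only that mode advertised, it points at an auto-negotiation problem in the driver/firmware rather than cabling.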
I'm aware of one other case where a customer running Windows Server 2016 had similar issues when using the HCL-based Intel driver, but worked fine when using the default inbox driver provided by Microsoft.
Send me a message with your case #.
Thanks,
Kirk...
05-04-2018 07:26 AM
Kirk,
Thank you very much! I'll try a couple of commands from the ESXi CLI.
My case number is 6844044192.
One thing I should have done from the get-go was run the Cisco HUU (Host Upgrade Utility) for the C220 and C240, which I finally did yesterday when the support technician mentioned it. Unfortunately, that did not resolve the issue.
Again. Thanks!
05-04-2018 08:54 AM - edited 05-04-2018 01:22 PM
Greetings.
I wanted to mention that drivers are not part of the HUU.
There is a separate ISO for drivers; see https://software.cisco.com/download/home/286281356/type/283853158/release/3.0%25284%2529
Make sure you have tried the latest HUU and the latest drivers. The VMware drivers actually come from the VMware site; for example, the X550 driver for ESXi 6.0: https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI60-INTEL-IXGBE-451&productId=491
You would normally use the HCL to determine the correct HUU/firmware and driver, but in this case it sounds like there may be an issue with the ixgbe driver, so I am recommending getting a little creative and testing some different driver versions.
Thanks,
Kirk...
05-10-2018 08:19 AM
Kirk, thanks again.
I ran the correct HUU on all the boxes (C220 and C240) and then the correct VMware supported updates for the hardware drivers.
These two steps did not resolve the issue, so I finally wound up setting the speed in ESXi to 10Gb for those NICs capable of it. The NetAdmin did the same on his end.
Thanks!
05-10-2018 08:58 AM
I suspect this will end up being a driver fix that has yet to be released by Intel.
There are similar reports from other vendors using this same chipset.
Thanks,
Kirk...
05-04-2018 02:16 PM
I finally hardcoded the speed on both ends and it is consistent now. It doesn't change!
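For anyone finding this thread later, hardcoding the link speed on the ESXi side can also be done from the CLI. A sketch, with vmnic0 as a placeholder uplink name:

```shell
# Run on the ESXi host; vmnic0 is a placeholder for your uplink.
# List NICs with their current link speed and duplex:
#   esxcli network nic list
# Force 10Gb full duplex instead of auto-negotiation:
#   esxcli network nic set -n vmnic0 -S 10000 -D full
# To return to auto-negotiation later:
#   esxcli network nic set -n vmnic0 -a
```

Remember to fix the speed on the switch port as well; a forced end talking to an auto-negotiating end is itself a common cause of mismatches.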
But I'll be running speed tests next week.
Thanks for the ethtool information!
Chris
05-04-2018 09:52 AM
Not sure if this comes into play, but CAT 6 will support 10GBASE-T only at maximum distances of roughly 120-180 feet (37-55 meters), depending on the quality of the installation and overall conditions. That may explain why it works in some cases and not in others.
Hope this helps
05-04-2018 02:11 PM
Thanks!
The servers are in the same rack as the two Nexus 3172TQ switches, so the runs are only about 3 meters.
What we wound up doing was to set the NICs/Ports to 10Gb Full Duplex on both ends.
I'll be testing actual speed next week.
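For the speed tests, one common approach (assuming iperf3 is available on both ends; the hostname below is a placeholder) is to run iperf3 between two hosts or VMs on either side of the link:

```shell
# Placeholders throughout; run the server side on one host first.
# Server:
#   iperf3 -s
# Client: 4 parallel streams for 30 seconds toward the server:
#   iperf3 -c server.example.com -P 4 -t 30
# Reverse direction (-R: server sends) to test the other path:
#   iperf3 -c server.example.com -P 4 -t 30 -R
```

A single TCP stream often won't saturate a 10Gb link, which is why the parallel-stream option is worth using.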
Chris
05-10-2018 08:43 AM
Here is another interesting link associated with the same problem:
https://www.reddit.com/r/vmware/comments/5txnjf/intel_x550t2_nic_in_60_u2_not_working_at_10gbps/
Chris H.