07-22-2014 02:53 AM - edited 03-01-2019 11:45 AM
I've recently been testing an installation of VMware 5.5u02 on UCS, using the UEFI BIOS that ships with the 2.0(1)b firmware bundle, and I've run into a problem with the onboard i350 NICs. I am using the latest Cisco Customized VMware image.
Initially the issue was that VMware would bail out during install, on the basis of it not being able to find a network adapter. This was very repeatable, and I was able at any time to go back to the non-UEFI (legacy) BIOS and install just fine - the NICs would be detected and VMware would be happy. At that stage I had no visibility of the underlying issue.
I have since worked around this by adding a 10G Intel X520 card to the server. The installer no longer depends on the onboard ports, so VMware now installs without an issue and I've been able to bring it all online.
That has let me look into why VMware does not recognise the onboard i350 LOMs. VMware is logging the following errors in vmkernel.log:
1. The LOMs are detected successfully:
0000:01:00.0 Network controller: Intel Corporation I350 Gigabit Network Connection [vmnic0]
0000:01:00.1 Network controller: Intel Corporation I350 Gigabit Network Connection [vmnic1]
2. But the igb driver fails to claim either port, because the NVM checksum is reported as invalid:
2014-07-22T08:01:28.694Z cpu0:32768)VisorFSTar: 1960: net_igb.v00 for 0x4f57a bytes
2014-07-22T08:01:36.315Z cpu16:33464)Loading module igb ...
2014-07-22T08:01:36.317Z cpu16:33464)Elf: 1861: module igb has license GPL
2014-07-22T08:01:36.320Z cpu16:33464)skb_mem_info mempool for module igb created - max size 23068672
2014-07-22T08:01:36.320Z cpu16:33464)module heap vmklnx_igb: creation succeeded. id = 0x4109a11eb000
2014-07-22T08:01:36.320Z cpu16:33464)PCI: driver igb is looking for devices
2014-07-22T08:01:36.322Z cpu16:33464)<3>igb 0000:01:00.0: (unregistered net_device): PCIe link lost, device now detached
2014-07-22T08:01:40.225Z cpu16:33464)<3>igb 0000:01:00.0: The NVM Checksum Is Not Valid
2014-07-22T08:01:40.225Z cpu16:33464)WARNING: vmklinux: pci_announce_device:1488: PCI: driver igb probe failed for device 0000:01:00.0
2014-07-22T08:01:40.226Z cpu16:33464)<3>igb 0000:01:00.1: (unregistered net_device): PCIe link lost, device now detached
2014-07-22T08:01:44.705Z cpu16:33464)<3>igb 0000:01:00.1: The NVM Checksum Is Not Valid
2014-07-22T08:01:44.705Z cpu16:33464)WARNING: vmklinux: pci_announce_device:1488: PCI: driver igb probe failed for device 0000:01:00.1
2014-07-22T08:01:44.705Z cpu16:33464)PCI: driver igb claimed 0 device
2014-07-22T08:01:44.705Z cpu16:33464)Mod: 4780: Initialization of igb succeeded with module ID 4123.
2014-07-22T08:01:44.705Z cpu16:33464)igb loaded successfully.
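For anyone wanting to check their own vmkernel.log for the same signature, here is a rough Python sketch. It assumes the log lines look like the excerpt above; the function name `summarise_igb_failures`, the regexes and the `SAMPLE_LOG` constant are purely illustrative and not part of any VMware or Cisco tooling.

```python
import re
from collections import defaultdict

# Error lines logged by the igb driver, e.g.
#   ...)<3>igb 0000:01:00.0: The NVM Checksum Is Not Valid
IGB_ERROR = re.compile(
    r"<3>igb (?P<dev>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): (?P<msg>.+)$"
)
# Probe failures reported by vmklinux, e.g.
#   ...PCI: driver igb probe failed for device 0000:01:00.0
PROBE_FAIL = re.compile(r"driver (?P<drv>\S+) probe failed for device (?P<dev>\S+)$")

# Lines copied from the excerpt in this post (hypothetical test input).
SAMPLE_LOG = """\
2014-07-22T08:01:36.322Z cpu16:33464)<3>igb 0000:01:00.0: (unregistered net_device): PCIe link lost, device now detached
2014-07-22T08:01:40.225Z cpu16:33464)<3>igb 0000:01:00.0: The NVM Checksum Is Not Valid
2014-07-22T08:01:40.225Z cpu16:33464)WARNING: vmklinux: pci_announce_device:1488: PCI: driver igb probe failed for device 0000:01:00.0
2014-07-22T08:01:40.226Z cpu16:33464)<3>igb 0000:01:00.1: (unregistered net_device): PCIe link lost, device now detached
2014-07-22T08:01:44.705Z cpu16:33464)<3>igb 0000:01:00.1: The NVM Checksum Is Not Valid
2014-07-22T08:01:44.705Z cpu16:33464)WARNING: vmklinux: pci_announce_device:1488: PCI: driver igb probe failed for device 0000:01:00.1
"""


def summarise_igb_failures(lines):
    """Map each PCI device whose igb probe failed to the igb errors logged for it."""
    errors = defaultdict(list)
    failed = set()
    for line in lines:
        line = line.rstrip()
        match = IGB_ERROR.search(line)
        if match:
            errors[match.group("dev")].append(match.group("msg"))
            continue
        match = PROBE_FAIL.search(line)
        if match and match.group("drv") == "igb":
            failed.add(match.group("dev"))
    # Only report devices that actually failed to probe.
    return {dev: errors.get(dev, []) for dev in sorted(failed)}


if __name__ == "__main__":
    for dev, msgs in summarise_igb_failures(SAMPLE_LOG.splitlines()).items():
        print(dev)
        for msg in msgs:
            print("   ", msg)
```

To run it against a real host, you would pass the lines of a copied vmkernel.log to `summarise_igb_failures` instead of `SAMPLE_LOG`. Any device listed with "The NVM Checksum Is Not Valid" matches the symptom described in this post.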
If I go back to legacy BIOS with the exact same installation image and hardware, this issue does not exist. So it's very obvious it is related to the UEFI BIOS since that's the only thing that changes between working and non-working states.
I have reflashed the ROMs via HUU and done a complete power cycle of the server with no change to the behaviour. I have also upgraded to version 5.2.5 of the igb driver within VMware post installation and that doesn't help either.
Before I go and log a TAC case on this, has anyone else seen this before? A few people appear to have posted with the exact same problem but no one has ever resolved it, as it's just easier to revert to legacy BIOS where it all works. Which is fine, but that doesn't help get UEFI fixed and functional.
Does anyone have any suggestions as to what else I could try before I log it?
07-22-2014 07:17 AM
Hello Reuben,
The issue you are facing is related to this bug: https://tools.cisco.com/bugsearch/bug/CSCui85699. The current workaround is to downgrade the firmware to 1.4.7 (ucs-c220-huu-1.4.7h) through HUU and perform a complete power cycle, which seems to fix the issue.
The bug is still under investigation. If you cannot or don't want to downgrade, you can open a case and ask for the bug to be attached to it; you will probably be asked for information that can contribute to the investigation. Alternatively, you can try the workaround and wait for the fix by subscribing to the bug.
You can subscribe to the bug at the following link:
http://www.cisco.com/cisco/support/notifications.html
1. Click on “Add Notification”
2. Under “Notification Name” enter a name you can easily remember, enter the email where you want to receive the notification and hit “Continue”
3. Choose “Track a specific Bug ID” and “Continue”
4. Enter “Bug Number” and “Continue”
5. Click on “Finish”
Here is a guide for this process.
http://www.cisco.com/web/tsweb/flash/support/ngw/cisco_support_cns.html
This is a simple video that shows how to do this and explains each field within the tool. It might be a little out of date ;)
Rate all helpful answers.
-Kenny
07-22-2014 11:07 PM
Hi Kenny
Unfortunately, as I have E5-2600v2 CPUs, I can't downgrade to any version prior to 1.5(3), so that's not a valid workaround for me.
I've opened a TAC case to confirm whether it's the same bug, and if so I will have the case linked to the bug. The notes in that bug don't refer to UEFI, VMware or the error I was seeing in the logs, so we'll have to see if it's the same problem or not. Maybe there are some more internal-only notes that I can't see.
Thanks,
Reuben
07-23-2014 07:10 AM
Reuben,
You are right in the last part of your post; that is why I mentioned you are hitting that bug ;)
Don't forget to rate useful answers and mark as correct those ones that solve your issue.
Kenny
07-23-2014 07:13 AM
Thanks for confirming and taking the time to respond Kenny.
Reuben
07-23-2014 07:15 AM
My pleasure Reuben, good luck with your case
-Kenny
08-13-2014 05:37 PM
Looks like the bug changed status to "Fixed" overnight:
https://tools.cisco.com/bugsearch/bug/CSCui85699
No fixed versions are listed yet, but presumably they'll appear soon.