UCS C220M3 - and UEFI Boot Failure (Bug?)

Reuben Farrelly
Level 3

I've recently been testing the installation of VMware 5.5u02 on UCS, this time booting via the UEFI that ships with the 2.0(1)b firmware bundle.  However, I've run into a problem with the onboard i350 NICs.  I am using the latest Cisco Customized VMware image.

Initially the issue was that VMware would bail out during install, on the basis of it not being able to find a network adapter.  This was very repeatable, and I was able at any time to go back to the non-UEFI (legacy) BIOS and install just fine - the NICs would be detected and VMware would be happy.  At that stage I had no visibility of the underlying issue.

Having now added a 10G Intel X520 card to the server, I have worked around this: VMware no longer depends on the onboard ports during installation, so it installs without an issue.  This means I've been able to install VMware and bring it all online.

That has given me the ability to look into why VMware does not recognise the i350 onboard LOMs.  Here is what I see, including the errors VMware logs in vmkernel.log:

1. The LOMs are detected successfully:

0000:01:00.0 Network controller: Intel Corporation I350 Gigabit Network Connection [vmnic0]
0000:01:00.1 Network controller: Intel Corporation I350 Gigabit Network Connection [vmnic1]

2. But the driver fails to load due to the NVRAM issue:

2014-07-22T08:01:28.694Z cpu0:32768)VisorFSTar: 1960: net_igb.v00 for 0x4f57a bytes
2014-07-22T08:01:36.315Z cpu16:33464)Loading module igb ...
2014-07-22T08:01:36.317Z cpu16:33464)Elf: 1861: module igb has license GPL
2014-07-22T08:01:36.320Z cpu16:33464)skb_mem_info mempool for module igb created - max size 23068672
2014-07-22T08:01:36.320Z cpu16:33464)module heap vmklnx_igb: creation succeeded. id = 0x4109a11eb000
2014-07-22T08:01:36.320Z cpu16:33464)PCI: driver igb is looking for devices
2014-07-22T08:01:36.322Z cpu16:33464)<3>igb 0000:01:00.0: (unregistered net_device): PCIe link lost, device now detached
2014-07-22T08:01:40.225Z cpu16:33464)<3>igb 0000:01:00.0: The NVM Checksum Is Not Valid
2014-07-22T08:01:40.225Z cpu16:33464)WARNING: vmklinux: pci_announce_device:1488: PCI: driver igb probe failed for device 0000:01:00.0
2014-07-22T08:01:40.226Z cpu16:33464)<3>igb 0000:01:00.1: (unregistered net_device): PCIe link lost, device now detached
2014-07-22T08:01:44.705Z cpu16:33464)<3>igb 0000:01:00.1: The NVM Checksum Is Not Valid
2014-07-22T08:01:44.705Z cpu16:33464)WARNING: vmklinux: pci_announce_device:1488: PCI: driver igb probe failed for device 0000:01:00.1
2014-07-22T08:01:44.705Z cpu16:33464)PCI: driver igb claimed 0 device
2014-07-22T08:01:44.705Z cpu16:33464)Mod: 4780: Initialization of igb succeeded with module ID 4123.
2014-07-22T08:01:44.705Z cpu16:33464)igb loaded successfully.
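
For anyone who wants to check their own hosts for the same signature, here is a minimal Python sketch that scans vmkernel.log lines for igb errors and groups them by PCI address. The function name `find_igb_failures` and the two regular expressions are my own illustration based only on the log lines quoted above. They are not part of any VMware or Cisco tool, and other driver versions may word these messages differently.

```python
import re
from collections import defaultdict

# Matches igb error lines such as:
#   ...<3>igb 0000:01:00.0: The NVM Checksum Is Not Valid
# (pattern is an assumption derived from the log excerpt in this post)
IGB_ERR = re.compile(
    r"igb (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (?P<msg>.+)$"
)
# Matches the vmklinux warning emitted when the driver probe fails.
PROBE_FAIL = re.compile(r"driver igb probe failed for device (?P<addr>\S+)")


def find_igb_failures(lines):
    """Return {pci_address: [messages]} for igb errors found in the given lines."""
    failures = defaultdict(list)
    for line in lines:
        line = line.rstrip()
        match = PROBE_FAIL.search(line)
        if match:
            failures[match.group("addr")].append("probe failed")
            continue
        match = IGB_ERR.search(line)
        if match:
            failures[match.group("addr")].append(match.group("msg"))
    return dict(failures)


# Sample lines copied from the log excerpt above.
SAMPLE = """\
2014-07-22T08:01:36.322Z cpu16:33464)<3>igb 0000:01:00.0: (unregistered net_device): PCIe link lost, device now detached
2014-07-22T08:01:40.225Z cpu16:33464)<3>igb 0000:01:00.0: The NVM Checksum Is Not Valid
2014-07-22T08:01:40.225Z cpu16:33464)WARNING: vmklinux: pci_announce_device:1488: PCI: driver igb probe failed for device 0000:01:00.0
2014-07-22T08:01:44.705Z cpu16:33464)<3>igb 0000:01:00.1: The NVM Checksum Is Not Valid
2014-07-22T08:01:44.705Z cpu16:33464)PCI: driver igb claimed 0 device
"""

if __name__ == "__main__":
    for addr, messages in sorted(find_igb_failures(SAMPLE.splitlines()).items()):
        print(addr)
        for message in messages:
            print("  " + message)
```

To run it against a real log, pass an open file instead of the sample, for example `find_igb_failures(open("vmkernel.log"))` on a copy of the log taken from the host. A port that shows both "The NVM Checksum Is Not Valid" and "probe failed" matches the behaviour described here.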

If I go back to legacy BIOS with the exact same installation image and hardware, this issue does not exist.  So it's very obvious it is related to the UEFI BIOS since that's the only thing that changes between working and non-working states.

I have reflashed the ROMs via HUU and done a complete power cycle of the server with no change to the behaviour.  I have also upgraded to version 5.2.5 of the igb driver within VMware post installation and that doesn't help either.

Before I go and log a TAC case on this, has anyone else seen this before?  A few people appear to have posted with the exact same problem but no one has ever resolved it, as it's just easier to revert to legacy BIOS where it all works.  Which is fine, but that doesn't help get UEFI fixed and functional.

Does anyone have any suggestions as to what else I could try before I log it?

 

Accepted Solutions

Keny Perez
Level 8

Hello Reuben,

The issue you are facing is related to this bug: https://tools.cisco.com/bugsearch/bug/CSCui85699. The current workaround is to downgrade the firmware to 1.4.7 (ucs-c220-huu-1.4.7h) through HUU and then perform a complete power cycle, which seems to fix the issue.

Currently, the bug is still under investigation. If you cannot or don't want to downgrade, you may open a case and ask for the aforementioned bug to be attached to it; you will probably be asked for information that can contribute to the investigation. Alternatively, you can simply try the workaround and wait for the bug fix by subscribing to the bug.

You may subscribe to the bug using the following link:

http://www.cisco.com/cisco/support/notifications.html

 

1-Click on “Add Notification”

2-Under “Notification Name” enter a name you can easily remember, enter the email where you want to receive the notification and hit “Continue”

3-Choose “Track a specific Bug ID” and “Continue”

4-Enter “Bug Number” and “Continue”

5-Click on “Finish”

 

Here is a guide for this process.

http://www.cisco.com/web/tsweb/flash/support/ngw/cisco_support_cns.html    <<<<<< This is a simple video that will show how to do this and each field within the tool. Might be a little out of date ;)

 

Rate all helpful answers.

-Kenny


6 Replies

Hi Kenny

Unfortunately, as I have E5-2600v2 CPUs, I can't downgrade to any version prior to 1.5(3), so that's not a valid workaround for me.

I've opened a TAC case to confirm whether it's the same bug, and if so I will have the case linked to the bug.  The notes in that bug don't refer to UEFI or VMware or the error I was seeing in the logs, so we'll have to see if it's the same problem or not.  Maybe there are some more internal-only notes that I can't see.

Thanks,
Reuben

 

Reuben,

You are right in the last part of your post; that is why I said you are hitting that bug ;)

Don't forget to rate useful answers and mark as correct those ones that solve your issue.

 

Kenny

Thanks for confirming and taking the time to respond Kenny.

Reuben

 

My pleasure Reuben, good luck with your case ;)

 

-Kenny

Looks like the bug changed status to "Fixed" overnight:

https://tools.cisco.com/bugsearch/bug/CSCui85699

No fixed versions are stated yet, but presumably they'll appear soon.
