
UCS c240M3L server with Nvidia Grid K2 GPU not loading driver on ESXi 5.5

hdiep
Level 1

I could not get ESXi 5.5 to work properly in vSGA mode because the NVIDIA driver would not start.

I installed the driver manually (version 346.69), and it shows up in the VIB list.

---

 # esxcli software vib list | grep NVIDIA
NVIDIA-VMware_ESXi_5.5_Host_Driver  346.69-1OEM.550.0.0.1331820            NVIDIA  VMwareAccepted    2015-07-13

 

---

 

Running /etc/init.d/xorg start fails, and vmkernel.log shows this error:

 

----

2015-07-13T16:29:03.875Z cpu5:37740)module heap: Initial heap size: 8388608, max heap size: 68476928
2015-07-13T16:29:03.875Z cpu5:37740)vmklnx_module_mempool_init: Mempool max 68476928 being used for module: 4192
2015-07-13T16:29:03.875Z cpu5:37740)vmk_MemPoolCreate passed for 2048 pages
2015-07-13T16:29:03.875Z cpu5:37740)module heap: using memType 2
2015-07-13T16:29:03.875Z cpu5:37740)module heap vmklnx_nvidia: creation succeeded. id = 0x411245757000
NVRM: vmk_MemPoolCreate passed for 2129919 pages.
NVRM: No NVIDIA graphics adapter found!
2015-07-13T16:29:04.126Z cpu5:37740)nvidia failed to load.
2015-07-13T16:29:04.126Z cpu5:37740)WARNING: Elf: 2822: Kernel based module load of nvidia failed: Failure <Mod_LoadDone failed>
2015-07-13T16:29:06.518Z cpu27:34906)World: 14299: VC opID HB-host-127@76-52671c2a-27 maps to vmkernel opID 1d966baf

----
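For anyone comparing against their own host, here is a minimal sketch of how the relevant lines could be pulled out of the log. The function name is made up for illustration, the patterns are simply the three failure messages quoted above, and the log path in the usage comment is an assumption (the usual default location).

```shell
# Print only the NVIDIA driver failure lines from a vmkernel log read on stdin.
# The patterns are the messages quoted in this post; other driver versions
# may word them differently.
nvidia_load_errors() {
    grep -E 'NVRM: No NVIDIA graphics adapter found|nvidia failed to load|module load of nvidia failed'
}

# Usage on the host (log path assumed):
#   nvidia_load_errors < /var/log/vmkernel.log
```

If this prints nothing, the module load probably did not fail in the way described here.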

 

Running nvidia-smi returns this error:

-----
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

-----

 

I can see the Nvidia hardware on ESXi

---

# esxcli hardware pci list -c 0x300 -m 0xff
000:009:00.0
   Address: 000:009:00.0
   Segment: 0x0000
   Bus: 0x09
   Slot: 0x00
   Function: 0x00
   VMkernel Name:
   Vendor Name: Matrox Electronics Systems Ltd.
   Device Name: MGA G200e [Pilot] ServerEngines (SEP1)
   Configured Owner: Unknown
   Current Owner: VMkernel
   Vendor ID: 0x102b
   Device ID: 0x0522
   SubVendor ID: 0x1137
   SubDevice ID: 0x0101
   Device Class: 0x0300
   Device Class Name: VGA compatible controller
   Programming Interface: 0x00
   Revision ID: 0x02
   Interrupt Line: 0x0b
   IRQ: 11
   Interrupt Vector: 0x34
   PCI Pin: 0x05
   Spawned Bus: 0x00
   Flags: 0x0221
   Module ID: -1
   Module Name: None
   Chassis: 0
   Physical Slot: 0
   Slot Description:
   Passthru Capable: true
   Parent Device: PCI 0:0:28:7
   Dependent Device: PCI 0:9:0:0
   Reset Method: Link reset
   FPT Sharable: true

000:085:00.0
   Address: 000:085:00.0
   Segment: 0x0000
   Bus: 0x85
   Slot: 0x00
   Function: 0x00
   VMkernel Name:
   Vendor Name: NVIDIA Corporation
   Device Name: GK104GL [GRID K2]

   Configured Owner: Unknown
   Current Owner: VMkernel
   Vendor ID: 0x10de
   Device ID: 0x11bf
   SubVendor ID: 0x10de
   SubDevice ID: 0x100a
   Device Class: 0x0300
   Device Class Name: VGA compatible controller
   Programming Interface: 0x00
   Revision ID: 0xa1
   Interrupt Line: 0x0b
   IRQ: 11
   Interrupt Vector: 0x3c
   PCI Pin: 0x0e
   Spawned Bus: 0x00
   Flags: 0x0201
   Module ID: -1
   Module Name: None
   Chassis: 0
   Physical Slot: 255
   Slot Description: Chassis slot 5:01.00
   Passthru Capable: true
   Parent Device: PCI 0:132:8:0
   Dependent Device: PCI 0:133:0:0
   Reset Method: Bridge reset
   FPT Sharable: true

000:086:00.0
   Address: 000:086:00.0
   Segment: 0x0000
   Bus: 0x86
   Slot: 0x00
   Function: 0x00
   VMkernel Name:
   Vendor Name: NVIDIA Corporation
   Device Name: GK104GL [GRID K2]

   Configured Owner: Unknown
   Current Owner: VMkernel
   Vendor ID: 0x10de
   Device ID: 0x11bf
   SubVendor ID: 0x10de
   SubDevice ID: 0x100a
   Device Class: 0x0300
   Device Class Name: VGA compatible controller
   Programming Interface: 0x00
   Revision ID: 0xa1
   Interrupt Line: 0x0b
   IRQ: 11
   Interrupt Vector: 0x3c
   PCI Pin: 0x00
   Spawned Bus: 0x00
   Flags: 0x0201
   Module ID: -1
   Module Name: None
   Chassis: 0
   Physical Slot: 255
   Slot Description: Chassis slot 5:02.00
   Passthru Capable: true
   Parent Device: PCI 0:132:16:0
   Dependent Device: PCI 0:134:0:0
   Reset Method: Bridge reset
   FPT Sharable: true

------
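In the listing above, both GRID K2 devices (vendor ID 0x10de) report "Module Name: None", which presumably means no driver module has claimed them. A rough sketch for spotting that in the same esxcli output follows. The function name is hypothetical, and reading "None" as "driver not bound" is my assumption from this output, not something taken from the documentation.

```shell
# Read `esxcli hardware pci list` output on stdin and print the address of
# every NVIDIA device (vendor ID 0x10de) whose Module Name is None.
nvidia_without_module() {
    awk '
        /^ *Address:/     { addr = $2 }      # remember the current device
        /^ *Vendor ID:/   { vendor = $3 }    # does not match "SubVendor ID:"
        /^ *Module Name:/ { if (vendor == "0x10de" && $3 == "None") print addr }
    '
}

# Usage on the host:
#   esxcli hardware pci list -c 0x300 -m 0xff | nvidia_without_module
```

Fed the output above, this should list 000:085:00.0 and 000:086:00.0 and skip the Matrox adapter.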

 

I upgraded to the latest 2.0(6) HUU package (updating all components) and also tried downgrading to the 2.0(1) HUU package, with no success.

 

I have 2 UCS c240m3s with 2.0(1b) firmware loaded that are working properly with the same Nvidia ESXi 5.5 driver.  Only this one new c240m3L server has this issue.

 

Hope someone has seen this behavior before.

 


jchavesd
Level 1

Hdiep,

Which ESXi 5.5 version are you running, U1 or U2?

 

 

This is U2 with the Custom Cisco build.

I have this working now.  The host uses stateless auto-deployment (PXE boot), and for whatever reason the Deploy Rule did not cache the image profile properly, so the Nvidia vib was not installed at boot.  I installed it manually to test, but it looks like restarting xorg did not succeed.

I set up another Deploy Rule; this time it cached the image profile, and the vib is now installed at bootup.
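Since a stateless host gets its image from the Deploy Rule on every boot, it may be worth checking after each reboot that the VIB is really in the booted image and that the module has loaded. This is only a sketch: both function names are made up, and the match on a module list line beginning with "nvidia" assumes the module name shown in the vmkernel.log excerpt earlier in the thread.

```shell
# Succeeds if the VIB list on stdin contains an NVIDIA host driver entry
# (names beginning with NVIDIA-VMware, as in the listing in the first post).
has_nvidia_vib() {
    grep -q '^NVIDIA-VMware'
}

# Succeeds if the loaded-module list on stdin has a line starting with "nvidia".
nvidia_module_loaded() {
    grep -q '^nvidia'
}

# Usage on the host after a reboot:
#   esxcli software vib list | has_nvidia_vib && echo "VIB present"
#   vmkload_mod -l | nvidia_module_loaded && echo "module loaded"
```

If the first check fails right after boot, the image profile attached to the rule is the likely place to look, which is what turned out to be the problem here.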

Great to hear that it was fixed.

 

 
