jmbraben2
Contributor

ESXi 7.0.1 'DevicePowerOn' power on failed with Nvidia Quadro P400 passthrough

I've upgraded from ESXi 6.7.0 U2 to 7.0.1 (HPE-ESXi-6.7.0-Update2 to (Updated) HPE-Custom-AddOn_701.0.0.10.6.0-40) in an attempt to pass an Nvidia Quadro P400 through to a Linux guest.

In 6.7.0 U2, Linux could see the GPU, but not "correctly": nvidia-smi did not report the card, although it showed up in lspci.
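When `lspci` lists the card but `nvidia-smi` does not, the guest can at least see the PCI function, so the problem is more likely at the driver level. Below is a minimal sketch of that first check. It assumes a Linux guest with pciutils installed, and the helper names are hypothetical. It lists every PCI function that carries NVIDIA's vendor ID `10de` in `lspci -nn` output:

```python
from __future__ import annotations

import re
import shutil
import subprocess

# NVIDIA's PCI vendor ID, as printed by `lspci -nn` (e.g. "[10de:1cb3]")
NVIDIA_VENDOR = "10de"

# Matches the bracketed "[vendor:device]" pair; class codes like "[0300]" have no colon
ID_PATTERN = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)


def nvidia_devices(lspci_output: str) -> list[tuple[str, str]]:
    """Return (slot, "vendor:device") for every NVIDIA function in `lspci -nn` output."""
    found = []
    for line in lspci_output.splitlines():
        match = ID_PATTERN.search(line)
        if match is None:
            continue
        vendor, device = (part.lower() for part in match.groups())
        if vendor == NVIDIA_VENDOR:
            slot = line.split(" ", 1)[0]  # bus address, e.g. "0b:00.0"
            found.append((slot, f"{vendor}:{device}"))
    return found


def main() -> None:
    if shutil.which("lspci") is None:
        print("lspci not found (install pciutils)")
        return
    # -nn adds numeric IDs, so the vendor check does not depend on the local pci.ids names
    result = subprocess.run(["lspci", "-nn"], capture_output=True, text=True)
    devices = nvidia_devices(result.stdout)
    if not devices:
        print("No NVIDIA PCI functions are visible to this guest")
    for slot, ids in devices:
        print(slot, ids)


if __name__ == "__main__":
    main()
```

If nothing is printed, the device never reached the guest's PCI bus, and `nvidia-smi` cannot see it either.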

I've read about the passthrough changes in 7.0 and created a new ESXi 7.0 U1 virtual machine from scratch.

The GPU is marked as Active for passthrough in the host hardware settings:

[Screenshot]

The VM is configured with UEFI firmware:

[Screenshot]

I've added the GPU as a "Dynamic PCI device":

[Screenshot]

It does seem odd, though, to see this at the top level:

[Screenshot]

I've reserved all of the guest memory:

[Screenshot]

This GPU only has 2 GB of memory, so nothing special is needed in the memory allocation settings.

The card draws at most 30 W and has no extra power connectors.

Since moving to 7.0.1, I cannot even start the VM. Power-on always fails with:

"Power on failure messages: Module 'DevicePowerOn' power on failed."

I see nothing in the logs that would explain the failure.

2020-10-22T23:13:21.121Z| vmx| I005: DICT pciPassthru0.allowedDevices = "0x10de:0x1cb3,0x10de:0xfb9"
...
2020-10-22T23:13:21.410Z| vmx| I005+ Power on failure messages: Module 'DevicePowerOn' power on failed.
2020-10-22T23:13:21.410Z| vmx| I005+ Failed to start the virtual machine.
2020-10-22T23:13:21.410Z| vmx| I005+
2020-10-22T23:13:21.411Z| vmx| I005: Vix: [mainDispatch.c:4200]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0
2020-10-22T23:13:21.411Z| vmx| I005: Transitioned vmx/execState/val to poweredOff
2020-10-22T23:13:21.411Z| vmx| I005: Vix: [mainDispatch.c:4200]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=0 additionalError=0
2020-10-22T23:13:21.411Z| vmx| I005: Vix: [mainDispatch.c:4238]: Error VIX_E_FAIL in VMAutomation_ReportPowerOpFinished(): Unknown error
2020-10-22T23:13:21.411Z| vmx| I005: Vix: [mainDispatch.c:4200]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0
2020-10-22T23:13:21.411Z| vmx| I005: Transitioned vmx/execState/val to poweredOff
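Instead of reading `vmware.log` from end to end, it can help to extract only the passthrough-related entries and the failure messages. Below is a minimal sketch. It assumes the `timestamp| thread| level: message` layout visible in the excerpt above, where continuation lines use `+` instead of `:`. The keyword pattern and function names are hypothetical choices, not anything VMware defines.

The script also expands the `pciPassthru0.allowedDevices` value into the zero-padded `vvvv:dddd` form that `lspci -nn` prints. For example, `0xfb9` becomes `0fb9`, so the IDs can be compared directly with what the host or guest reports:

```python
import re
import sys

# "2020-10-22T23:13:21.121Z| vmx| I005: message" -> timestamp, thread, level, message.
# Continuation lines end the level with "+" instead of ":".
LINE_RE = re.compile(r"^([^|]+)\|\s*([^|]+)\|\s*(\w+)[:+]\s?(.*)$")

# Hypothetical filter: passthrough keys, the failing module, and failure/error words
KEYWORD_RE = re.compile(
    r"passthru|devicepoweron|\bfail(?:ed|ure)?\b|\berror\b", re.IGNORECASE
)

# The quoted value of a pciPassthruN.allowedDevices entry
ALLOWED_RE = re.compile(r'allowedDevices\s*=\s*"([^"]*)"')


def interesting_lines(lines):
    """Yield (timestamp, message) for log entries whose message matches KEYWORD_RE."""
    for raw in lines:
        match = LINE_RE.match(raw.rstrip("\r\n"))
        if match is None:
            continue
        timestamp, _thread, _level, message = match.groups()
        if KEYWORD_RE.search(message):
            yield timestamp, message


def allowed_device_ids(lines):
    """Return allowedDevices entries as zero-padded, lspci-style "vvvv:dddd" strings."""
    ids = []
    for raw in lines:
        match = ALLOWED_RE.search(raw)
        if match is None:
            continue
        for entry in match.group(1).split(","):
            entry = entry.strip()
            if not entry:
                continue
            vendor, _, device = entry.partition(":")
            # int(..., 16) accepts the "0x" prefix used in the logged value
            ids.append(f"{int(vendor, 16):04x}:{int(device, 16):04x}")
    return ids


def main(path):
    with open(path, encoding="utf-8", errors="replace") as log:
        lines = log.readlines()
    for timestamp, message in interesting_lines(lines):
        print(timestamp, message)
    print("allowedDevices:", ", ".join(allowed_device_ids(lines)) or "(none)")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage: `python3 scan_vmlog.py vmware.log` (the script name is arbitrary). Word boundaries keep noise such as `additionalError=0` out of the results, while lines like `Error VIX_E_FAIL ...` are still reported.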

As soon as I remove the GPU from the VM, it powers on without issue.

I feel I must be missing something obvious, since this "should work", but I'm currently at a loss.

I've attached the log and vmx files.

TIA for any help.

9 Replies
jmbraben2
Contributor

I pulled the video card and made sure it was functional (it is). Any thoughts on this would be greatly appreciated.

BMWAdriano
Contributor

I'm having exactly the same problem with an Nvidia GRID K2 in passthrough. When I remove the PCI device, the VM boots up just fine. Apart from the 6.7 to 7.0 upgrade, nothing else was changed, and everything worked flawlessly in 6.7. I've tried everything possible and, worst of all, we can't roll back to 6.7. What a mess!

ashilkrishnan
VMware Employee

Hi @BMWAdriano,

Please check whether this helps: https://kb.vmware.com/s/article/67587

VMdicker
Contributor

No, unfortunately, it does not work!

I think the original post already showed that passthrough had been enabled on the ESXi host.

VMdicker
Contributor

@jmbraben2 have you been able to find out any solution?

BMWAdriano
Contributor

Thank you, but the information you provided is only basic documentation on how to set up passthrough. I have already done this multiple times on 6.5 and 6.7 with no issue, and now on 7.1 it does not work correctly. Unfortunately, this information does not help. We have given up on 7.1 and reverted to 6.7, which works without issue.

VMdicker
Contributor

@BMWAdriano, is there a convenient way to roll back to 6.7?

VMdicker
Contributor

@ashilkrishnan This problem seems to be quite different from the similar cases you can find on the web.

In those cases, the log at least showed something like "device blah blah does not exist".

Here, however, we could not find any useful information in the log.

BMWAdriano
Contributor

Hello VMdicker,

Unfortunately, as far as I understand, an upgrade to v7 does not offer a simple rollback the way upgrades from earlier versions to 6.7 did. I ended up reinstalling 6.7 from scratch. Luckily my datastores stayed intact, so it wasn't a complete loss.
