I've updated from 6.7.0u2 to 7.0.1
(HPE-ESXi-6.7.0-Update2 to (Updated) HPE-Custom-AddOn_701.0.0.10.6.0-40)
In an attempt to pass through an Nvidia Quadro P400 to a Linux guest.
In 6.7.0u2, Linux could see the GPU, but not "correctly" (nvidia-smi did not report the card, but it showed up with lspci).
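For reference, the in-guest check described above can be sketched like this; the PCI address and device string below are illustrative stand-ins, not output from the actual host:

```shell
# Stand-in for what lspci shows inside the guest when the card is visible
# but the driver has not attached. Sample line is illustrative only.
cat > lspci-sample.txt <<'EOF'
0b:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P400]
EOF
grep -i 'nvidia' lspci-sample.txt
# On a live guest you would run: lspci | grep -i nvidia && nvidia-smi
```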
I've read about the passthrough changes in 7.0 and have created a new ESXi 7.0 U1 virtual machine from scratch.
The GPU is marked as active passthrough in the host hardware
The VM is configured with UEFI bios
I've added the GPU as a "Dynamic PCI device"
Although it does seem odd at the top level to see:
I've reserved all the guest OS memory
This GPU only has 2 GB, so nothing fancy there in memory allocations.
This card draws at most 30 watts and needs no "extra" power connectors.
Since trying this on 7.0.1, I cannot even start the VM; it always fails with:
"Power on failure messages: Module 'DevicePowerOn' power on failed."
I see nothing in the logs that would explain the failure.
2020-10-22T23:13:21.121Z| vmx| I005: DICT pciPassthru0.allowedDevices = "0x10de:0x1cb3,0x10de:0xfb9"
...
2020-10-22T23:13:21.410Z| vmx| I005+ Power on failure messages: Module 'DevicePowerOn' power on failed.
2020-10-22T23:13:21.410Z| vmx| I005+ Failed to start the virtual machine.
2020-10-22T23:13:21.410Z| vmx| I005+
2020-10-22T23:13:21.411Z| vmx| I005: Vix: [mainDispatch.c:4200]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0
2020-10-22T23:13:21.411Z| vmx| I005: Transitioned vmx/execState/val to poweredOff
2020-10-22T23:13:21.411Z| vmx| I005: Vix: [mainDispatch.c:4200]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=0 additionalError=0
2020-10-22T23:13:21.411Z| vmx| I005: Vix: [mainDispatch.c:4238]: Error VIX_E_FAIL in VMAutomation_ReportPowerOpFinished(): Unknown error
2020-10-22T23:13:21.411Z| vmx| I005: Vix: [mainDispatch.c:4200]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0
2020-10-22T23:13:21.411Z| vmx| I005: Transitioned vmx/execState/val to poweredOff
As soon as I remove the GPU from the VM, the VM will power up without issue.
I feel I have to be missing something obvious as this "should work", but I'm currently at a loss.
I've attached the log and vmx files.
TIA for any help.
I pulled the video card and made sure it was functional (it is). Anyone's thoughts on this would be greatly appreciated.
I'm having the exact same problem with an Nvidia Grid K2 in passthrough. When the PCI device is removed, the VM boots up just fine. Other than the 6.7 to 7.0 upgrade, nothing else was changed, and everything worked flawlessly in 6.7. I tried everything possible and, worst of all, we can't roll back to 6.7. What a mess!
Hi @BMWAdriano ,
Please check if this helps --> https://kb.vmware.com/s/article/67587
No, unfortunately, it does not work!
I think the original post already showed that passthrough had been enabled on the ESXi host.
@jmbraben2 have you been able to find out any solution?
Thank you, but the information you have provided is only basic documentation of how to initialize passthrough. I have done this already multiple times on 6.5 and 6.7 with no issue, and now on 7.0.1 it does not function correctly. Unfortunately, this information does not help, and we have given up on 7.0.1 and reverted to 6.7, which works without issue.
@BMWAdriano Is there any convenient way to roll back to 6.7?
@ashilkrishnan This problem seems quite different from the similar cases you can find on the web.
In other cases, the logs at least showed a "device blah blah does not exist" message.
Here, however, we could not find any useful information in the log.
Hello VMdicker,
Unfortunately, upgrading to v7 does not have a simple "rollback" the way 6.7 did to earlier versions (from my understanding). I ended up reinstalling 6.7 from scratch. Luckily my datastores stayed intact, so it wasn't a complete loss.
I am having the same issue with an Nvidia Titan V on ESXi 7.0.2, which works fine on ESXi 6.7. There is no other error message in the VM's vmware.log besides "Module 'DevicePowerOn' power on failed". I did find some information in the ESXi host's vmkernel.log:
PCI: 886: 0000:xx:00.0: Translation for IO 0x0 - 0x7f failed: not a bug
PCIPassthru: 1420: Failed to get pci info for 0000:xx:00.0
PCIPassthru: 1431: Disable Domain for device 0000:xx:00.0
PCIPassthru: 808: pcipdevInfo: 0xxxxxxxxxxxxx (0000:xx:00.0), state 0, destroyed
I don't know if this information is related to the problem.
I hope ESXi 7.0.3 will be released soon and will solve the issue.
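To pull just these passthrough lines out of a busy log, a simple grep works; the excerpt below is a stand-in file, while on the host the live log is /var/log/vmkernel.log:

```shell
# Stand-in excerpt; on an ESXi host you would grep /var/log/vmkernel.log.
cat > vmkernel-excerpt.log <<'EOF'
PCI: 886: 0000:xx:00.0: Translation for IO 0x0 - 0x7f failed: not a bug
PCIPassthru: 1420: Failed to get pci info for 0000:xx:00.0
PCIPassthru: 1431: Disable Domain for device 0000:xx:00.0
EOF
# Print only the passthrough- and translation-related lines
grep -E 'PCIPassthru|Translation' vmkernel-excerpt.log
```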
Has anyone found an ESXi release that solves this issue yet? We are postponing the purchase of Horizon for this reason, and until it is resolved we will only be using an ESXi standalone installation.
Is whatever this is still outstanding?
I frustratingly got it to work with a Quadro P2200 (GL107) on my server, with "Force Enable Host Display to Embedded" set in the HP BIOS.
But I just cannot get it working in the same server using a GL106 and the exact same settings 😞
I did, however, once get it working on another server (past the DevicePowerOn failure) by force-blocking the NVIDIA module at host boot, the way you would blocklist a Linux kernel module.
However, I cannot seem to find what that setting was.
I need the PCI device to appear as an unknown VGA device on the ESXi host.
A "VGA compatible Device" in the PCI list works; an "NVIDIA VGA Device" in the PCI list does not.
I have a Dell Precision Tower 5810. I have configured both an NVIDIA Quadro M4000 (GM204GL) and an NVIDIA Quadro P2000 (GP106GL). I am using ESXi 7.0.1 and vCenter 7.0.3. I have passed through both of these devices using DirectPath I/O passthrough and am able to boot VMs with either device, or a VM with both GPUs.
I'm not sure what you are doing, but it seems like maybe you are missing a config step. You might want to double check the steps in the KB article mentioned above.
Do you have onboard graphics, or a third graphics card?
I think that is the common trait, as I have a machine that has only one PCIe graphics option, or rather the BIOS forces PCIe as both primary and secondary display; mine does the latter with no workaround.
Yes, the majority of comments respond to that link directly: we all get the hardware listed as passthrough, but the VM fails to boot.
Does your BIOS allow you to set the primary display to onboard/embedded? In those cases it always works for me.
Do you have a third PCIe graphics card? In those cases it always works for me.
When embedded and primary are enabled at the same time in the BIOS, it doesn't work for me.
When primary/all is in use by ESXi, it doesn't work for me.
If I enable the "headless" ESXi kernel option with TTYS0 output only, the lone PCIe card can be passed through in 7.0.
It seems 7 has introduced some drivers in its new non-Linux driver model for NVIDIA that no longer allow sharing the card.
I expect my system has onboard graphics. I'm not using these cards for graphics at all, just for compute workloads. I have been able to successfully use the native NVIDIA Linux driver in my VMs without issue. I am also just using the emulated graphics card in my VMs (as well as passing through these devices).
The issue: we cannot pass the compute card through to the OS when it is attached while the VM is powered off;
powering on the VM halts with the DevicePowerOn failure and never works until the PCIe device is detached from the VM.
An HPE DL360 server has embedded graphics and a BIOS option to stop PCIe being the system default display, and vSphere passthrough works.
An HPE DL20, for example, has no such setting; it is either both or PCIe only, with no means to stop the PCIe card from being used as a display by the host (vSphere 7.0).
It is reproducible on all VM templates, Linux 2.6 through Debian, and Windows,
with VM hardware versions 6, 6.5, 6.7, and 7.
I imagine there may be a manual fix of changing the PCIe device's default handling on the vSphere host in the passthru.map file, from bridge to d3d0 mode or similar.
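That passthru.map idea can be sketched roughly as follows; on the host the file is /etc/vmware/passthru.map (a stand-in copy is used here), 10de is NVIDIA's PCI vendor ID, and the d3d0 reset-method value is an assumption to verify against your build before use:

```shell
# Sketch: override the reset method for all NVIDIA devices (vendor 10de).
# Columns are: vendor-id device-id reset-method fptShareable.
# Stand-in file here; on the host, back up /etc/vmware/passthru.map,
# edit it, then reboot the host for the change to take effect.
PMAP=./passthru.map
cat >> "$PMAP" <<'EOF'
# NVIDIA: try the d3d0 reset method instead of the default bridge reset
10de ffff d3d0 false
EOF
```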
Hi, I'm just commenting on my experience with the Dell Tower. If you are hitting this issue with a specific environment I would suggest filing an SR and going through the official support channels to triage your issue.
I've ignored this issue for a long time (not that I did not want a resolution, but I was not seeing anything that helped).
This is an HPE ML30 Gen10; I'm not seeing any BIOS settings that would imply the PCIe card is used by the system by default. I've looked at the card outputs and they don't seem to be active, FWIW.
A comment by @Shacl0w got me looking at the vmkernel.log. It appears that every time the VM goes to start up:
2022-01-05T22:11:40.273Z cpu0:524325)PCI: 1330: Skipping device reset on 0000:0a:00.0 because PCIe link to the device is down.
2022-01-05T22:11:40.273Z cpu0:524325)WARNING: PCI: 891: 0000:0a:00.0: Translation for MEM64 0x4000000000 - 0x400fffffff failed: firmware bug
2022-01-05T22:11:40.273Z cpu0:524325)PCI: 533: \_SB_.PC00: root bridge resources (via ACPI):
2022-01-05T22:11:40.273Z cpu0:524325)PCI: 545: IO: 0x0 - 0xcf7 Translation: 0x0 vmk_IOResourceAttrs: 0x0
2022-01-05T22:11:40.273Z cpu0:524325)PCI: 545: IO: 0xd00 - 0xffff Translation: 0x0 vmk_IOResourceAttrs: 0x0
2022-01-05T22:11:40.273Z cpu0:524325)PCI: 545: Mem: 0xa0000 - 0xbffff Translation: 0x0 vmk_IOResourceAttrs: 0x0
2022-01-05T22:11:40.273Z cpu0:524325)PCI: 545: Mem: 0x80000000 - 0xfeafffff Translation: 0x0 vmk_IOResourceAttrs: 0x0
2022-01-05T22:11:40.273Z cpu0:524325)PCIPassthru: 1420: Failed to get pci info for 0000:0a:00.0
2022-01-05T22:11:40.273Z cpu0:524325)PCIPassthru: 1431: Disable Domain for device 0000:0a:00.0
Interestingly, it does not show up in the VM logs, but whatever... 0000:0a:00.0 is the Quadro card, and given that the translation is failing, that would seem to be the problem... but what to do about it?
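One observation about that failing range: 0x4000000000–0x400fffffff sits well above 4 GiB, i.e. this is a 64-bit BAR mapped high, which is why BIOS options along the lines of "Above 4G decoding" come up for this class of error (whether the ML30 exposes such a setting is an assumption). A quick shell check of the arithmetic:

```shell
# Start of the failed MEM64 translation range from vmkernel.log,
# compared against the 4 GiB boundary.
START=$((0x4000000000))
FOUR_GIB=$((1 << 32))
if [ "$START" -ge "$FOUR_GIB" ]; then
  echo "BAR window starts above 4 GiB"
fi
```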
I've moved on to 7.0u2 trying to avoid the "destroy the USB boot media" issues.
Hmmm... this looks interesting... I'll try it when I get some time.
https://forums.servethehome.com/index.php?threads/how-can-i-passthrough-this-gpu-video.28150/