VMware Cloud Community
kuhnto1
Contributor

ESXi 8 PSOD when powering on VM with PCI passthru

We are trying to upgrade from ESXi 7 to ESXi 8. After we upgrade using the zip method, when we power on a VM that has a PCI board passthru, we get a PSOD. Before the upgrade, we do not get a PSOD. The board being passed thru is an I/O card that worked with ESXi 7 without issue. It's not a GPU.

The PSOD shows issues with Adjusting the IOMMU Mappings. See attached.

Does anyone have ideas on how to resolve this in ESXi 8?

testeng1
Contributor

I have the same issue: my PCI device passed through fine on v6.7, but after the v8 upgrade I get a PSOD when attempting to start the VM.

NateNateNAte
Hot Shot

So first, I'd check the VMware compatibility site to see if the card in question is listed at all (maybe/maybe not, but it's worth a check).  Based on your description, it looks like the I/O card may no longer be supported.  The second thing to check would be the HW compatibility of the host the PCI I/O card is sitting in.  Third, I'd check for an updated driver for the I/O card, AND see if there is an updated VMTools ISO that may have the correct mappings.
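For that compatibility lookup, the card's vendor and device IDs can be pulled straight from the host's shell (a small sketch using a standard esxcli command; find your card in the output by its name or PCI address):

esxcli hardware pci list

The fields shown for the card (Vendor ID, Device ID, and so on) are what the VMware Compatibility Guide searches on.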

If all else fails, at that point I'd open a case with VMware (have the compatibility findings in hand) and then keep plying the community for answers.  I have not run into this myself, but that is how I would approach this problem.  

And super-last, if it is only affecting particular VMs, then you need to re-assess whether those VMs need to be on that host (I'm assuming yes) and whether there is a path to move those VMs to another host that meets whatever requirements are driving that PCI passthrough.

ObiWoRen
Contributor

This issue is also impacting me. I have several PCI cards in an ESXi host that we use for hardware regression testing. I am unable to boot any VMs with a 'PCI device' attached. (The VMs all boot fine if I detach the PCI device(s).)

The stack trace in the PSoD is nearly identical to the image that you posted. I have verified that 'I/O MMU' is disabled in the CPU configuration items for the VMs in question. (In fact, if 'I/O MMU' is enabled, they fail to boot, though without causing a PSoD.)

For now, I guess we will have to fall back on ESXi 7u3.

VTDDomainFlushIotlbInt @ vmkernel
VTDDomainFlushIotlb @ vmkernel
PCIPassthruMapFPTPages @ pciPassthru
PCIPassthruAdjustIOMMUMappings @ pciPassthru
VMKPCIPassthru_AdjustIOMMUMappings @ vmkernel

bluefirestorm
Champion

Considering the PSOD call stack has a function called AdjustIOMMUMappings, you could try adjusting the MMIO address space of the VM.

For a VM with virtual BIOS, you could try to make the PCI hole bigger by adding

pciHole.start = "2048"

For a VM with virtual UEFI, you could ensure MMIO is above the 4GB address space; if it is not already there, add/edit

pciPassthru.use64bitMMIO = "True"

The 64-bit MMIO size defaults to 32GB and has to be a power of 2; to change it, add for example

pciPassthru.64bitMMIOSizeGB = "64"

but the default 32GB is likely more than enough unless the passthrough device requires something larger than 32GB (such as a GPU with 48GB VRAM).

The use64bitMMIO also requires that the host machine has "Above 4GB decoding" enabled, or whatever the equivalent setting is called in its UEFI.
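Putting that together, for a UEFI VM the advanced settings would look something like the lines below (just a sketch; add them with the VM powered off, either directly in the .vmx file or via Edit Settings > VM Options > Advanced > Configuration Parameters, and the 64GB size is only an example):

pciPassthru.use64bitMMIO = "TRUE"
pciPassthru.64bitMMIOSizeGB = "64"

For a virtual BIOS VM, the single pciHole.start line above would be the one to use instead.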

ObiWoRen
Contributor

Thank you for the suggestions @bluefirestorm; unfortunately, adding the pciPassthru.use64bitMMIO and pciPassthru.64bitMMIOSizeGB parameters did not change the sub-optimal behavior in any way. As soon as the VM with the PCI device attached was booted, a PSoD resulted with the same stack as before.

I found this post, which seems like a similar issue that only occurs on ESXi 8. Unfortunately, it does not offer a resolution.

Also, to provide a little more detail about my configuration, the PCI cards being tested are not GPUs and do not require large amounts of memory to function. We have been successfully using this hardware / software configuration through the ESXi 6.7 and 7.0 lines. This appears to be a significant regression in ESXi 8.0.

bluefirestorm
Champion

You could also try the ESXi boot option (pressing Shift-O) and see whether it goes back to the ESXi 7.x behaviour.

iommuMapReservedMem=1

The default value is new and different in ESXi 8.x: 3 = Auto (the ESXi kernel makes the decision).
With ESXi 7.x, the default was 1 = Map all reserved memory.
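To spell out the boot-option route (a sketch; the options already on the boot line vary per install, so the placeholder below is only illustrative): press Shift-O when the ESXi boot screen appears and append the setting to whatever is already on the line, e.g.

<existing boot options> iommuMapReservedMem=1

That only affects the single boot; to keep it across reboots you would set the same kernel setting with esxcli system settings kernel set and then reboot the host.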

See these
https://github.com/lamw/esxi-advanced-and-kernel-settings/blob/master/esxi-70u3p-kernel-settings.md
https://github.com/lamw/esxi-advanced-and-kernel-settings/blob/master/esxi-80-kernel-settings.md

ObiWoRen
Contributor

Thanks again for the suggestion @bluefirestorm; unfortunately, setting that kernel parameter back to '1' (Map all reserved memory) did not change the behavior. PSoD occurs as soon as a VM with a PCI passthru device attached is booted.

Just to confirm, I performed the following steps ...

1) Set the target setting: esxcli system settings kernel set -s iommuMapReservedMem -v 1
2) Rebooted the ESXi host
3) Verified the setting: esxcli system settings kernel list -o iommuMapReservedMem
4) Booted a VM with a PCI passthru device attached
