VMware Cloud Community
cswiatek
Contributor

ESXi 6.5 host freezing/crashing when shutting down VM with GPU passthrough

Hello,

I've had this issue on my ESXi host since version 6.0. I have an AMD Radeon 6450 GPU passed through to a specific VM. Everything works great except when that particular VM is powered off: the ESXi host freezes and stops responding, and it must be hard power cycled to get it working again. From what I've found, the problem appears to be that the reset the hypervisor sends to the device over the PCIe bus during power-off doesn't complete properly. It doesn't happen when the VM simply reboots (as opposed to a full power-off).

I was hoping upgrading to ESXi 6.5 would resolve this issue but alas, it hasn't.

Here are more specifics regarding my host:

* Supermicro X9SRL-F Server Motherboard

* Intel Xeon E5-2690 CPU

* 32GB DDR3 RAM (not ECC)

* 8TB of local storage connected via an LSI MegaRAID 9240-8i controller

* AMD Radeon 6450 GPU

Any guidance on this issue would be greatly appreciated. I've seen some suggestions to drop down to ESXi 5.5, but I'd rather not do that!

Thanks,


Chris

4 Replies
goblest
Contributor

I am in the same boat.

I have seen a lot of instability when passing through additional devices. I was hoping 6.5 would also fix the shutdown issue with the AMD card; however, I believe the root of the issue is that the card doesn't support FLR (Function Level Reset; see the "x86 virtualization" article on Wikipedia), so it can't be properly reset on the bus.

NSMatthew
Contributor

The same thing happens to me with an AMD 380 GPU.

Today I was testing with some Dell dual-port server-grade NICs. I passed through one card, and now if I power off my pfSense firewall, ESXi freezes for about 2 minutes and then reboots.

It seems something is fundamentally wrong with passthrough in ESXi 6.5, at least for GPUs and NICs.

NSMatthew
Contributor

Does anyone have access to the release notes for ESXi 6.5d (vSAN 6.6 Patch), bulletin ESXi650-201704001, released 2017-04-18 as build 5310538? I'd like to see whether any passthrough issues are resolved.

Found the release notes:

VMware ESXi 6.5.0d Release Notes

Going back to the original 6.5 release notes, I did find a note about passed-through NICs losing network connectivity, with legacy mode as the workaround:

VMware vSphere 6.5 Release Notes

Network becomes unavailable with full passthrough devices
If a native ntg3 driver is used on a passthrough Broadcom Gigabit Ethernet Adapter, the network connection will become unavailable.

Workaround:

  • Run the ntg3 driver in legacy mode:
    1. Run the esxcli system module parameters set -m ntg3 -p intrMode=0 command.
    2. Reboot the host.
  • Use the tg3 vmklinux driver as the default driver, instead of the native ntg3 driver.
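For anyone trying the first workaround, here is a minimal sketch as an ESXi shell script, assuming you have shell access to the host (SSH or the ESXi Shell). The run helper and DRY_RUN variable are hypothetical names for this sketch: by default it only prints the commands, and setting DRY_RUN=0 actually executes them. Besides the command from the release note, it lists the ntg3 module parameters so you can confirm intrMode before rebooting.

```shell
#!/bin/sh
# Hypothetical helper: print each command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "would run: $*"
  fi
}

# Workaround 1 from the 6.5 release note: run the native ntg3 driver
# in legacy interrupt mode (intrMode=0).
ntg3_legacy_mode() {
  run esxcli system module parameters set -m ntg3 -p intrMode=0
  # List the module's parameters so the intrMode value can be checked.
  run esxcli system module parameters list -m ntg3
  echo "Reboot the host for the change to take effect."
}

ntg3_legacy_mode
```

Run it once as-is to review the commands, then again with DRY_RUN=0 on the host.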
Memnarch
Enthusiast

Hi all, I had a very similar problem with a mix of AMD and NVIDIA cards on a 6.5 host. The VM with the AMD Radeon 460 would almost always crash the host when rebooting, but it wasn't the only source of problems. Removing that card fixed most, but not all, of them. Changing entries in /etc/vmware/passthru.map DID fix the rest, and would probably work for the AMD card as well: the system is now reliable on reboot and shutdown. Search elsewhere in these forums for details on this fix. Thanks
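The post doesn't say which passthru.map entries were changed, so the fragment below only illustrates the file's layout. Each non-comment line is vendor-id, device-id, reset method, and fptShareable, with IDs in hex and ffff matching any device ID. The AMD line is a hypothetical example (1002 is the AMD/ATI PCI vendor ID); whether bridge, d3d0, or another reset method stops the freeze on a given card is something you'd have to test. Back up the file before editing it, and expect to reboot the host for changes to apply.

```text
# /etc/vmware/passthru.map
# Columns: vendor-id device-id resetMethod fptShareable
# Reset methods: flr, d3d0, link, bridge, default
# fptShareable: true/default, false

# Hypothetical example: all AMD/ATI devices (vendor 1002, any device ID)
1002  ffff  bridge  false
```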
