Most likely hardware. Ryzen CPUs are known to cause problems with ESXi, not to mention that ESXi is not designed for, or compatible with, most consumer hardware. It's meant for enterprise-class gear.
Do you have any experience with other hypervisors that would be more compatible with this hardware? I saw Proxmox mentioned on Reddit in a related thread, so that's my next go-to, but I welcome any suggestions from people more familiar with VMs.
VMware Workstation, a type-2 hypervisor. It's designed for desktop hardware instead.
I've used VMware Workstation, but I don't think it would suit the intentions of my build. The idea was to run two gaming VMs, one for my GF and one for me, on a single machine by passing one of my two 1080s through to each VM, with the hypervisor itself running headless. The rest of the cores/RAM would then go to various VMs as a homelab. I suppose I should have stated that in my question. Sorry!
Hi Sad Penguin,
The official line is that we don't support ESXi on consumer hardware, but we certainly don't want to ignore gratuitous compatibility issues either. In case you're willing to put some time into troubleshooting this and possibly getting ESXi running on your host...
I have figured out that ESXi is failing while initializing MSI-X (the extended form of Message Signaled Interrupts) on a PCI or PCIe device. Unfortunately, I cannot tell which device from the information you've provided alone, nor have I yet figured out just how ESXi is getting so confused by the device's MSI-X support that it fails entirely.
With your help, we might be able to narrow it down by temporarily removing PCI/PCIe devices. If you have the opportunity to try with as few add-in PCI/PCIe devices as possible (and perhaps with any unnecessary on-board devices disabled in BIOS/firmware setup), that might help to narrow things down. If you have another video card to swap in just for testing, that could also provide useful troubleshooting information. (Even if you disable or remove a device that will be needed in the final build, it could be a useful test if it helps identify the device that's giving ESXi a headache.)
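As background to the MSI-X failure mentioned above: a device advertises MSI-X through a 12-byte capability structure (capability ID 0x11) in its PCI configuration space. The Python sketch below decodes that structure's three registers (Message Control, Table Offset/BIR, PBA Offset/BIR) following the layout in the PCI Local Bus specification. The `decode_msix` name and the sample register values are hypothetical, and this only illustrates the register format; it is not how ESXi itself parses the structure.

```python
from dataclasses import dataclass

MSIX_CAP_ID = 0x11  # PCI capability ID assigned to MSI-X


@dataclass
class MsixCapability:
    enabled: bool        # Message Control bit 15: MSI-X Enable
    function_mask: bool  # Message Control bit 14: Function Mask
    table_size: int      # number of vectors (bits 10:0 hold N - 1)
    table_bir: int       # index of the BAR that holds the vector table
    table_offset: int    # byte offset of the vector table within that BAR
    pba_bir: int         # index of the BAR that holds the Pending Bit Array
    pba_offset: int      # byte offset of the PBA within that BAR


def decode_msix(cap: bytes) -> MsixCapability:
    """Decode a 12-byte MSI-X capability structure (registers are little-endian)."""
    if len(cap) < 12:
        raise ValueError("an MSI-X capability structure is 12 bytes long")
    if cap[0] != MSIX_CAP_ID:
        raise ValueError(f"not an MSI-X capability (ID 0x{cap[0]:02x})")
    # cap[1] is the pointer to the next capability; it is not needed here.
    control = int.from_bytes(cap[2:4], "little")
    table = int.from_bytes(cap[4:8], "little")
    pba = int.from_bytes(cap[8:12], "little")
    return MsixCapability(
        enabled=bool(control & 0x8000),
        function_mask=bool(control & 0x4000),
        table_size=(control & 0x07FF) + 1,
        table_bir=table & 0x7,       # low 3 bits select the BAR
        table_offset=table & ~0x7,   # remaining bits are the offset
        pba_bir=pba & 0x7,
        pba_offset=pba & ~0x7,
    )
```

For example, the hypothetical bytes `11 50 03 00 04 00 00 00 04 08 00 00` decode to a 4-vector table at offset 0 in BAR 4, with the PBA at offset 0x800 in the same BAR and MSI-X not yet enabled.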
Just want to say I really appreciate you taking the time to assist me! So, first thing this morning, I tried narrowing down which PCI/PCIe device could be causing the issue by making a number of changes and attempting to reproduce the issue after each one. Here are the results:
1. Removed the M.2 drive (which I realized I forgot to mention). Same PSOD, slight change in readout.
2. Moved the original 1080 (I have two for this build but am only using one at a time during setup to limit points of failure) down to the second PCIe slot. Same PSOD, slight change in readout.
3. Installed the used 1080 in the first slot. Same PSOD, slight change in readout.
4. Moved the used 1080 to the second slot. Same PSOD, slight change in readout.
5. Installed a GTX 980 in the first slot. Same PSOD, slight change in readout.
6. Moved the GTX 980 to the second slot. Same PSOD, slight change in readout.
Imgur Gallery for each PSOD in order:
Really confused by this. It seems to be the same error regardless of card or slot. These are all Asus variants of Nvidia cards, so maybe that is the common factor? It's worth noting that I still have a bootable Windows drive, which was disconnected for this troubleshooting. Just in case, I plugged it in and removed the ESXi USB drive, and Windows still boots fine on this hardware. I wanted to make sure there weren't any memory or CPU shenanigans.
P.S. I hope the Imgur gallery is okay on these forums; it seemed the easiest way to add multiple images.
Looks like the Realtek NIC (Realtek RTL8117) on the motherboard was causing the issue. Once it's disabled in BIOS, I can get to a screen that says "No Network Adapters". However, there is also an Intel I211-AT Gigabit LAN port on the motherboard, so I most likely just need to add its driver to the install image, and then I'll be able to move forward from here.
If you would like to volunteer to go the extra mile and help us figure out why ESXi is faceplanting on your RTL8117, it'd be awesome if you could boot a Linux LiveCD/LiveUSB or similar (with the Ethernet controller re-enabled in BIOS) and run the following:
lspci -xxxx -d 10ec:
(using "sudo" or "su" if needed).
It should emit a big gob of hexadecimal stuff... and somewhere in that will be the part that's confusing ESXi. If you can save that output somewhere and post it back here, that would be utterly awesome. With that info, I can file a bug report or maybe just fix it directly... It might not immediately get your RTL8117 working with ESXi, but it will at least allow us to improve our compatibility and perhaps make it possible to enable it if/when a NIC driver is available.
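To show roughly where the relevant part of such a dump lives, here is a Python sketch that turns one device's `lspci` hex dump back into bytes and follows the standard capability list (the pointer at offset 0x34, with each entry holding an ID and a next pointer), which is where an MSI-X entry (ID 0x11) would appear. The function names and the small table of capability names are my own, for illustration only, and the sketch does not decode PCIe extended capabilities above offset 0x100.

```python
import re
from typing import List

# Names for a few common capability IDs from the PCI specifications.
CAP_NAMES = {0x01: "Power Management", 0x05: "MSI", 0x10: "PCI Express", 0x11: "MSI-X"}

# One hex-dump row as printed by lspci -x/-xxx/-xxxx, e.g. "30: 00 00 ... 00".
ROW_RE = re.compile(r"^\s*([0-9a-f]{2,3}):((?:\s[0-9a-f]{2})+)\s*$", re.IGNORECASE)


def parse_dump(text: str) -> bytes:
    """Collect one device's configuration-space bytes from lspci hex-dump output."""
    data = bytearray()
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if not m:
            continue  # e.g. the "05:00.1 Ethernet controller: ..." header line
        if int(m.group(1), 16) != len(data):
            raise ValueError(f"unexpected row offset {m.group(1)}")
        data.extend(int(b, 16) for b in m.group(2).split())
    return bytes(data)


def walk_capabilities(cfg: bytes) -> List[str]:
    """Follow the capability pointer chain that starts at offset 0x34."""
    if len(cfg) < 0x40:
        raise ValueError("need at least the 64-byte standard header")
    status = int.from_bytes(cfg[0x06:0x08], "little")
    if not status & 0x0010:  # Status bit 4: Capabilities List present
        return []
    found: List[str] = []
    seen = set()
    ptr = cfg[0x34] & 0xFC  # the two low bits are reserved
    while ptr:
        if ptr + 1 >= len(cfg):
            found.append(f"capability at 0x{ptr:02x} lies beyond the {len(cfg)} bytes dumped")
            break
        if ptr in seen:  # guard against a malformed chain that loops
            found.append(f"capability chain loops back to 0x{ptr:02x}")
            break
        seen.add(ptr)
        cap_id = cfg[ptr]
        found.append(f"0x{ptr:02x}: {CAP_NAMES.get(cap_id, f'ID 0x{cap_id:02x}')}")
        ptr = cfg[ptr + 1] & 0xFC
    return found
```

Run against the 64-byte 05:00.1 dump posted further down, it finds the Capabilities List bit set and a first capability at 0x40, which lies just past the end of what was dumped, so the chain (and any MSI-X entry in it) can't be inspected from that output.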
No worries if you just want to go ahead with using your system without all this extra fuss, now that you've got it up and running.
(BTW, just found another thread with the exact same failure: ESXi 6.7 U3, AMD 3700X, X570 Motherboard, PSOD when booting. You are not alone.)
I'll hit that up today and get back to you.
I also have another question, if you'd be so kind. My board has 4x USB 2.0 (on-board), plus 2x USB 3.2 Gen 1 and 5x USB 3.2 Gen 2 ports (4x Type-A and 1x Type-C) on the rear I/O. Sadly, here is all I am seeing in ESXi:
So it appears that only the USB 3.2 Gen 1 ports/controllers are being recognized.
This is particularly problematic since, from my understanding, as far as passthrough goes I can only pass through an entire bus and not individual ports (correct me if I'm wrong though; I'm very much learning here). Assuming I'm correct, I cannot pass through this bus, since the server's boot USB drive is also on it. So the question, more or less, is: is there a solution to this, or is this another symptom of ESXi's limited compatibility with consumer hardware? Barring some kind of USB passthrough solution (so I can at least give the VMs a keyboard and mouse), I'm probably looking at starting from scratch with the VMs on a different platform.
VMs have an MKS (mouse, keyboard, screen) console regardless of your physical hardware; you use tools such as the console function of the vSphere Client or the VMware Remote Console (from a remote system rather than the physical host) to interact with the MKS after the VM powers on.
Here is the output of the command you asked for (after re-enabling the Realtek NIC in BIOS):
mint@mint:~$ lspci -xxxxx -d 10ec:
05:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device 816e (rev 1a)
00: ec 10 6e 81 07 00 10 00 1a 00 00 ff 10 00 80 00
10: 01 cc 00 00 00 00 00 00 04 50 61 f7 00 00 00 00
20: 04 c0 60 f7 00 00 00 00 00 00 00 00 ec 10 68 81
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
05:00.1 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 1a)
00: ec 10 68 81 03 00 10 00 1a 00 00 02 10 00 80 00
10: 01 c8 00 00 00 00 00 00 04 40 61 f7 00 00 00 00
20: 04 80 60 f7 00 00 00 00 00 00 00 00 43 10 83 87
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
05:00.2 Serial controller: Realtek Semiconductor Co., Ltd. Device 816a (rev 1a)
00: ec 10 6a 81 03 00 10 00 1a 02 00 07 10 00 80 00
10: 01 c4 00 00 00 00 00 00 04 30 61 f7 00 00 00 00
20: 04 40 60 f7 00 00 00 00 00 00 00 00 ec 10 68 81
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 02 00 00
05:00.4 USB controller: Realtek Semiconductor Co., Ltd. Device 816d (rev 1a)
00: ec 10 6d 81 06 00 10 00 1a 20 03 0c 10 00 80 00
10: 04 20 61 f7 00 00 00 00 04 00 60 f7 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ec 10 68 81
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 04 00 00
05:00.7 IPMI Interface: Realtek Semiconductor Co., Ltd. Device 816c (rev 1a)
00: ec 10 6c 81 07 00 10 00 1a 01 07 0c 10 00 80 00
10: 01 c0 00 00 00 00 00 00 04 10 61 f7 00 00 00 00
20: 04 00 61 f7 00 00 00 00 00 00 00 00 ec 10 68 81
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 04 00 00
Please let me know if you need anything else!
Forgive me if I'm misunderstanding, but I believe you are suggesting using another standalone computer to remote into the VMs and then using that computer's MKS to interact with them. I understand that this is possible, as it's how I installed and configured the two Windows 10 VMs I have running at the moment (using my laptop). However, the intention of this machine is to provide more direct access: passing the VMs' video outputs through to locally connected monitors (which I have already achieved with GPU passthrough) and then using USB passthrough with locally connected devices, turning the server (or at least these two VMs on it) into two local PCs hosted by the hypervisor. The extra cores, drives, and RAM will be used for various other Linux/Unix projects, but the two Windows 10 VMs will be attached to local physical hardware, not remoted into from other fully functional standalone machines. Sorry if that wasn't clear. And I do understand that this is really not a normal use case for ESXi; it's meant to be part homelab for me and part dual-gaming machine for my spouse and me. I appreciate the feedback though!
Interesting use-case... a multi-head "hydra" PC.
For passing a keyboard/mouse through to a guest running on ESXi, you can either use PCI passthrough to pass through an entire USB controller at the PCI level (which, as you've described, will be problematic if your ESXi is booted from USB, but would allow your guest to actually "see" the exact host USB controller), or configure individual USB devices for passthrough at the USB level (in which case the guest will "see" the exact host USB device, but attached to our standard virtual USB controller, which is entirely emulated).
I don't know the details of configuring USB passthrough on ESXi, but you might find that passing through USB HIDs (keyboard and mouse devices) in particular requires some special configuration. VMware Workstation and Fusion, at least, require the "usb.allowHID=TRUE" option to be added to the VM's config before allowing it... This is to stop the user from locking their keyboard/mouse inside the VM and not being able to escape back to the host.
Thanks! Unfortunately, we only got 64 bytes of output for each device... lspci succeeds but quietly produces limited output if root privileges are absent, which is probably what happened here. Could you run:
sudo lspci -xxxx -s05:00.1
and that should dump the entire PCI configuration space of the Realtek NIC... should be somewhat more than 64 bytes.
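A quick way to tell whether a dump captured the whole configuration space is to count its bytes. Per the lspci documentation, `-x` shows the 64-byte standard header, `-xxx` the full 256-byte space, and `-xxxx` the 4096-byte extended space on PCI Express; as noted above, without root lspci quietly stops at 64 bytes. The helpers below are a hypothetical Python sketch of that check, not part of lspci.

```python
import re

# One hex-dump row as printed by lspci, e.g. "30: 00 00 ... 00" or "ff0: ...".
ROW_RE = re.compile(r"^\s*[0-9a-f]{2,3}:((?:\s[0-9a-f]{2})+)\s*$", re.IGNORECASE)


def dump_size(text: str) -> int:
    """Count the configuration-space bytes present in one device's hex dump."""
    total = 0
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:  # non-matching lines (device header, shell prompt) are ignored
            total += len(m.group(1).split())
    return total


def describe(size: int) -> str:
    """Explain what a dump of the given size does (and does not) contain."""
    if size >= 4096:
        return "full PCIe extended configuration space"
    if size >= 256:
        return "conventional 256-byte configuration space only; extended capabilities missing"
    if size >= 64:
        return "64-byte standard header only; typical when lspci runs without root"
    return "incomplete dump"
```

Each device in the earlier 05:00.x output has four 16-byte rows, i.e. 64 bytes, which matches the non-root case; a successful `sudo lspci -xxxx` run on the NIC should report the full 4096 bytes.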