Over the past year, my VM box has been crashing at random times, sometimes months apart. I was on 5.5 and am now running 6.0. The NUC is a 5i7 with 32GB RAM and the datastore is on a Synology NAS.
When it crashes, for some reason I am unable to ping anything on my network. As soon as the NUC is restarted, everything is fine. It's strange, but I am more interested in tracking down why ESXi is crashing in the first place.
The PSOD tells me Machine Check Exception: Fatal MCE on PCPU3 in world 1388527:vmm0:Visual_
System has encountered a hardware error......
I have a VM named "Visual Studio 2012 - 32 bit" and I am only guessing that error is referring to that VM.
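One way to check that guess is to pull the fields out of the PSOD line and compare the truncated world name against the VM names on the host. The sketch below is only an illustration: the field layout is inferred from the single line quoted above, the assumption that the text after the last colon is a shortened VM name with spaces shown as underscores is unverified, and "Hypothetical Other VM" is a made-up name for the example.

```python
import re

# Pattern for the PSOD line quoted above. The layout is inferred from that
# one line and is an assumption, not a documented format.
PSOD_RE = re.compile(
    r"Fatal MCE on PCPU(?P<pcpu>\d+) in world (?P<world_id>\d+):(?P<world>\S+)"
)


def parse_psod_line(line):
    """Return PCPU number, world ID, and world name from a PSOD line, or None."""
    m = PSOD_RE.search(line)
    if m is None:
        return None
    return {
        "pcpu": int(m.group("pcpu")),
        "world_id": int(m.group("world_id")),
        "world": m.group("world"),
    }


def candidate_vms(world, vm_names):
    """List VM names that start with the (possibly truncated) world name.

    Assumption: the text after the last ':' is the VM name, cut short,
    with spaces shown as underscores.
    """
    prefix = world.rsplit(":", 1)[-1].replace("_", " ").strip().lower()
    return [name for name in vm_names if name.lower().startswith(prefix)]


line = "Machine Check Exception: Fatal MCE on PCPU3 in world 1388527:vmm0:Visual_"
info = parse_psod_line(line)
# The second VM name is hypothetical, included only to show a non-match.
matches = candidate_vms(
    info["world"], ["Visual Studio 2012 - 32 bit", "Hypothetical Other VM"]
)
print(info)
print(matches)
```

Even if the name matches, that only suggests which VM was scheduled on PCPU3 when the machine check fired. A Machine Check Exception is reported by the hardware, so a match does not by itself mean the VM caused the fault.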
I've been playing with this for a while, but I'm green when it comes to troubleshooting it. Any help is appreciated. Thanks.
Since my network becomes unusable when this happens, would it be a good assumption that the hardware fault is occurring in the network adapter?
Did you make a custom ESXi installer with the correct drivers for your NUC?
I run a lab on 5.5 but I made a custom ISO with drivers that will fit the NUC as it was very unstable with the "standard" ISO installer.
A good place to start is: http://tekhead.it/2013/01/nanolab-running-vmware-vsphere-on-intel-nuc-part-2-2/
It covers how to build a custom ESXi ISO.
I forget which site I found the ESXi/NUC info on, but I did use "VMware-VMvisor-Installer-6.0.0-2494585.x86_64.iso" and "ESXi-Customizer-v2.7.2". I don't remember doing anything with the drivers though. I'll check out that page.
I had another crash yesterday morning and once again, it takes down my entire wired network. (can't ping other devices)
In fact, here's the bundle I used:
sata-xahci-1.28-1.x86_64.vib
According to virten.net (ESXi 6.0 Image for Intel NUC | Virten.net), the drivers are already included.
I am using VMXNET 3 for all my VMs. Should I be using E1000 instead?
I had similar symptoms with both ESXi 6.0 and ESXi 6.5: random hardware error messages for the disk / local bootable pen drive and also for HBA/iSCSI, especially under heavy I/O on the server.
The root cause turned out to be faulty memory modules. I replaced the memory and have had no problems since.