I am not really looking for an exact answer, but more so people's opinions.
I woke up this morning and discovered that all the VMs on my ESXi host were powered down. I checked the syslog messages and noticed that the ESXi host had booted up during the night. Nothing unusual is in the log before the system boots, and there is no indication that it had shut down beforehand.
My first instinct was to suspect a power failure. I am leasing this dedicated server from a hosting company, so I submitted a ticket to them to see if there were any power issues at the facility. They say that nothing out of the ordinary happened, and specifically not during the time when my system rebooted.
Also, network monitoring graphs show the physical NIC was in a down state briefly at the time of the reboot.
How likely is it that this is an ESXi issue rather than a hardware issue? I don't have much experience with VMware in a production setting. How stable is it? Is a random hypervisor crash expected from time to time, maybe a couple of times a year (my server has been up for 2 months and this has only happened once)? Or should I be ripping out hardware every time the hypervisor crashes? It's not like I have made any changes to the hypervisor other than creating a couple of resource pools and VMs.
Is this server a standalone box or in a cluster?
It does sound like you had a power failure. Did you have any trouble powering up the VMs after the power outage? Did you have to reregister them on the host or were they all still listed?
I've had ESXi servers that ran for months, with the only reboots being the ones I performed when I needed to do an update.
The physical box is an independent server in a rack with a bunch of other people's servers, or so they tell me.
No trouble powering on the VMs.
Because you are hosting it with a provider, it is hard to say where the failure is.
I have used ESXi at my company for over 2 years, and we have never had a problem or a case where the server restarted by itself.
It happened once on my testing/development server cluster, but only because we had not protected the power supply on the development cluster at first.
So I cannot point to either ESXi or a hardware failure.
It may be a power problem, but that is also hard to say unless you have data for that time, such as logs for power, cooling, or hardware monitored by an EMS or something similar.
Providers usually have this kind of monitoring equipment, so you can ask them whether the data is available.
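If you do get logs for that night, one rough way to narrow down the outage window is to look for the longest silence between consecutive log entries. The following is only a minimal sketch: the timestamp format, the `largest_gap` helper, and the sample lines are all made up for illustration and are not taken from a real ESXi log, so adjust the parsing to whatever your log actually contains.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout at the start of each line (hypothetical; adapt as needed).
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def largest_gap(lines):
    """Return (gap, before, after) for the longest silence between
    consecutive timestamped lines, or None if fewer than two parse."""
    stamps = []
    for line in lines:
        token = line.split(" ", 1)[0]
        try:
            stamps.append(datetime.strptime(token, STAMP_FORMAT))
        except ValueError:
            continue  # skip lines that do not start with a timestamp

    best = None
    for prev, cur in zip(stamps, stamps[1:]):
        gap = cur - prev
        if best is None or gap > best[0]:
            best = (gap, prev, cur)
    return best


# Invented sample data, only to show the shape of the input.
sample = [
    "2024-01-01T02:10:00Z host: routine message",
    "2024-01-01T02:10:30Z host: routine message",
    "2024-01-01T02:17:45Z host: boot message",
    "2024-01-01T02:17:50Z host: boot message",
]

result = largest_gap(sample)
if result is not None:
    gap, before, after = result
    print(f"Longest silence: {gap} (between {before} and {after})")
```

The window between the last entry before the gap and the first entry after it is what you would compare against the provider's power and cooling records and against the time the NIC went down.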
I am leaning towards hardware failure (if I believe my service provider when they say there were no power issues).
The only option they are offering me is to take the box down and do a 2 hour hardware diagnostic. If it crashes again I am going to take them up on that. I had them put some more memory in earlier in the week, so I wouldn't be surprised if this was memory related.
Mysterious reboots are extremely rare, and are almost 100% of the time related to a power failure or bad HW.
When ESX has a problem it halts and displays a Purple Screen of Death (PSOD) with more info. It does not just reboot.
So if the ESXi hypervisor has some kind of driver or memory crash, it doesn't reboot, it just halts?
Correct.