We've had some ESXi hosts PSOD a few times. Each time, vSphere HA moves the VMs off the crashed host quickly enough that our monitoring doesn't catch it.
How can I tell which VMs moved? We have Log Insight installed and I can see where the cluster re-elects its master, but the rest of the log entries are indecipherable.
Searching through thousands of VMs to see which ones rebooted would be a painful process. Is there a better way?
Thanks -w
Hi there,
There are a few ways to grab this data, but I find PowerCLI to be easiest.
Something like the following would do the trick. Change the $HAVMrestartold value to the number of days back you want to search.
$Date = Get-Date
$HAVMrestartold = 5   # number of days back to search
Get-VIEvent -MaxSamples 100000 -Start ($Date).AddDays(-$HAVMrestartold) -Types Warning | Where-Object {$_.FullFormattedMessage -match "restarted"} | Select-Object CreatedTime,FullFormattedMessage | Sort-Object CreatedTime -Descending
The above snippet is from the following page:
http://www.jonathanmedd.net/2012/03/which-vms-restarted-after-a-vsphere-ha-event.html
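If you only want the VM and host names rather than the full message text, here is a rough sketch that pulls them out with a regex. ConvertFrom-HARestartEvent is a made-up helper name, not a PowerCLI cmdlet, and the pattern assumes the English message wording "vSphere HA restarted virtual machine <vm> on host <host> in cluster <cluster>", so check it against a real event from your vCenter before relying on it.

```powershell
# Hypothetical helper (not part of PowerCLI). Takes event objects that have
# CreatedTime and FullFormattedMessage properties and emits one row per
# HA restart message. The message wording in $pattern is an assumption.
function ConvertFrom-HARestartEvent {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $InputObject
    )
    process {
        $pattern = 'vSphere HA restarted virtual machine (?<vm>.+) on host (?<vmhost>.+) in cluster (?<cluster>.+)'
        # -match fills the automatic $Matches table when the pattern is found;
        # events that do not match are silently skipped.
        if ($InputObject.FullFormattedMessage -match $pattern) {
            [pscustomobject]@{
                CreatedTime = $InputObject.CreatedTime
                VM          = $Matches['vm']
                VMHost      = $Matches['vmhost']
                Cluster     = $Matches['cluster'].Trim()
            }
        }
    }
}

# Usage against a connected vCenter (requires Connect-VIServer first):
# Get-VIEvent -MaxSamples 100000 -Start (Get-Date).AddDays(-5) -Types Warning |
#     ConvertFrom-HARestartEvent |
#     Sort-Object CreatedTime -Descending
```

Piping the result into Export-Csv gives you a list you can hand to the app owners.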
Cheers, Matt.
Thanks to all who replied. Both replies are correct, and I was able to use the info to come up with a third method: searching in Log Insight. Luckily we have Log Insight collecting logs from vCenter and all our ESXi hosts. The search string to use is "vSphere HA restarted". vCenter logs this message for each VM that HA restarts, but it's a needle in a haystack when intermixed with all the meaningless (to me) babble from several dozen FDM daemons.
When a blade hangs or PSODs, it takes 30 seconds to a minute for the cluster to recognize that, and then the VMs get restarted within a minute or two thereafter. The VMs do reboot when HA moves them. Our apps do not notice this, YMMV.
BTW, if you don't have Log Insight, install it now if you're entitled. I don't know the pricing (it came with our vCloud Suite), but it's basically the same as Splunk or Graylog, it's trivial to install and set up, and it performs well on a small VM. As my coworkers say, "it's like Splunk, except it's fast!"