Hey all, I was off yesterday, and when I came in this morning I was unable to access my vCenter web client. I logged in to the ESXi 6.0 server that hosts vCenter and saw that all four VMs on this host were powered off. They powered on just fine, and once I could access vCenter again I checked the events and saw that something happened yesterday around 4:45 PM (I’m still trying to find out what).
The first thing I did was look for the reason HA failed. I found this event, but nothing saying why HA wasn’t able to move the VMs:
vSphere HA agent on host 192.168.2.120 connected to the vSphere HA master on host 192.168.2.121 Information 1/2/2017 4:46:04 PM 192.168.2.120
Once my VMs were running again this was sent to my vCenter:
Alarm 'Host error' on 192.168.2.120 triggered by event 55038 'Error detected on 192.168.2.120 in DatacenterOne: Agent can't send heartbeats: Host is down' Error 1/3/2017 8:16:19 AM 192.168.2.120
OK, so vCenter knew that there was a problem and tried to help, but HA never moved these VMs to another host. I'm still wading through the logs. Can anyone tell me where to look to find out specifically why HA didn’t help me out here?
Thanks,
Joe B
Hi Joe,
Check out the fdm.log files on the ESXi hosts, especially on the host that was the HA master at the time, if you know which one that was.
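One thing to watch for when lining these logs up with vCenter events: as far as I know, ESXi log files are stamped in UTC, while the client shows events in local time, so the entries may sit several hours away from where you expect them. Below is a rough Python sketch for pulling a time window out of a copy of fdm.log. The function name `lines_in_window` and the assumed line shape (an ISO 8601 UTC timestamp at the start of each entry) are my own assumptions, so check them against your actual files.

```python
import re
from datetime import datetime, timezone

# Assumption: each log entry starts with an ISO 8601 UTC timestamp,
# e.g. "2017-01-02T21:46:04.123Z ...". Verify this against your own logs.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?Z")

def lines_in_window(lines, start_utc, end_utc):
    """Yield lines whose leading timestamp lies in [start_utc, end_utc]."""
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # continuation line or a format this sketch does not know
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
        ts = ts.replace(tzinfo=timezone.utc)
        if start_utc <= ts <= end_utc:
            yield line.rstrip("\n")
```

To use it, convert the local event time to UTC yourself, build two timezone-aware `datetime` values around it, and pass an open file handle for the copied log as `lines`.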
Cheers, Matt.
Hi Matt, I checked the FDM logs on all three of my hosts and I didn't see anything in any of them at the time the meltdown started.
Thanks,
Joe B