VMware Cloud Community
Marcel1967
Enthusiast

Killing hostd process on master initiates a vSphere HA failover

One of my clients has an issue with hostd on ESXi 5.1 servers running Update 3. Sometimes hostd crashes with the error "Memory exceeds hard limit. Panic". VMware is working on a fix.

However, a side effect of hostd crashing is that vSphere HA tries to initiate an HA failover. The failover does not succeed because the host is not isolated: both the management network and the datastore heartbeats are alive. Also, the VMDKs of the VMs running on the host with the stopped hostd are still locked by that host.

The customer is able to reproduce the issue. When he kills hostd on the host running the HA master role, he gets these warnings:

"vSphere HA initiated a virtual machine failover action in cluster <name> in datacenter <name>" followed by

"vSphere HA failover in progress".

After a while the following event appears: "vSphere HA unsuccessfully failed over <vm name> on <esxi host> in cluster <name> in <datacenter>. vSphere HA will retry if the maximum number of attempts has not been exceeded. Reason: The operation is not allowed in the current state."


When hostd is killed on a slave, the errors below are shown in the master's FDM log file:

DeadSlaveCnx: Dead connection

and

Reporting Slave host-27 as FDMUnreachable
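
For anyone who wants to check their own master for the same two messages, here is a minimal sketch in Python. It only matches the two strings quoted above; the function name scan_fdm_log and the sample lines are made up for illustration, and the real log lines will carry timestamps and other fields that I have not reproduced here.

```python
import re

# Patterns built only from the two master-side messages quoted in this post.
PATTERNS = {
    "dead_connection": re.compile(r"DeadSlaveCnx: Dead connection"),
    "fdm_unreachable": re.compile(r"Reporting Slave (\S+) as FDMUnreachable"),
}


def scan_fdm_log(lines):
    """Return (line_number, kind, host) for every line matching a pattern.

    host is the captured slave identifier, or None when the pattern
    does not capture one.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                host = match.group(1) if match.groups() else None
                hits.append((number, kind, host))
    return hits


# Hypothetical sample input; real FDM log lines contain more fields.
sample = [
    "DeadSlaveCnx: Dead connection",
    "some unrelated line",
    "Reporting Slave host-27 as FDMUnreachable",
]
hits = scan_fdm_log(sample)
for hit in hits:
    print(hit)
```

To run it against a real log, pass an open file object instead of the sample list. I am assuming the FDM log on the master is /var/log/fdm.log, so adjust the path if your hosts differ.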


I understand the HA agent FDM 'talks' to hostd.

But I do not understand why vSphere HA events pop up when hostd is stopped/killed.


The customer wants to understand why vSphere HA initiates an HA failover. As the host on which hostd is stopped/killed is not isolated or partitioned, there is no risk of virtual machines being started on a different host.

Still we need to understand this.


Thanks!





