VMware Cloud Community
woodycollins
Contributor

Host Isolation Response Question

There have been some questions recently in our company about how the host isolation response works in vCenter Server 4.1. Given the available options of leaving the VMs powered on or shutting the VMs down, how does HA determine that an isolated host is truly isolated and still running, as opposed to completely failed (offline)?

Can someone explain how the host isolation response works in a bit more technical detail than the VMware KB articles do?

From what I have read about how the host isolation response can be configured: if you set it to "leave VM running", then in the event of a total host failure (offline), do the other hosts in the cluster not attempt to bring the VMs back online on another host? And is it recommended to set the isolation response to "power off" so that other hosts in the cluster can bring the VMs back online?

weinstein5
Immortal

You have to break it into two parts: the host that is isolated and the remaining nodes of the cluster.

The remaining nodes of the cluster assume the isolated host has failed and will try to restart the VMs that were running on it. Because those VMs are still running, they cannot be started elsewhere: their files are locked.

The isolated host has lost communication with the cluster. After missing 15 heartbeats (one is sent every second), it follows the isolation response setting. If that is set to shut down, the VMs shut down, the locks are released, and the VMs restart on the remaining nodes of the cluster. If the VMs are left powered on, users continue to access them with no outage.

In the event of an actual host failure, HA will restart the VMs on the remaining nodes of the cluster; the isolation response has no effect.
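To make the isolated host's side of this concrete, here is a minimal sketch of the timeline described above, not how the HA agent is actually implemented. The class and constant names (`HostView`, `MISSED_BEFORE_ISOLATED`, and so on) are made up for illustration; only the figures of 15 heartbeats at one per second come from this post.

```python
from dataclasses import dataclass
from enum import Enum


class IsolationResponse(Enum):
    LEAVE_POWERED_ON = "leave powered on"
    POWER_OFF = "power off"
    SHUT_DOWN = "shut down"


HEARTBEAT_INTERVAL_S = 1      # one heartbeat per second, as stated above
MISSED_BEFORE_ISOLATED = 15   # heartbeats missed before the host acts


@dataclass
class HostView:
    """Hypothetical model of what the (possibly isolated) host itself does."""
    response: IsolationResponse
    missed: int = 0

    def tick(self, heartbeat_received: bool) -> str:
        """Advance one heartbeat interval and report what the host does."""
        if heartbeat_received:
            self.missed = 0
            return "ok"
        self.missed += 1
        if self.missed < MISSED_BEFORE_ISOLATED:
            return "waiting"
        # The host now considers itself isolated and applies its setting.
        if self.response is IsolationResponse.LEAVE_POWERED_ON:
            return "isolated: VMs keep running, locks stay held"
        return "isolated: VMs stopped, locks released"


# Example: a host configured to shut down its VMs loses all heartbeats.
host = HostView(IsolationResponse.SHUT_DOWN)
actions = [host.tick(heartbeat_received=False) for _ in range(15)]
```

Note that a host that has really failed never runs any of this, which is why the isolation response has no effect in that case.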

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful
woodycollins
Contributor

I still don't understand how a host can be determined to be "isolated" as opposed to "offline". Isolation simply means network communication has failed and the VMs are still happily running on the isolated host. A host simply failing and going offline (a physical power failure, for example) is a completely different scenario. The locks are not released properly (nor can they be by any isolation response configuration) and the VMs are not running on the offline host.

In the scenario where the host goes offline, the VMs are not running, and the isolation response is set to "leave VM running", how do the other hosts in the cluster determine that the host is truly down?

weinstein5
Immortal
Accepted solution

woodycollins wrote:
I still don't understand how a host can be determined to be "isolated" as opposed to "offline". Isolation simply means network communication has failed and the VMs are still happily running on the isolated host. A host simply failing and going offline (a physical power failure, for example) is a completely different scenario. The locks are not released properly (nor can they be by any isolation response configuration) and the VMs are not running on the offline host.

To the HA cluster, if communication with a node is lost, the cluster assumes that the node has failed and will try to restart its VMs on the remaining nodes of the cluster. The locks are renewed constantly, so if the non-responsive host is isolated rather than failed, it will still be refreshing the locks on the VMDK files and the VMs will not start elsewhere. It is this feature that allows HA to work; with what you describe, HA would never work.

woodycollins wrote:
In the scenario where the host goes offline, the VMs are not running, and the isolation response is set to "leave VM running", how do the other hosts in the cluster determine that the host is truly down?

The other hosts will always assume the isolated host is truly down and try to restart the VMs. The isolated host is the machine that follows the isolation response setting, either leaving the VMs powered on or powering them off.
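The view from the remaining hosts can be sketched the same way. This is only a toy model of the behaviour described in this reply, assuming a lock that stays valid while its owner keeps renewing it. `VmdkLock`, `attempt_restart` and the 30-second `LOCK_LEASE_S` are hypothetical names and values for illustration, not documented VMFS or HA internals.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lease length for illustration only; not a documented value.
LOCK_LEASE_S = 30.0


@dataclass
class VmdkLock:
    """Toy model of a lock on a VM's disk files."""
    owner: Optional[str]
    last_renewed: float

    def is_held(self, now: float) -> bool:
        # A lock counts as held only while its owner keeps renewing it.
        return self.owner is not None and (now - self.last_renewed) < LOCK_LEASE_S


def attempt_restart(lock: VmdkLock, new_host: str, now: float) -> bool:
    """What a surviving host does: always try, and let the lock decide."""
    if lock.is_held(now):
        return False          # original host is alive (isolated), VM stays put
    lock.owner = new_host     # lock was released or went stale: take over
    lock.last_renewed = now
    return True


# Isolated host, VMs left powered on: the lock was renewed 10 s ago.
lock = VmdkLock(owner="esx-a", last_renewed=100.0)
blocked = attempt_restart(lock, "esx-b", now=110.0)

# Failed host: no more renewals, so a later attempt finds the lock stale.
restarted = attempt_restart(lock, "esx-b", now=131.0)
```

The surviving hosts never need to tell "isolated" from "offline": they make the same restart attempt in both cases, and whether the lock is still being refreshed determines the outcome.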

woodycollins
Contributor

Awesome, thanks. I figured it had something to do with the lock files, as I had read somewhere that there is a "starvation" time, but there was never a description of exactly what that is.
