VMware Cloud Community
bigusdadius
Contributor

Isolation Response?

This has probably been covered already, but I thought I would ask anyway since I couldn't find much on it.

If you lose network connectivity to all ESX servers in the HA cluster, will they individually declare themselves isolated and power down all VMs?

conyards
Expert

Not quite.

If the HA agent (AAM) running on a host loses connectivity with the rest of the cluster, that host runs its configured isolation response and its VMs are started on other appropriate hosts within the cluster. This holds until the cluster reaches the number of host failures allowed; past that point, a host isolated in this circumstance will still perform its isolation response, but no VMs will be restarted.

Hence "not quite": the behavior doesn't depend on all servers in the cluster being isolated, but on the cluster reaching its allowed number of host failures (which, I should probably point out, maxes out at four regardless of cluster size).

Simon

https://virtual-simon.co.uk/
Jakobwill
Enthusiast

If you have an isolation IP defined and the hosts can still reach it, they will declare each other as gone.

If you have no isolation IP, they will power off the VMs.

You can configure a VM not to be powered off when its host becomes isolated.

bertdb
Virtuoso

Note: if no specific das.isolationaddress is set, the default isolation address is the service console's default gateway.

bertdb
Virtuoso

bigusdadius,

Isolation is a state of each ESX server individually. If _it_ no longer sees any other ESX server over the network, and doesn't see its default gateway (or isolation address), it will declare itself isolated and perform the isolation response. That might mean powering off the VMs that are running locally.

So yes, it is possible for every ESX server to go into isolation. But the conditions for that are checked on each host, not centrally. If _you_ (being an external party, maybe VirtualCenter?) stop seeing the ESX hosts, that doesn't mean they no longer see each other.

(when I say "ESX server" here, it's really the service console on each ESX server we're talking about, because that's where the HA agent runs)
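
The per-host check described in this post can be sketched as a small model. This is only an illustration of the reasoning, not VMware's HA agent code: the names `HostView`, `is_isolated`, and `isolation_action` are hypothetical, and the response strings simply mirror the two options discussed in this thread.

```python
from dataclasses import dataclass


@dataclass
class HostView:
    """What one host's HA agent can see from its own service console (hypothetical model)."""
    peers_heard: int            # number of other hosts whose heartbeats still arrive
    isolation_address_up: bool  # can this host reach its gateway / das.isolationaddress?


def is_isolated(view: HostView) -> bool:
    # A host declares itself isolated only when it hears no peers
    # AND cannot reach its isolation address.
    return view.peers_heard == 0 and not view.isolation_address_up


def isolation_action(view: HostView, response: str = "power off") -> str:
    # Each host decides for itself; nothing central is consulted.
    if not is_isolated(view):
        return "no action"
    if response == "power off":
        return "power off local VMs"
    return "leave VMs running"


# Total network failure: every host sees nothing, so every host isolates itself.
total_outage = [HostView(0, False) for _ in range(3)]
print([isolation_action(v) for v in total_outage])

# Peers have vanished but the isolation address still answers: not isolated.
print(isolation_action(HostView(0, True)))
```

In this model, a total network outage leaves every host isolated at once, which is why all VMs end up powered off under the default response.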

bigusdadius
Contributor

I'm trying to find out whether, if the network goes down, all of the servers go down with it.

conyards
Expert

If none of the ESX hosts can talk to each other or to the das.isolationaddress, and your hosts are configured with an isolation response of power down VMs, then yes: if you lose the network, all VMs will be powered down.

https://virtual-simon.co.uk/
bigusdadius
Contributor

That is fine and what I expect...I have a customer that was worried due to their past issues with their network. He wondered if setting the isolation IP to the loopback adapter address would keep the VMs powered on in the event of a total network failure without setting the isolation response in the VI client.

I explained that they need to go down to be restarted on the other hosts and that if he lost the entire network (again) he would have larger problems...

Thanks to all who replied.

VirtualKenneth
Virtuoso

The isolation response is rather clear. Each ESX host (SC network) can become isolated individually, and the isolation response decides what happens on each individual server.

But how does the cluster tell the difference between an isolation and an ESX host power down?

If ESX-A and ESX-B get isolated and ESX-C is still running, what decides whether ESX-C should power on the VMs from ESX-A and ESX-B?

ESX-C has no communication with ESX-A and ESX-B and therefore cannot determine whether the others are isolated or down, right?

bigusdadius
Contributor

Good question.

It was my understanding that hosts that are still connected to the network and cannot communicate to their peers could power on VMs from the down hosts...that is if there is no lock on the VMDK file...

VirtualKenneth
Virtuoso

Well, that was my understanding as well... ESX hosts that are not isolated always try to start the VMs, and if no VMDK lock is held, the start will succeed.

Can someone confirm this thinking, or are we wrong?

conyards
Expert

In a case where ESX A and B are powered down, ESX C will know that it isn't isolated if das.isolationaddress has been configured (in the advanced options of the HA cluster), because HA checks both its peers and this device.

My understanding is that pointing this setting at a router (or any other device) that is up on the network 24/7 helps HA decide which hosts are isolated and which are not. Configuring it is even more important in a two-node HA cluster, to prevent HA split-brain.

https://virtual-simon.co.uk/
VirtualKenneth
Virtuoso

But what if ESX A and B get isolated and the isolation response is "keep online"? Will ESX C try to start the VMs (and fail, since they are locked)?

conyards
Expert

From the white paper:

• Isolation response determines what a host in a HA cluster should do with running virtual machines when the host loses its network connectivity (not receiving HA heartbeats and unable to ping gateway). By default, virtual machines are powered off in case of a host isolation incident. This releases their shared storage locks, which allows the virtual machines to be started up on other hosts.

You can change this default behavior for individual virtual machines and choose Leave running to indicate the virtual machine on isolated hosts should continue running even if the host can no longer communicate with other hosts in the cluster. If you choose to do this and it turns out that the original host can't access shared storage, the virtual machine lock will time out and the virtual machine may be started on a second host (a condition commonly referred to as split-brain). This condition is more likely to occur with NAS or iSCSI storage, in the case of network failures, since both methods are TCP/IP based. For these types of storage, keeping the Isolation Response at Power off (the default) is highly recommended.

So it would appear ESX C will only restart the VM if the lock is no longer present, whether through a timeout or a VM power down. In other words, while a lock is present on a VM, the HA process will not restart it on an alternate host.
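
As a rough sketch of that reasoning, the lock acts as a gate on the restart attempt. The model below is purely illustrative: `VmFiles` and `try_restart` are hypothetical names, and real locking happens at the storage layer rather than in a Python field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VmFiles:
    """Hypothetical stand-in for a VM's files on shared storage."""
    name: str
    lock_holder: Optional[str] = None  # host currently holding the lock, if any


def try_restart(vm: VmFiles, candidate_host: str) -> bool:
    # A surviving host can only power on the VM if nobody holds the lock.
    if vm.lock_holder is not None:
        return False
    vm.lock_holder = candidate_host
    return True


# ESX-A is isolated but leaves the VM running: the lock is still held.
vm = VmFiles("vm1", lock_holder="ESX-A")
print(try_restart(vm, "ESX-C"))  # False

# ESX-A powers the VM off (or its lock times out): the lock is released.
vm.lock_holder = None
print(try_restart(vm, "ESX-C"))  # True
```

The second case is also where split-brain can arise: if the lock only timed out because ESX-A lost access to storage, the VM may still be running there when ESX-C starts it.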

Simon

https://virtual-simon.co.uk/
kryichek
Enthusiast

So how do you change the default isolation response to "Leave Powered On"?

Charles Mielak VCP, vExpert
conyards
Expert

The default isolation response cannot be changed; however, individual machines can be configured with a different isolation response from the settings menu of the cluster:

Right-click the cluster.

Select Edit Settings.

Highlight Virtual Machine Options.

The isolation response setting is changeable from a drop-down next to the VM name.

I would have a close think about why you would want to leave VMs powered on, on a host that is potentially failing.
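
To make the default-plus-override behavior concrete, here is a minimal sketch of how the effective response for a VM could be resolved. The VM names, the `overrides` table, and `effective_response` are all hypothetical; this models the settings dialog described above, not any VMware API.

```python
# Cluster-wide default; per this thread it cannot be changed at the cluster level.
CLUSTER_DEFAULT = "power off"

# Hypothetical per-VM overrides, as chosen from the drop-down next to each VM name.
overrides = {"db01": "leave powered on"}


def effective_response(vm_name: str) -> str:
    # A VM without an explicit override falls back to the cluster default.
    return overrides.get(vm_name, CLUSTER_DEFAULT)


print(effective_response("db01"))    # leave powered on
print(effective_response("new-vm"))  # power off
```

A newly built VM has no entry in the override table, so it picks up "power off" until someone sets it explicitly.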

https://virtual-simon.co.uk/
kryichek
Enthusiast

"I would have a close think about why you would want to leave VMs powered on, on a host that is potentially failing."

So every time I build a VM I will need to address the Isolation Response.

This issue came up when the Networking Group decided to do Core Upgrades. When the Core was down the VMs shut themselves down, and when the Core came back up all the VMs had to be manually restarted. In this case I would rather have them just stay on and wait for the network to return.

Charles Mielak VCP, vExpert
conyards
Expert

The problem isn't with ESX; it sounds as if it's the way the core switch upgrade was managed.

https://virtual-simon.co.uk/
stuten
Enthusiast

I had a similar situation. I was unaware of anything occurring on the core network switches, and when the upgrade was performed it caused our trunked connections to go through spanning tree, which caused 3 of our ESX hosts to power down their VMs.

I've spent a good bit of time thinking about when I would want ESX to power off my VMs and have almost come to the conclusion that I would never (almost never) want ESX to "pull the plug" on my VMs. In the case of a true hardware failure where the server crashed (and therefore the VMs crashed, no way around that), the VMs will be powered back up on a surviving node (no locks on the VM files). Any other time, like a network outage, I'd rather not take the chance of crashing my VMs with a power off. I'd rather either work with the network team to fix the network problem or get on the service console of the isolated hosts and suspend the VMs from there. If VMware gave me the ability to do a shutdown instead of a power off in case of isolation, then I'd say "go for it". Since they don't give me that option, I've about convinced myself that I don't want the software to determine when to pull the power plug. I'll have to leave that decision to the humans managing the environment.

The one exception so far is the VC server: for my convenience in managing the environment, I'd like it to come back up (and hope the power off didn't corrupt anything).

I'd like to see VMware give more options for isolation response, and give you the ability to change the default, so as to not have to worry about it when you deploy a new VM. Since this is basically Rev 1 of these features, I can only assume that we will see more in these areas in future releases.

jdaunt
Enthusiast

Not necessarily the case in more of a test lab environment. That said, the process for changing the isolation response on all the virtual machines is very cumbersome. In a fiber SAN environment, it makes the most sense to me to change the isolation response to "Leave Powered On".
