VMware Cloud Community
JaySMX
Hot Shot

Help me understand Isolation Addresses and Disk Locking

The availability guide states that the isolation address is pinged after a host has not received heartbeats from any other host in the cluster for 12 seconds, and I know that you can assign multiple isolation addresses. My confusion is about what happens when you have multiple networks and management traffic is separate from VM and storage traffic. If you configure an isolation address on each network, do you also need a management connection on each network so that ESX can ping the isolation address on each? Essentially, does each isolation address have to be routable from a management connection on the host? I assume so, but haven't been able to find anything about it. Also, with multiple isolation addresses, does ESX go down the list and declare itself "not isolated" if any one of them is pingable?

Second, VMFS uses disk locking to ensure that a VM's files are not taken over and written to by another host in the event that host is network-isolated and the isolation response is set to "Leave powered on".  How does this work if you are running VMs on an NFS datastore instead of VMFS?

-Justin
8 Replies
AndreTheGiant
Immortal

An isolation address can be an IP on the local subnet or a routable IP, but a "local" IP is better.

If you have multiple interfaces, the one used to reach the address is chosen according to the routing table.
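
To make the routing-table point concrete, here is a minimal sketch of how an outgoing interface might be selected for a given isolation address. It is a toy model, not ESX's actual implementation: the interface names, subnets, and gateway below are hypothetical, and only directly connected subnets plus one default gateway are considered.

```python
import ipaddress

# Hypothetical interfaces and subnets, for illustration only.
INTERFACES = {
    "vmk0": ipaddress.ip_interface("10.0.0.11/24"),      # management
    "vmk1": ipaddress.ip_interface("192.168.50.11/24"),  # NFS storage
}
# Hypothetical default gateway, reachable through vmk0.
DEFAULT_GATEWAY = ("vmk0", ipaddress.ip_address("10.0.0.1"))


def pick_interface(target, interfaces=INTERFACES, default_gateway=DEFAULT_GATEWAY):
    """Return (interface name, next hop) for target, or None if there is no route."""
    target = ipaddress.ip_address(target)
    # Directly connected subnets are tried first; the longest prefix wins.
    matches = [
        (iface.network.prefixlen, name)
        for name, iface in interfaces.items()
        if target in iface.network
    ]
    if matches:
        _, name = max(matches)
        return name, target
    # Anything else goes to the default gateway, if one is configured.
    return default_gateway
```

In this model an address on the storage subnet is reached through `vmk1`, while an address on an unconnected network is only reachable if the default gateway can route to it. That matches the question above: an isolation address on a network with no local interface and no route to it cannot be pinged.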

PS: if you use additional isolation addresses, remember to also increase the HA timeout.

About NFS, I suppose that file locking is used (but I am not 100% sure).

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
JaySMX
Hot Shot

That makes sense. So, in a situation where there are multiple networks that are not connected to one another, you would need a management interface on each network so ESX can use its internal routing to figure out how to get to the isolation address.

I am working on a design in which, even if the management network goes down (since it will be on a single physical NIC) while the VM network is still online (separate NICs), ESX does not declare itself isolated and does not shut down the VMs, since they would still be online.

-Justin
AndreTheGiant
Immortal

You do not really need many isolation addresses (the more addresses there are, the slower HA may be to reach a decision).

Remember that, by default, the gateway is already an isolation address.

And all isolation addresses must fail before a host is declared isolated.
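
The "all must fail" rule can be sketched as below. This only illustrates the logic described in this post; the `ping` callable is a hypothetical stand-in for whatever reachability check the host performs, and the check is assumed to run only after heartbeats from the other hosts have already been lost.

```python
from typing import Callable, Iterable


def is_isolated(addresses: Iterable[str], ping: Callable[[str], bool]) -> bool:
    """A host counts as isolated only if no isolation address responds."""
    # any() stops at the first address that answers, so a single reply is
    # enough for the host to consider itself "not isolated".
    return not any(ping(addr) for addr in addresses)
```

For example, with a fake `ping` that only answers for `10.0.0.1`, `is_isolated(["10.0.0.1", "192.168.50.1"], ping)` returns `False`, while a list containing only unreachable addresses returns `True`.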

Andre

depping
Leadership

Just set the isolation response to "Leave powered on". Two isolation addresses are more than enough in my opinion anyway.
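
As a rough sketch of what a two-address setup might look like as HA advanced options, the helper below builds the key/value pairs. The option names (`das.isolationaddress1`, `das.usedefaultisolationaddress`, `das.failuredetectiontime`) are the ones I recall from vSphere 4.x and should be checked against the documentation for your version. The helper itself, the IP addresses, and the 20000 ms value are assumptions for illustration, not settings taken from this thread.

```python
def build_ha_advanced_options(isolation_addresses,
                              failure_detection_ms=20000,
                              use_default_gateway=True):
    """Build a dict of HA advanced options (hypothetical helper, names per vSphere 4.x)."""
    options = {}
    # One numbered option per additional isolation address, starting at 1.
    for index, address in enumerate(isolation_addresses, start=1):
        options[f"das.isolationaddress{index}"] = address
    # Whether the default gateway is still used as an isolation address.
    options["das.usedefaultisolationaddress"] = "true" if use_default_gateway else "false"
    # Raised timeout, following the earlier advice to increase it when adding addresses.
    options["das.failuredetectiontime"] = str(failure_detection_ms)
    return options


# Hypothetical addresses on a management and a storage network.
example = build_ha_advanced_options(["10.0.0.254", "192.168.50.254"])
```

The resulting dictionary would then be entered by hand in the cluster's HA advanced options; nothing here talks to vCenter.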

Duncan (VCDX)

Available now on Amazon: vSphere 4.1 HA and DRS technical deepdive

JaySMX
Hot Shot

Thanks Duncan, that seems like the most direct solution in this case after all. Do you happen to know how the file locking is handled on an NFS datastore?

Thanks,

Justin

-Justin
depping
Leadership

It is just the file locking mechanism that is used with NFS.

Duncan (VCDX)

ferdis
Hot Shot

It is recommended to use the "Power off" isolation response with NFS/iSCSI in scenarios where a single adapter carries both management traffic and NFS.

I was curious to test this scenario, which as far as I know is not recommended.

I tested a setup with only one vmk0 for both management traffic and NFS. The VMware HA isolation response was set to "Leave powered on", which is exactly the setting I understand is not recommended here. I started a VM located on NFS on ESX1, then disabled vmk0 on ESX1. The VM was restarted on ESX2 but also stayed powered on on the disconnected host ESX1. After I re-enabled vmk0 on ESX1, VMware HA reconfigured the agent and I found this message in Events:

Issue detected for : Disk lock was lost for virtual machine running on host esx1.xxx.xx
Auto-answering question to allow duplicate virtual machine to power off.
warning
x/xx/20xx 5:13:17 PM

So this means that HA resolves this automatically by powering off the original VM on the reconnected host. HA has done this since version 4.0 Update 2, as Duncan says on his site: http://www.yellow-bricks.com/vmware-high-availability-deepdiv/

"

As of version 4.0 Update 2 ESX(i) detects that the lock on the VMDK has been lost and issues a question which is automatically answered. The VM will be powered off to recover from the split-brain scenario and to avoid the ping-pong effect. The following screenshot shows the event that HA will generate for this auto-answer mechanism which is viewable within vCenter.

"

But I found an interesting thing. After the VM was restarted on the other host, I powered it off. Then I reconnected the original host to HA. When I then tried to start the VM, it showed me this:

Clipboard01.jpg

I suppose I should choose "I moved it", or not?

And maybe this is still a reason not to use "Leave powered on"?

depping
Leadership

Depending on the version, the VM will power itself off. If you leave the isolation response set to "Leave powered on", a split-brain scenario can occur with iSCSI/NFS, and the screenshot shown would be the result. ESX will auto-answer the question in the latest version, but I would prefer to prevent the issue in the first place.
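
The split-brain sequence discussed in this thread can be walked through with a small toy model. This is not how ESX implements locking; the `Datastore` and `Host` classes are hypothetical, and the model simply assumes that the isolated host's lock has already expired by the time the second host takes it over.

```python
class Datastore:
    """Toy shared datastore that records which host holds each VM's disk lock."""

    def __init__(self):
        self.lock_owner = {}  # VM name -> name of the host holding the lock

    def take_lock(self, vm, host):
        # Simplification: any previous owner's lock is assumed to have expired.
        self.lock_owner[vm] = host


class Host:
    def __init__(self, name, datastore):
        self.name = name
        self.datastore = datastore
        self.running = set()

    def power_on(self, vm):
        self.datastore.take_lock(vm, self.name)
        self.running.add(vm)

    def check_locks(self):
        """Power off every running VM whose disk lock is now held by another host."""
        lost = sorted(vm for vm in self.running
                      if self.datastore.lock_owner.get(vm) != self.name)
        self.running.difference_update(lost)
        return lost


ds = Datastore()
esx1, esx2 = Host("esx1", ds), Host("esx2", ds)
esx1.power_on("vm1")   # the VM starts on esx1
esx2.power_on("vm1")   # esx1 is isolated, so HA restarts the VM on esx2
split_brain = "vm1" in esx1.running and "vm1" in esx2.running
powered_off = esx1.check_locks()  # esx1 regains storage access and sees the lost lock
```

After the last line, only `esx2` still runs `vm1`, which mirrors the "disk lock was lost" event and the automatic power-off reported earlier in the thread. With the "Power off" isolation response, the first copy would have stopped before the restart and the overlap would not occur.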

Duncan (VCDX)
