VMware Cloud Community
TronAr
Commander

Isolation revealed :)

Hi,

Just wanted to share something that might help someone get a handle on HA intricacies...

I've always liked to analyze and think about distributed algos, and HA is a nice piece.

Clusters in general face this dilemma of staying alive or dying when a partition occurs. With the two sides in separate worlds, it's impossible for one to decide whether the other is dead or alive... so the vSphere 5 way of handling it is interesting!

Being able to have more than one master seems to be a good option, provided you accept that the management view can be wrong for as long as the partition holds.

That's better than having the VMs down.

But I have had a hard time understanding isolation. I was under the (wrong) impression that the isolation response determined how the solution would behave at both ends, i.e., at the isolated slave and at the master.

(It doesn't really matter if the master itself is the one isolated; once that happens, another master will be elected.)

My idea was that if the isolation response was to bring the VM down, then the master would bring it back up.

And conversely, if the isolation response was to keep the VM up, then the master would not mess with this VM.

That is not so. The isolation response only changes what the isolated host does. The (new or acting) master will try to restart the protected VM no matter what.

And the VM lock will protect the VM from running in both places. This is so even if the isolated host has the means to tell the master that it is indeed isolated and "responsible" for the VM. I had not expected that...

-Carlos

P.S.

Another nifty thing I found while playing with this:

Even if you don't set a vmnic as "management", it will answer management traffic. In fact, you can manage an isolated host with the vSphere Client, provided you sit in the same segment as the alternative NIC. Cool.

I was able to have an NFS based datastore alive (second NIC) and an iSCSI datastore that went down (first NIC) on a workstation based VDC.

15 Replies
TronAr
Commander

Hmm, while trying to fix this in my head I reread the vSphere 5 Clustering Deep Dive, and it says otherwise. But I have a scenario where it behaves as described above. Does something else matter?

I know for sure that the current master knows a host is isolated (it has told VC about that, among other things) and that it can read from one datastore that the VM is "running" on the isolated host.

Nevertheless, the master decided to restart said machine, and it could because the lock on the machine was lost.

(The isolated host is still running it too.)

Oh well, I don't know.

The working datastore heartbeat state is as follows:

# ls -l
total 27
-rw-r--r-- 1 vmware vmware  8 Mar 15 16:40 host-105-hb
-rw-r--r-- 1 vmware vmware 89 Mar 15 09:34 host-105-poweron
-rw-r--r-- 1 vmware vmware  8 Mar 15 16:40 host-26-hb
-rw-r--r-- 1 vmware vmware 88 Mar 15 09:33 host-26-poweron
-rw-r--r-- 1 vmware vmware  8 Mar 15 16:40 host-9-hb
-rw-r--r-- 1 vmware vmware 82 Mar 15 09:33 host-9-poweron

host-26 is the isolated one. There are 2 VMs running: a vCenter on host-9, and the test VM, now split-brained across host-26 and host-105.

# cat host-26-poweron
71869313
1
1
71 /vmfs/volumes/53179f45-0812ca33-32ce-000c29b9568e/vm1-w2k3/vm1-w2k3.vmx

# cat host-105-poweron
107915518
0
1
71 /vmfs/volumes/53179f45-0812ca33-32ce-000c29b9568e/vm1-w2k3/vm1-w2k3.vmx
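The two poweron dumps above can be compared programmatically. What follows is a minimal Python sketch, not an FDM tool: the thread does not document the file format, so the three leading lines are kept as raw strings, and the names `PoweronFile`, `parse_poweron` and `shared_vms` are hypothetical. The sketch relies only on what is visible in the dumps: three short lines followed by one `<number> <path>` line per VM. In these samples the leading 71 equals the length of the .vmx path, but treating it as a length field is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PoweronFile:
    # First three lines, kept raw: their meaning is not documented in the thread.
    header: List[str]
    # One (leading number, .vmx path) pair per remaining line.
    entries: List[Tuple[int, str]]


def parse_poweron(text: str) -> PoweronFile:
    """Parse a host-N-poweron dump (layout inferred from the samples only)."""
    # Skip blank lines so a double-spaced paste parses the same way.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header, body = lines[:3], lines[3:]
    entries = []
    for ln in body:
        number, path = ln.split(" ", 1)
        entries.append((int(number), path))
    return PoweronFile(header, entries)


def shared_vms(a: PoweronFile, b: PoweronFile) -> List[str]:
    """Return the .vmx paths that both hosts list as powered on."""
    return sorted({p for _, p in a.entries} & {p for _, p in b.entries})


# Sample data copied from the two `cat` outputs in the post.
HOST_26 = """71869313
1
1
71 /vmfs/volumes/53179f45-0812ca33-32ce-000c29b9568e/vm1-w2k3/vm1-w2k3.vmx
"""

HOST_105 = """107915518
0
1
71 /vmfs/volumes/53179f45-0812ca33-32ce-000c29b9568e/vm1-w2k3/vm1-w2k3.vmx
"""

if __name__ == "__main__":
    h26, h105 = parse_poweron(HOST_26), parse_poweron(HOST_105)
    for path in shared_vms(h26, h105):
        print("listed as powered on by both hosts:", path)
```

Run against the two dumps, it reports the same .vmx listed as powered on by both host-26 and host-105, which is the split brain described above.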

Now confused again. Too bad.

-Carlos

depping
Leadership

Not sure where you read that (page number please :-)).... but it is straightforward: Isolation Response is the response the "isolated host" will take. Nothing more than that. It says nothing about what the other hosts will do. The other hosts will try to restart the VM when the locks are indeed lost, as at that point it is as if the whole host has died.

What type of storage are you using? Are you isolating the full host or just the management network?

TronAr
Commander

I have no page numbers, because reflowable e-book editions do not have static pages :-)

There are a number of paragraphs that led me to understand that the master's response would depend on the isolation setting.

First, just to have this out of the way: the decision has to be taken by a master, not by any host, right?

(i.e., the master is the point of logic where the call to restart a VM or not is made)

Near location 760, "prior to VS 5, vm restarts were always attempted ... DS heartbeating enables a master to more correctly determine the state ...if the master determines that the slave is isolated it will only take action when it is appropriate to take action With that meaning that the master will only initiate restarts when vms are down or powered down / shut down by a triggered isolation response,..."

So from that it follows that the isolation response has more impact than just the direct response of the isolated host. It also affects the response of the master.

This is a home lab, with an NFS and an iSCSI DS. Hosts have two NICs; NFS was reached via the second NIC, and only the first one was declared as management.

The hosts are indeed nested inside Workstation, but I don't think this matters. The storage is outside the Workstation desktop.

The isolation was triggered by disconnecting the NIC in Workstation. Just the first one, so the NFS DS was still there and the master knew that it was indeed a host-isolated and not a host-down state.

depping
Leadership

I think you are misinterpreting it. This article may clarify it further:

http://www.yellow-bricks.com/2012/12/31/isolation-detection-in-vsphere-5-1-versus-5-0/

TronAr
Commander

I've read that one too.

And it also conveys that the poweroff file is a message from the isolated host for the master to act upon. In my case there is no such file (because the policy is to keep the VM running), yet the master did act.

I'm surely confused and misinterpreting something, yet I don't see what.

Both your book and the cited article show a link between what the slave does and what the master decides, implemented by messaging over the storage heartbeat construct.

Are you saying this is not so?

depping
Leadership

You lost me. Let me try to summarize it. You claim the following:

ESXi hosts with 2 NICs

management traffic on NIC 1 and iSCSI and NFS on NIC 2

Host Isolation to: keep powered on

when you isolate a host by disabling NIC 1 you say that VMs get restarted.

However, VMs can only be restarted when the lock is lost or the VM is powered off due to the isolation. So I can only conclude your storage traffic goes across NIC 1 as well then?

TronAr
Commander

Nope, iSCSI was on NIC one (only), and the VM was there. Only one VM in this experiment.

Yes, the lock was eventually lost. But that did not make the isolated host change its mind, and it kept reporting the VM as running.

BTW, I feel you are evading my questions, which may be a smart move while you figure out what I did :-)

I appreciate your time.

depping
Leadership

You lost me, why would it lose the lock on the VMDK while the connection is still there?

TronAr
Commander

The VM was on the iSCSI DS.

Taking the first NIC down took down both management and the iSCSI DS.

NFS was still live, and there were NFS-based DS heartbeats working there.
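The setup described in the last few posts can be written down as a small model to see why a single NIC failure produced this particular mix of symptoms. This is only an illustrative Python sketch: the NIC-to-traffic mapping comes from the thread, while the names (`NIC_SERVICES`, `HostView`, `after_nic_failure`, `consequences`) and the simplification that a service is reachable if and only if its NIC is up are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Set

# Traffic carried by each NIC, as described in the thread.
NIC_SERVICES: Dict[str, Set[str]] = {
    "nic1": {"management", "iscsi"},  # management network + iSCSI DS (holds the VM)
    "nic2": {"nfs"},                  # NFS DS (used for datastore heartbeats)
}


@dataclass
class HostView:
    isolated: bool         # management network unreachable
    vm_datastore_ok: bool  # iSCSI datastore holding the VM still reachable
    heartbeat_ds_ok: bool  # NFS heartbeat datastore still reachable


def after_nic_failure(failed_nics: Set[str]) -> HostView:
    """Derive the host's situation from the set of failed NICs (simplified)."""
    alive: Set[str] = set()
    for nic, services in NIC_SERVICES.items():
        if nic not in failed_nics:
            alive |= services
    return HostView(
        isolated="management" not in alive,
        vm_datastore_ok="iscsi" in alive,
        heartbeat_ds_ok="nfs" in alive,
    )


def consequences(view: HostView) -> Dict[str, bool]:
    """Summarize the two effects discussed in the thread."""
    return {
        # Heartbeats on the surviving datastore let the master tell
        # "isolated" apart from "dead".
        "master_sees_isolated_not_dead": view.isolated and view.heartbeat_ds_ok,
        # Without access to the VM's datastore the host cannot keep its lock.
        "vm_lock_can_be_lost": not view.vm_datastore_ok,
    }


if __name__ == "__main__":
    view = after_nic_failure({"nic1"})
    print(view)
    print(consequences(view))
```

With only the first NIC down, both effects hold at once: the master can still see the host as isolated rather than dead, and the lock on the VM can be lost.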

depping
Leadership

Let me loop back with engineering, but this result is expected as far as I am aware.

kfarkas
VMware Employee

Carlos, this is Keith from the HA engineering team. Duncan reached out to me. I understand that in your tests, a VM configured to be left powered on during an isolation event is restarted on a non isolated host. This restart should not occur if there is no impact on the accessibility of the heartbeat datastores when the host in question is isolated. However, there is a race that can result in VMs being restarted. This race only occurs if the host that was isolated was the FDM master prior to the isolation. The race would not occur if a FDM slave was isolated. Have you observed a VM restart when you isolated a slave? If you have seen it when a slave was isolated, please file a SR with logs so we can investigate.


The master race condition

When an FDM master is isolated, of course, a new master must be elected. Once elected, this new master will read from the datastores the VMs that are protected. It then places the VMs in a wait state for approx 10s. If in the approx 10s, a slave reports to the master that the VM is powered on there, the master removes this VM from the wait list. At the end of 10s, the master will attempt to restart any of the VMs left on the list. In the case of an isolated or partitioned host, however, in some situations, it can take the master more than 10s to learn which VMs are running on an isolated or partitioned host. When this delay occurs, the master will attempt to restart VMs that are still running on an isolated/partitioned host. This attempt will succeed if the isolated host lost access to the VMs' datastores. The master does not consider the isolation configuration for a VM when deciding whether to restart a VM on an isolated host because a VM on an isolated host can fail and we wanted HA to restart any VMs that so fail.
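The wait-list behaviour described above can be sketched as a toy model. This is not FDM code: it is a minimal Python illustration under stated assumptions, with hypothetical names (`ProtectedVm`, `master_restart_pass`) and made-up report delays. It only encodes the rules given in the paragraph: a VM reported as powered on within the window is dropped from the list, anything left gets a restart attempt, and the attempt succeeds only if the host running the VM lost access to its datastore.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

WAIT_WINDOW_S = 10.0  # "approx 10s" in the description above


@dataclass
class ProtectedVm:
    name: str
    # Seconds until the new master learns the VM is powered on somewhere;
    # None means no such report ever arrives. Values are hypothetical.
    report_delay_s: Optional[float]
    # True if the host running the VM lost access to the VM's datastore,
    # so its lock no longer blocks a power-on elsewhere.
    lock_lost: bool


def master_restart_pass(
    vms: List[ProtectedVm], wait_window_s: float = WAIT_WINDOW_S
) -> Dict[str, str]:
    """Return what a newly elected master does with each protected VM."""
    outcome: Dict[str, str] = {}
    for vm in vms:
        reported_in_time = (
            vm.report_delay_s is not None and vm.report_delay_s <= wait_window_s
        )
        if reported_in_time:
            outcome[vm.name] = "left alone"  # removed from the wait list
        elif vm.lock_lost:
            outcome[vm.name] = "restarted"   # nothing blocks the restart
        else:
            outcome[vm.name] = "restart attempted, blocked by lock"
    return outcome


if __name__ == "__main__":
    # The report about the VM on the isolated host arrives late (assumed 20s).
    vms = [ProtectedVm("vm1-w2k3", report_delay_s=20.0, lock_lost=True)]
    print(master_restart_pass(vms))                      # default window
    print(master_restart_pass(vms, wait_window_s=35.0))  # longer window
```

In this model the isolation response never appears as an input, which mirrors the statement that the master does not consider it; the outcome depends only on how late the report is and on whether the lock still holds.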

Is this unexpected behavior an operational issue for you? If so, I can try to get the fix included in an update release.

You can work around this issue by delaying the monitoring period. But note that extending the monitoring period will also delay an FDM master's response to a host failure. To work around this issue, set the advanced option das.config.fdm.policy.unknownStateMonitorPeriod to 35s, i.e., increase it by 25s. See VMware KB: Advanced configuration options for VMware High Availability in vSphere 5.x for more information on the advanced options.

-Keith

TronAr
Commander

Hi Keith,

Yes, the race must have been the reason, because when I was trying to understand what happened, I discovered that the isolated host was the master at the time of isolation.

Is there a document describing the election process in more detail?

Because I had a hard time dissecting the states (Candidate, SlaveConnecting, etc).

Also, I fail to see where the race is. Is it between processes in the master? Any chance the state machine is documented anywhere?

TIA,

-Carlos

depping
Leadership

The book describes the election process at a high level, and this is probably the most you can get. The rest of those details are engineering details, typically not shared with the outside world.

kfarkas
VMware Employee

Carlos, Duncan answered your question about documentation. You also asked where was the race. Yes, it is between workflows in the master.

TronAr
Commander

Thanks Keith. Too bad the details are not open... it would be easier to troubleshoot things, and to think about the pros and cons of changing the isolation policy.

In any case, I'm now at ease with isolation behaviour :-)

-Carlos
