VMware Cloud Community
rickardnobel
Champion

How often are the file locks updated (VMFS/NFS)?

As I understand it, the ESXi host running a VM whose files are located on shared storage has to update the file lock regularly to show that it is alive and still owns the files. If the host fails, the file lock updates stop, which means another host can start the VM, using HA for example.

So my question is: how often are these file locks updated? And is there a difference between VMFS and NFS in this matter? (I do not mean NFS using *.lck files to store the locking state, but the time interval.)

My VMware blog: www.rickardnobel.se

idle-jam
Immortal

It would create a lock file, and that file would only disappear once you power off the VM: http://www.vmware.com/support/ws55/doc/ws_disks_lockfiles.html

rickardnobel
Champion

idle-jam wrote:

It would create a lock file, and that file would only disappear once you power off the VM: http://www.vmware.com/support/ws55/doc/ws_disks_lockfiles.html

Thank you for your reply. However, the link seems to be about an older version of VMware Workstation and not vSphere?

What I am curious about is how ESX/ESXi handles this on shared storage, or more specifically: how often is the lock updated by the living host, and if the host fails, after how long can another host assume that a vmdk file is now "free", in the case of HA for example?

ItsMeHere
Enthusiast

I'm not too deep into this, but I would rather look at the lock for a file as a VMFS file system feature. Once an ESX(i) host starts a VM, it locks the files associated with this VM by setting markers for these files (aka locks) in the VMFS management structures. These can even remain active if the host actually crashes, making it impossible for other hosts in the HA cluster to start this VM.

Please have a look at http://kb.vmware.com/kb/10051, which explains how to manually remove leftover locks.

I hope this helps.

Oh, and yes, there is a difference between VMFS and NFS (you already mentioned the .lck files), and it is also covered in the KB article mentioned above.

rickardnobel
Champion

ItsMeHere wrote:

I'm not too deep into this, but I would rather look at the lock for a file as a VMFS file system feature. Once an ESX(i) host starts a VM, it locks the files associated with this VM by setting markers for these files (aka locks) in the VMFS management structures. These can even remain active if the host actually crashes, making it impossible for other hosts in the HA cluster to start this VM.

Hello, and thank you for your reply. It was an interesting KB and it shows ways to deal with incorrectly locked files, but I found no information about the normal procedure, that is, when everything is working as expected.

That is, assume a certain *-flat.vmdk file is locked by ESXi-host-1, which is running VM-a. As long as ESXi-host-1 is up and running, it must protect the file from being taken by other hosts by somehow regularly updating the VMFS metadata. Now suppose the host suddenly loses power and all its VMs are dead. After 15 seconds HA will try to distribute the dead VMs over the surviving hosts, but will the vmdk files already be considered available by then? That is, will 15 seconds without lock updates make them available to other hosts?

depping
Leadership

Off the top of my head, it is 15 seconds before the lock is declared dead and released. Indeed, this could be a race condition, as HA will restart the VMs on the 16th second. Don't get me wrong, it is designed in such a way that it will all work, but I still generally recommend increasing das.failuredetectiontime.

Also, I think the lock region is updated roughly every 3 seconds.
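
As a rough illustration of the idea, here is a minimal Python sketch of a lock that counts as stale once its owner stops refreshing it. The 3-second and 15-second values are the from-memory figures given in this post, and the `FileLock` class and host name are hypothetical; this is not how VMFS or NFS locking is actually implemented.

```python
from dataclasses import dataclass

# Figures quoted from memory in this thread; treat them as assumptions.
HEARTBEAT_INTERVAL_S = 3.0  # owner refreshes the lock region roughly this often
LOCK_TIMEOUT_S = 15.0       # lock is treated as dead after this long without a refresh


@dataclass
class FileLock:
    """Hypothetical model of a lock that its owner must keep refreshing."""
    owner: str
    last_refresh_s: float  # time of the owner's most recent refresh

    def refresh(self, now_s: float) -> None:
        """The owner is alive and re-asserts ownership."""
        self.last_refresh_s = now_s

    def is_stale(self, now_s: float) -> bool:
        """True once no refresh has been seen for LOCK_TIMEOUT_S seconds."""
        return now_s - self.last_refresh_s >= LOCK_TIMEOUT_S


lock = FileLock(owner="esxi-host-1", last_refresh_s=0.0)
for i in range(1, 4):  # host is alive and refreshes at t = 3, 6, 9
    lock.refresh(i * HEARTBEAT_INTERVAL_S)

# The host fails right after the refresh at t = 9; no more updates arrive.
print(lock.is_stale(20.0))  # False: only 11 s without a refresh
print(lock.is_stale(24.0))  # True: 15 s without a refresh
```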

Duncan (VCDX)

Available now on Amazon: vSphere 4.1 HA and DRS technical deepdive

rickardnobel
Champion

Duncan wrote:

Off the top of my head, it is 15 seconds before the lock is declared dead and released. Indeed, this could be a race condition, as HA will restart the VMs on the 16th second. Don't get me wrong, it is designed in such a way that it will all work, but I still generally recommend increasing das.failuredetectiontime.

Also, I think the lock region is updated roughly every 3 seconds.

Thanks a lot, Duncan, for the information. So the default 15-second HA timeout is set to let the file locks expire, and all vmdk files should normally be available for takeover as soon as HA restarts the VMs?

depping
Leadership

Yes they are. Restarts are initiated on roughly the 16th second. Note that this is when they are initiated; handling the restart takes a split second, so that should leave room for the restart to succeed. These scenarios have all been extensively tested by our engineers and QA. It will of course all depend on what your "isolation response" is set to. When it is set to "shutdown", it will more than likely take longer before the VMs are down and can be restarted.
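
The timing margin can be worked out with a little arithmetic. The Python sketch below is a simplified model that uses only the approximate figures from this thread (refresh about every 3 seconds, lock dead after 15 seconds, restart initiated around the 16th second); the function names are hypothetical, and the real lock-breaking logic is not described here.

```python
# All values are the approximate figures mentioned in this thread, not official ones.
HEARTBEAT_INTERVAL_S = 3.0  # lock refresh period
LOCK_TIMEOUT_S = 15.0       # time without a refresh before the lock is considered dead
HA_RESTART_AT_S = 16.0      # HA initiates restarts roughly this long after the failure


def lock_release_window(failure_s: float) -> tuple[float, float]:
    """Earliest and latest time the lock can expire for a host failing at failure_s.

    Assumption: the last refresh happened at some point within the
    HEARTBEAT_INTERVAL_S seconds before the failure.
    """
    earliest = failure_s - HEARTBEAT_INTERVAL_S + LOCK_TIMEOUT_S
    latest = failure_s + LOCK_TIMEOUT_S
    return earliest, latest


def restart_margin(failure_s: float = 0.0) -> float:
    """Worst-case seconds between lock expiry and the first HA restart attempt."""
    _, latest = lock_release_window(failure_s)
    return failure_s + HA_RESTART_AT_S - latest


print(lock_release_window(0.0))  # (12.0, 15.0)
print(restart_margin())          # 1.0
```

Under these assumptions the worst-case margin is about one second, which matches the point above that a "shutdown" isolation response, where the VMs take longer to go down, can eat into it.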

Duncan (VCDX)

MKguy
Virtuoso

Even if the first restart at the 16th second fails due to a still active lock, the surviving hosts should keep trying to restart the VMs a couple of times according to das.maxvmrestartcount before they give up on it.

See Duncan's article on this:

http://www.yellow-bricks.com/2010/06/30/how-does-das-maxvmrestartcount-work/

And a quote from Duncan's HA deepdive site:

The amount of retries is configurable as of vCenter 2.5 U4 with the advanced option “das.maxvmrestartcount”. The default value is 5.
-- http://alpacapowered.wordpress.com
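
The retry behaviour described in this post can be sketched as a bounded loop. This is only an illustration in Python: `restart_with_retries` and the power-on callable are hypothetical, the default of 5 comes from the quote above, and whether the initial attempt counts toward that limit is an assumption here (the sketch treats it as one initial attempt plus up to 5 retries). Delays between attempts are left out.

```python
from typing import Callable

MAX_VM_RESTART_COUNT = 5  # default of das.maxvmrestartcount, per the quote above


def restart_with_retries(try_power_on: Callable[[], bool],
                         max_retries: int = MAX_VM_RESTART_COUNT) -> int:
    """Return the number of the attempt that succeeded, or 0 if all failed.

    Assumption: one initial attempt plus up to max_retries retries.
    """
    for attempt in range(1, max_retries + 2):
        if try_power_on():
            return attempt
    return 0


# Hypothetical power-on: fails once because the lock is still held, then succeeds.
outcomes = iter([False, True])
result = restart_with_retries(lambda: next(outcomes))
print(result)  # 2

# A lock that never clears exhausts all attempts.
print(restart_with_retries(lambda: False))  # 0
```
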
rickardnobel
Champion

Thanks a lot for all answers.

depping
Leadership

Posted an article based on your question as well this week:

http://www.yellow-bricks.com/2011/04/04/das-failuredetection-time-and-the-isolation-response/

Duncan

Yellow-bricks.com | HA/DRS technical deepdive - the ebook!

rickardnobel
Champion

Duncan wrote:

Posted an article based on your question as well this week:

http://www.yellow-bricks.com/2011/04/04/das-failuredetection-time-and-the-isolation-response/

Thanks, nice. (I also got your book a few days ago, very interesting.) :)

depping
Leadership

Thank you very much for your support. I hope you enjoy it!

Duncan
