VMware Cloud Community
swspjcd
Enthusiast

Can't register/add a VM to inventory because of a locked file

I have several ESXi 5.1 hosts in a cluster. There is one VM that I can't add to inventory. I have followed several KB articles and even removed the ESXi host from the cluster completely, after verifying from the MAC address that it was the host holding the lock. I am unable to view the vmware.log file for this virtual machine: I get an "invalid argument" error when trying to cat vmware.log or the .vmx file. The lock file is "vmname.vmx.lck". I've rebooted and restarted the management agents several times. I'm not sure how to proceed from here, as I've been reading about how to resolve this for about 3 hours and have yet to find anything that works. The directory contains the following files, if this helps at all:

-rw-r--r-- 1 root root      162078 Sep 21 16:06 vmware.log
-rw-r--r-- 1 root root          73 Sep  9 18:33 vmname-63cf5ada.hlog
-rw------- 1 root root 21474836480 Sep 21 16:51 vmname-flat.vmdk
-rw------- 1 root root        8684 Sep  9 18:34 vmname.nvram
-rw------- 1 root root         517 Sep  9 18:33 vmname.vmdk
-rw-r--r-- 1 root root           0 Sep  7 17:50 vmname.vmsd
-rwxr-xr-x 1 root root        3342 Sep 13 18:18 vmname.vmx
-rw------- 1 root root           0 Sep  9 18:33 vmname.vmx.lck
-rw-r--r-- 1 root root         262 Sep  7 17:50 vmname.vmxf
-rwxr-xr-x 1 root root        3341 Sep 13 18:18 vmname.vmx~
-rw------- 1 root root 37580963840 Sep 21 16:51 vmname_1-flat.vmdk
-rw------- 1 root root         519 Sep  9 18:33 vmname_1.vmdk

Suggestions?

22 Replies
Troy_Clavell
Immortal

Not sure if you've seen the KB below, but just in case:

http://kb.vmware.com/kb/10051

swspjcd
Enthusiast

That was one of the first things I tried. The ESXi host is in maintenance mode so there are no virtual machines running on it and therefore nothing to kill. I have no clue how to resolve this as all of the suggestions I have found don't seem to work.

admin
Immortal

Hi,

The vmname.vmx.lck file looks like an NFS lock. Are you using NFS storage? If so:

1. Browse to the folder where the VM is located and run ls -la. It should list the hidden files.

2. Find the files with the .lck extension and delete them, and rename vmname.vmx.lck using the command mv vmname.vmx.lck vmname.vmx.backup.

3. Add the VM back to inventory and power on the VM. It should work.
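
Before deleting anything, it may help to do a read-only check of which lock files are actually in the folder. Below is a minimal Python sketch of steps 1 and 2, not an official tool: the function name find_lock_files and the example path are made up for illustration, and it only lists candidates (hidden dot-files included) without removing or renaming them.

```python
from pathlib import Path


def find_lock_files(vm_dir):
    """Return sorted names of entries in vm_dir whose name contains '.lck'.

    Path.iterdir() also returns hidden dot-files, much like `ls -la`.
    """
    return sorted(p.name for p in Path(vm_dir).iterdir() if ".lck" in p.name)


if __name__ == "__main__":
    # Hypothetical path; substitute the real VM folder on the datastore.
    vm_dir = Path("/vmfs/volumes/datastore1/vmname")
    if vm_dir.is_dir():
        for name in find_lock_files(vm_dir):
            print(name)
```

If the list contains only vmname.vmx.lck, that is the single file to deal with in step 2.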

Thanks,
Avinash

swspjcd
Enthusiast

We are not using NFS at all. There is only one .lck file, and I can't delete it, although I can rename it. Even after renaming it, I still can't register the VM. The option is still grayed out.

admin
Immortal

Can you run the commands vmkfstools -D vmname.vmx and ls -la, and paste the output?

Thanks,

Avinash

swspjcd
Enthusiast

Here is the output.

Lock [type 10c00001 offset 4405248 v 132, hb offset 3424256
gen 25, mode 1, owner 51ffc163-c2f3c8e4-8cff-001d092b0694 mtime 126477
num 0 gblnum 0 gblgen 0 gblbrk 0]
Addr <4, 0, 39>, gen 71, links 1, type reg, flags 0, uid 0, gid 0, mode 100755
len 3342, nb 1 tbz 0, cow 0, newSinceEpoch 1, zla 2, bs 8192

admin
Immortal

Hi,

It seems there is still a lock on the file, held by the host with MAC address 001d092b0694.
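
The MAC address is the last dash-separated group of the owner field in the vmkfstools -D output. As a rough illustration, the Python sketch below pulls the mode and owner out of that output and formats the MAC with colons so it can be compared against the hosts' NICs. The function name parse_lock_info is made up for this example, and the regular expression assumes the "mode N, owner ..." layout seen in the output posted in this thread. If I read the KB correctly, mode 1 indicates an exclusive lock.

```python
import re


def parse_lock_info(output):
    """Extract the lock mode and owner from `vmkfstools -D` output text."""
    # Matches e.g. "mode 1, owner 51ffc163-c2f3c8e4-8cff-001d092b0694"
    match = re.search(r"mode (\d+), owner ([0-9a-fA-F-]+)", output)
    if match is None:
        return None
    mode = int(match.group(1))
    owner = match.group(2)
    # The last dash-separated group of the owner is the MAC address.
    raw_mac = owner.split("-")[-1].lower()
    mac = ":".join(raw_mac[i:i + 2] for i in range(0, len(raw_mac), 2))
    return {"mode": mode, "owner": owner, "mac": mac}


# Lock section copied from the output posted earlier in this thread.
SAMPLE = """Lock [type 10c00001 offset 4405248 v 132, hb offset 3424256
gen 25, mode 1, owner 51ffc163-c2f3c8e4-8cff-001d092b0694 mtime 126477
num 0 gblnum 0 gblgen 0 gblbrk 0]"""

if __name__ == "__main__":
    print(parse_lock_info(SAMPLE))
```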

Try registering the VM on the host that owns that MAC address and powering it on. If that still fails, cd into the VM folder, run the command rm -rf *.lck, and then power on the VM.

If there is still a problem, power off the host and then run the command rm -rf *.lck


Thanks,

Avinash

swspjcd
Enthusiast

I've tried all of those. I thoroughly followed the KB article describing how to do all of this. The option to register the VM on the ESXi host that has that MAC address is grayed out. I have tried deleting the .lck file, but I get "invalid argument" when I do. Even when powering off the ESXi host that has that MAC address and trying to delete the file from another ESXi host, I still get "invalid argument".

admin
Immortal

What do vmkernel.log and hostd.log show? Do you see any corruption messages?

swspjcd
Enthusiast

I don't see anything in either log that looks suspicious. We replicate our LUNs off site every night for DR, and even after presenting the clone of the LUN where this server lives to an ESXi host at a remote site, I still can't delete the .lck file or even add the virtual server to inventory.

hodo
Contributor

Was this ever resolved?  I'm running into this exact issue as well.

tomtom901
Commander

Have you tried the steps outlined above?

hodo
Contributor

Just like swspjcd I followed everything in VMware KB: Investigating virtual machine file locks on ESXi/ESX and still cannot remove the lock file.  It is a VMFS datastore, not NFS.  I've restarted the host that has the MAC address shown when running vmkfstools -D lockfile.  I've also restarted every other host in the cluster for good measure. 

swspjcd
Enthusiast

We were never able to get it resolved, and from what it looks like, the problem is due to a bug in EqualLogic firmware which can, in "rare" circumstances, cause corruption in the metadata of a LUN. So far we have had 3 virtual servers with the exact same problem, all of which were on the same LUN. One we recovered from our backup software; the other two we were able to recover from the replicated copy of the LUN where they lived. The bad firmware, according to Dell, is 6.0.6, and it is fixed by 6.0.6-H2.

hodo
Contributor

Interesting.  Thanks for replying.

We are running EqualLogic as well but are still on firmware 6.0.5. Hopefully I can salvage the VMs from our replicated volumes or snapshots.

tomtom901
Commander

Severe. Has Dell acknowledged this?

hodo
Contributor

From the latest EqualLogic firmware update, 6.0.6-H2:

Issue Corrected in Version 6.0.6-H2

[CRITICAL]: In rare circumstances, an error handling routine was not properly executed. Currently, this has only been observed in VMware environments, where a VMFS datastore might experience heartbeat metadata corruption, impacting the ability to perform operations on virtual machines (VMs). However, there is a small risk that a similar issue could be observed in non-VMware environments as well, so Dell recommends that all customers upgrade to this release.

swspjcd
Enthusiast

If you read the release notes for 6.0.6, they say it can cause "metadata heartbeat corruption" under rare circumstances, or something very similar. We were running 6.0.6 for roughly 2 weeks and so far have had 3 virtual servers, all on the same LUN, become corrupt. They all had the exact same symptoms as in my original post in this thread. It's possible that the LUN was corrupted by something else, but we have never had any problems like this before, and not all of the servers on that LUN were corrupted. We have since moved all of the servers to a different LUN and deleted the corrupted/faulty LUN.

hodo
Contributor

We had 3 guests out of 10 that were affected.

I was just able to mount a snapshot of this datastore from earlier today before the problem started and got those guests back. My next step is to clear this datastore out and re-create this LUN. And obviously try to get the firmware upgraded.

Thanks again for replying swspjcd.   It has been a frustrating morning trying to troubleshoot this problem.
