VMware Cloud Community
imclaren
Contributor

VM files locked

Hi,

I have a 4-blade ESX cluster, comprising 3x ESX 4.0 hosts and 1x ESXi 4.0 host, all connected to a fibre-channel SAN.

One of the LUNs on the SAN fell over, which had a few knock-on effects, including on one of the VMs (which is on a different LUN, made up of completely separate disks from the failed one). Its files seem to be locked, and I can't do anything with them on any of the hosts.

I've been through some of the KB articles about this, and I *think* the lock is on the ESXi blade, which is also where it was last running.

All the blades have been restarted (but not the SAN).

Any suggestions greatly appreciated!

Thanks,

Iain

15 Replies
vmhyperv
Contributor

Did you restart the management service? If that doesn't work, you need to find out which ESX host has locked the file by going to the VM's directory and running:

vmkfstools -D <VMDK file name>
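
For what it's worth, here is a rough sketch of how to read the result. On ESX 4.x the lock details from vmkfstools -D are written to the vmkernel log (/var/log/vmkernel on ESX; on ESXi I believe it is /var/log/messages), and as far as I know the last 12 hex digits of the "owner" field are the MAC address of the host holding the lock, while an all-zero owner means no host holds it. The sample log line and the function name owner_mac are made up for illustration, so check them against your own log:

```python
import re

# Matches the "owner" field of a VMFS lock record, for example:
#   owner 45feb537-9c52009b-e812-00137266e200
# The last 12 hex digits are captured separately (assumed to be the host MAC).
OWNER_RE = re.compile(
    r"owner\s+[0-9a-fA-F]{8}-[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-([0-9a-fA-F]{12})"
)


def owner_mac(log_text):
    """Return the MAC from the last owner field in log_text, or None.

    None is returned when there is no owner field at all, or when the
    owner is all zeros (no host currently holds the lock).
    """
    matches = OWNER_RE.findall(log_text)
    if not matches:
        return None
    mac_hex = matches[-1].lower()
    if set(mac_hex) == {"0"}:
        return None
    # Split the 12 hex digits into six colon-separated octets.
    return ":".join(mac_hex[i:i + 2] for i in range(0, 12, 2))


# Illustrative log line only, not taken from the affected hosts.
SAMPLE = ("Lock [type 10c00001 offset 13058048 v 20, hb offset 3499520 "
          "gen 532, mode 1, owner 45feb537-9c52009b-e812-00137266e200 "
          "mtime 1174669462]")

if __name__ == "__main__":
    print(owner_mac(SAMPLE))
```

Compare the MAC it prints with the NIC MAC addresses of each blade to see which host owns the lock.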

thanks

vmguy

aravinds3107
Virtuoso

Check the KB articles below, which should help:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=10051

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100385...

imclaren
Contributor

Hi,

Yes, I've been through those. The vmkfstools command seems to suggest it's locked by the ESXi server. I've restarted that, and still no joy...

Cheers,

Iain

avarcher
Commander

Hi mate, how is this lock manifesting itself? Cannot power on the VM, VM unavailable or disconnected, power-on gets to 95%...? The only thing I can add is that if the VM gets to 95%, run 'vmware-cmd <path2vmx> getstate', and if the state is 'pendingquestion' (usual for 95%), then use 'vmware-cmd answer "<answer2question>"', where the answer is usually a number. See http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=102683... for more on ESXi. I think it may think your VM has been relocated. Let me know how it goes.

Cheers, Andy.

harrygunter
Enthusiast

Hi,

Have you tried removing the VM from the inventory and re-adding it?

jimraina
Enthusiast

Hi imclaren ,

To check for Service Console-based locks on non-ESXi servers, run the command:

lsof | grep <name of locked file>

The output has these columns, followed by a line for each process holding the file:

COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
71fd60b6- 3631 root 4r REG 0,9 10737418240 23533

Note: If there is no Service Console process locking the file, you should receive no printed output. If you do receive any results, file a Support Request, not only to identify the process but also to determine the root cause.

imclaren
Contributor

The VMs (there are now two affected) are not in the inventory, and can't be added to the inventory (option is greyed out). Additionally, the files can't be moved/copied/downloaded etc through the datastore browser.

Looking on the consoles, no operations are permitted on the vmdk files, .vmx, .vswp, or .nvram files. Doing anything to them results in 'invalid argument'.

I tried the getstate command, but it just returned 'no vm found with this name'. Do I need a switch on the command, perhaps?

I'll try the other suggestion, but I think it's something I tried last night...

harrygunter
Enthusiast

Are the affected VMs on the same LUN? Are there any other VMs running on that LUN?

I had an issue once where a VMFS LUN's partition table was overwritten by a Linux guy :( The VMs that were running were OK, but any that were stopped could not be restarted, moved, etc.

Have you checked that the partition table is OK?

imclaren
Contributor

Yes, the affected ones are all on the same LUN. Sounds like a similar issue - they were running fine until they were powered off and restarted...

I'm currently migrating what I can to other datastores. How do I check the partition table?

Cheers,

Iain

harrygunter
Enthusiast

I noticed this when I looked at the datastore's properties in the VI Client: it should say File System: VMFS x.xx, but it was saying LVM.

Like you are doing now, I migrated all the running VMs off the damaged store.

The Linux guys had accidentally attached the LUNs to a Linux box. They then used fdisk to change the partition type back from LVM to VMFS (I think it's fb).

Luckily this worked, at least for most of the LUNs. A few were damaged, and they had to work on them to retrieve the flat files.

This link may give you a pointer, you should be able to check to see if this is the problem.

http://www.virtualizationteam.com/virtualization-vmware/vmware-vi3-virtualization-vmware/vmware-esx-...
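
To make the partition check a bit more concrete, here is a rough sketch that scans the text printed by fdisk -l on the service console and reports the partition type Id for each device. VMFS partitions use type fb and Linux LVM uses 8e. The sample output and the function names are assumptions for illustration only, not output from any real host:

```python
VMFS_ID = "fb"  # MBR partition type Id for VMFS
LVM_ID = "8e"   # MBR partition type Id for Linux LVM


def partition_ids(fdisk_output):
    """Map each /dev/... partition to its type Id from `fdisk -l` text."""
    ids = {}
    for line in fdisk_output.splitlines():
        fields = line.split()
        # Partition rows start with the device name; skip headers and blanks.
        if not fields or not fields[0].startswith("/dev/"):
            continue
        # A "*" in the Boot column shifts the other columns right by one.
        if len(fields) > 1 and fields[1] == "*":
            del fields[1]
        # Expected columns: Device, Start, End, Blocks, Id, System...
        if len(fields) < 5:
            continue
        ids[fields[0]] = fields[4].lower()
    return ids


def non_vmfs(fdisk_output):
    """Return the partitions whose type Id is not the VMFS one."""
    found = partition_ids(fdisk_output)
    return sorted(dev for dev, pid in found.items() if pid != VMFS_ID)


# Illustrative fdisk -l output for a LUN whose type was changed to LVM.
SAMPLE = """\
Disk /dev/sdb: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       13054   104856223+  8e  Linux LVM
"""

if __name__ == "__main__":
    print(non_vmfs(SAMPLE))
```

Bear in mind this flags every non-VMFS partition, so local disks holding the service console will show up too; only the datastore LUNs are expected to be fb.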

imclaren
Contributor

Hi,

Checked the partition table. It's showing as VMFS 3.31 in the properties, and the partition table looks good in fdisk etc...

Cheers,

Iain

imclaren
Contributor

Hi,

Called VMware today, and they quickly got someone on the case. The guy found a corrupt heartbeat on the LUN/datastore, which I presume is some kind of quorum/clustering type thing.

He's taken the first 32 MB of the raw disk, and is going to try to edit whatever it contains and then (I guess) write it back to the disk. Not for the faint-hearted!

Cheers,

Iain

imclaren
Contributor

I'm not on site any more, but apparently VMware have worked their magic and all is well again! :)

harrygunter
Enthusiast

Good news. If possible, I would migrate the VMs off that LUN and onto a fresh one.

imclaren
Contributor

Yeah, that'd be a good idea. I don't think they have enough space to move one of them though, as it's a whopper!
