I had a power failure at home the other day, so my test ESX environment (all one server of it) went down suddenly when the UPS battery went flat (no agent; another story, another time). Anyway...
When it came back up, one of the servers failed to load because of a missing VMDK file (the config file, not the flat file). Long story short: I moved the flat file, renamed the original folder, removed the machine from VirtualCenter, re-created the machine, and replaced just the VMDK flat file, and everything is back on track.
EXCEPT: I now have the renamed folder "MarshalVM.000" with, apparently, no files in it. The trouble is, I cannot remove the folder because it isn't empty. When I run "ls" it reports a file called "MarshalVM.vmdk" (yes, the 'lost' file) which does not exist! I can't rename or delete the file because the service console says it doesn't exist, so I cannot delete the parent folder either. How do I resolve this issue? Under Windows I'd be looking to run CHKDSK or something like that. I see there is a Linux command called "fsck", but it won't run directly from the VMFS volume, and when run from the root of the Service Console it wants to check all volumes and complains that they are mounted. I don't really know enough about it to proceed comfortably. I'm also not certain whether this command will hose the VMFS file system or not. Further googling suggests that it may not be the right answer. Any ideas?
Have you tried rebooting the host or bringing it up in Service Console only mode so the VMkernel doesn't start?
It almost sounds like that file is still being used by a VM.
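One rough way to check that suspicion is to see whether any registered VM's config file still mentions the disk. The sketch below is only a hint, not proof of a lock: it assumes the ESX 3.x service console, where `vmware-cmd -l` prints the .vmx path of each registered VM, and `vmx_referencing` is a hypothetical helper name made up for this example.

```shell
# Sketch: report which VM config files mention a given disk name.
# vmx_referencing is a hypothetical helper; it reads .vmx paths on stdin.
vmx_referencing() {
    disk="$1"
    while IFS= read -r vmx; do
        # -F matches the name literally; -q means only the exit status matters.
        # Unreadable or missing config files are silently skipped.
        if grep -qF -- "$disk" "$vmx" 2>/dev/null; then
            echo "$vmx"
        fi
    done
}

# On the service console the list of registered VMs would come from:
#   vmware-cmd -l | vmx_referencing MarshalVM.vmdk
```

If nothing is printed, no registered VM refers to the disk by that name, which makes a stale directory entry more likely than a file that is genuinely in use.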
Reboot: yes. Start in Service Console mode: no. I don't believe the file is in use. I believe corruption of the VMFS file system causes ESX to believe the file exists when it does not. That is why a directory listing reports the name but then says the file cannot be found, if you see what I mean (see below).
# ls
ls: MarshalVM.vmdk: No such file or directory
Attempts to create a file with the same name fail with the error:-
# echo "test file" > MarshalVM.vmdk
-bash: MarshalVM.vmdk: File exists
Attempts to delete the file fail with the error:-
# rm MarshalVM.vmdk
rm: cannot lstat `MarshalVM.vmdk': No such file or directory
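The three symptoms above all come down to a name that the directory listing returns but that cannot be stat'ed. A minimal sketch for spotting such entries in a folder follows; `find_ghost_entries` is a hypothetical helper name, not a VMware tool, and it assumes the shell's glob expansion reads names straight from the directory listing without stat'ing them, as `ls` appears to do here.

```shell
# Sketch: list directory entries that appear in the listing but cannot be
# stat'ed (the "ghost file" symptom). Hypothetical helper, not a VMware tool.
find_ghost_entries() {
    dir="$1"
    # The globs expand from the directory listing itself, so a ghost
    # entry should still show up in this loop.
    for entry in "$dir"/* "$dir"/.[!.]*; do
        # An unmatched glob is left as the literal pattern; skip it.
        case "$entry" in
            "$dir/*" | "$dir/.[!.]*") continue ;;
        esac
        # -e follows symlinks and -L tests the link itself; if both fail,
        # the name is listed but nothing can be found behind it.
        if [ ! -e "$entry" ] && [ ! -L "$entry" ]; then
            echo "ghost entry: $entry"
        fi
    done
}
```

For the folder in this thread the call would be `find_ghost_entries /vmfs/volumes/Disk1/MarshalVM.000`. It only reports the problem; it does not attempt any repair.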
So you have a ghost file that is in a directory on your VMFS filesystem, correct? And you cannot remove that directory because the ghost file is in it, yes?
Have you tried rm -rf directoryname, where directoryname is the name of the directory where the ghost file is located? This command forces removal of a directory and all its contents.
What are the results of
ls -la directoryname
Have you tried
chown -R root directoryname
chgrp -R root directoryname
then
rm -rf directoryname
Just as a lark, you could try
vmkfstools -U /vmfs/volumes/<yourdatastorename>/MarshalVM.000/MarshalVM.vmdk
Hi Lightbulb. Thanks for your continued suggestions, but no further progress I'm afraid. Results were as follows in each case.
ls -la MarshalVM.000
ls: MarshalVM.000/MarshalVM.vmdk: No such file or directory
total 1088
drwxr-xr-x 1 root root 420 Feb 16 21:20 .
drwxr-xr-t 1 root root 1400 Feb 16 21:28 ..
-
chown -R root MarshalVM.000
chown: failed to get attributes of `MarshalVM.000/MarshalVM.vmdk': No such file or directory
-
chgrp -R root MarshalVM.000
chgrp: failed to get attributes of `MarshalVM.000/MarshalVM.vmdk': No such file or directory
-
rm -rf MarshalVM.000
rm: cannot remove directory `MarshalVM.000': Directory not empty
-
vmkfstools -U /vmfs/volumes/Disk1/MarshalVM.000/MarshalVM.vmdk
Failed to delete virtual disk: The system cannot find the file specified (25).
Well, if this were a production system I would say let's get the VMs somewhere else and recreate the VMFS volume.
A previous poster suggested going into SC-only mode, which you could try, but I think you may be right and there is something a little whacked on your datastore. Since I assume you have a single store, you could use scp or VMware Converter to get your VMs off, reinstall, and then import the VMs back in.
Or you could play with it some more because that is a great way to learn.
Good luck and let me know how it comes out.
I have more than one store, so I could of course move the VMs, wipe and re-format. But that is the easy option and I prefer to try to understand and fix. After all, this may be a perfectly acceptable approach at home, but what if this were a customer system and they had no space to move VMs, or no vMotion and no outage windows, etc.? I'll sit on it for a while, see if I get any other ideas or responses, then decide what to do. Thanks for your help so far.
Did you ever figure this out? I have a similar issue from when a migration failed. I believe it was an iSCSI issue.
~ Joe