VMware Cloud Community
bob3
Contributor

Problem booting VM after datastore move

Hi...

I recently went through a process where we had to rebuild a local datastore and, in the process, upgraded two machines from earlier versions of ESX to ESXi 4.1. We attached the servers to a SAN and moved all VMs off to the SAN during the maintenance windows, both as machine backups and to allow us to run the machines from another server temporarily.

The process as a whole went very smoothly, but I have a few VMs that, after moving back to the local datastore, will not start. On one in particular I have been working on, I see these error messages in the log file:

Feb 19 21:28:19.046: vmx| DISKLIB-CHAINESX : ChainESXOpenSubChainNode: can't create deltadisk node 4c3b7a85-wcvmvi01-000001-delta.vmdk failed with error Input/output error (0xbad000a, I/O error)

Feb 19 21:28:19.046: vmx| DISKLIB-CHAIN : "/vmfs/volumes/510fedc8-5b3be36a-3be0-001a6424a942/wcvmvi01/wcvmvi01-000001.vmdk" : failed to open (Input/output error).
Feb 19 21:28:19.047: vmx| DISKLIB-VMFS  : "/vmfs/volumes/510fedc8-5b3be36a-3be0-001a6424a942/wcvmvi01/wcvmvi01-000001-delta.vmdk" : closed.
Feb 19 21:28:19.047: vmx| DISKLIB-VMFS  : "/vmfs/volumes/510fedc8-5b3be36a-3be0-001a6424a942/wcvmvi01/wcvmvi01-000002-delta.vmdk" : closed.
Feb 19 21:28:19.047: vmx| DISKLIB-VMFS  : "/vmfs/volumes/510fedc8-5b3be36a-3be0-001a6424a942/wcvmvi01/wcvmvi01-flat.vmdk" : closed.
Feb 19 21:28:19.047: vmx| DISKLIB-LIB   : Failed to open '/vmfs/volumes/510fedc8-5b3be36a-3be0-001a6424a942/wcvmvi01/wcvmvi01-000001.vmdk' with flags 0xa Input/output error (327689).
Feb 19 21:28:19.047: vmx| DISK: Cannot open disk "/vmfs/volumes/510fedc8-5b3be36a-3be0-001a6424a942/wcvmvi01/wcvmvi01-000001.vmdk": Input/output error (327689).

...so this apparently has to do with some corruption in a snapshot, which probably occurred during the file move.

I have been through the steps outlined in http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100423..., and from what I can tell, all the files exist, the CIDs look fine, etc. If I change the .vmx file to boot from the parent base disk, the machine does boot (but of course I am missing all my delta information).
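For anyone repeating the CID check, here is a minimal Python sketch of what it amounts to, assuming the descriptor files have been copied somewhere Python can read them. It relies only on the `CID`, `parentCID` and `parentFileNameHint` fields of a VMDK descriptor; the function names and the sample CID values are hypothetical and are not taken from the KB article.

```python
import re

BASE_PARENT_CID = "ffffffff"  # a base disk has no parent and carries this value


def parse_descriptor(text):
    """Pull CID, parentCID and parentFileNameHint out of descriptor text."""
    fields = {}
    for key in ("CID", "parentCID", "parentFileNameHint"):
        pattern = r'^\s*' + key + r'\s*=\s*"?([^"\r\n]*)"?\s*$'
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            fields[key] = match.group(1).strip()
    return fields


def check_chain(descriptors):
    """descriptors: list of (file name, descriptor text), base disk first.

    Returns a list of problems; an empty list means the links are consistent.
    """
    problems = []
    prev_name, prev = None, None
    for name, text in descriptors:
        cur = parse_descriptor(text)
        if "CID" not in cur or "parentCID" not in cur:
            problems.append(f"{name}: CID or parentCID missing")
        elif prev is None:
            if cur["parentCID"].lower() != BASE_PARENT_CID:
                problems.append(f"{name}: base disk has parentCID {cur['parentCID']}")
        else:
            if cur["parentCID"].lower() != prev.get("CID", "").lower():
                problems.append(
                    f"{name}: parentCID {cur['parentCID']} does not match "
                    f"CID {prev.get('CID')} of {prev_name}"
                )
            hint = cur.get("parentFileNameHint")
            if hint is not None and hint.split("/")[-1] != prev_name:
                problems.append(f"{name}: parentFileNameHint points at {hint}")
        prev_name, prev = name, cur
    return problems


# Hypothetical descriptor contents, for illustration only.
base = 'CID=aaaa1111\nparentCID=ffffffff\n'
snap1 = 'CID=bbbb2222\nparentCID=aaaa1111\nparentFileNameHint="wcvmvi01.vmdk"\n'
snap2 = 'CID=cccc3333\nparentCID=dddd4444\nparentFileNameHint="wcvmvi01-000001.vmdk"\n'

chain = [
    ("wcvmvi01.vmdk", base),
    ("wcvmvi01-000001.vmdk", snap1),
    ("wcvmvi01-000002.vmdk", snap2),
]
for problem in check_chain(chain):
    print(problem)
```

A clean result here only says the descriptor metadata links up. It says nothing about the contents of the -delta.vmdk extents, which is where an I/O error like the one in the log would originate.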

Does anyone have any experience recovering from this? I could just revert to the snapshot and re-create my environments, or trash the machines and rebuild (they are not critical production machines), but before doing that I would like to know if there is a way to somehow recover.

Thanks in advance for any advice.

10 Replies
a_nut_in
Expert

Try this and see if it helps:

vmkfstools -i /vmfs/volumes/510fedc8-5b3be36a-3be0-001a6424a942/wcvmvi01/wcvmvi01-000001.vmdk /vmfs/volumes/510fedc8-5b3be36a-3be0-001a6424a942/wcvmvi01/wcvmvi01_new.vmdk

If this works, the snapshots would be consolidated to one new base image you can attach to the VM and power on.

If this fails, there is probably corruption in the snapshot file.

You could try powering on the VM from the base disk and checking wcvmvi01.vmdk.

Do remember to mark my post as "helpful" or "correct" if I've helped resolve or answer your query!
a_nut_in
Expert

Try with this disk first

vmkfstools -i /vmfs/volumes/510fedc8-5b3be36a-3be0-001a6424a942/wcvmvi01/wcvmvi01-000002-delta.vmdk /vmfs/volumes/510fedc8-5b3be36a-3be0-001a6424a942/wcvmvi01/wcvmvi01_new.vmdk

If this does not work, try the earlier command. It will leave out snapshot 2 and try to commit from snapshot 1.

bob3
Contributor

Thank you, I will give this a shot...need to wait for access to the server.

I will give these a try and let you know the result.

Thanks!

bob3
Contributor

Thanks for the suggestion.

Here are the results of the commands, run in order:

~ # vmkfstools -i /vmfs/volumes/510fedc8-5b3be36a-3be0-001a6424a942/wcvmvi01/wcvmvi01-000002-delta.vmdk /vmfs/volumes/510fedc8-5b3be36a-3be0-001a6424a942/wcvmvi01/wcvmvi01_new.vmdk

DiskLib_Check() failed for source disk The file specified is not a virtual disk (15).

=====

~ # vmkfstools -i /vmfs/volumes/510fedc8-5b3be36a-3be0-001a6424a942/wcvmvi01/wcvmvi01-000001.vmdk /vmfs/volumes/510fedc8-5b3be36a-3be0-001a6424a942/wcvmvi01/wcvmvi01_new.vmdk

Destination disk format: VMFS zeroedthick
Failed to open '/vmfs/volumes/510fedc8-5b3be36a-3be0-001a6424a942/wcvmvi01/wcvmvi01-000001.vmdk': Input/output error (327689).

So it seems the 000002-delta disk is the problem child.

Anything else I can try?

Thanks!

a_nut_in
Expert

Hey Bob,

flags 0xa Input/output error (327689)

There is a typo in one of the commands I asked you to run, but that's OK, as both snapshots seem to have errors. As you mentioned, this might have happened during the initial copy.
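The typo is presumably that the second command pointed vmkfstools at the -delta.vmdk extent instead of its descriptor, which would be consistent with the "not a virtual disk" message. A small Python sketch of the naming rule, assuming the usual ESX layout where `name-flat.vmdk` and `name-delta.vmdk` are the data extents described by `name.vmdk` (the helper name is made up for illustration):

```python
import re


def descriptor_for(path):
    """Map a -flat/-delta extent path to the descriptor path that describes it.

    Paths that already look like descriptors are returned unchanged.
    """
    return re.sub(r"-(delta|flat)\.vmdk$", ".vmdk", path)


print(descriptor_for("wcvmvi01-000002-delta.vmdk"))
print(descriptor_for("wcvmvi01-flat.vmdk"))
print(descriptor_for("wcvmvi01-000001.vmdk"))
```

So the source argument for the clone of the newest snapshot would be the wcvmvi01-000002.vmdk descriptor rather than the -delta file.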

One other thing you could try is to go to Edit Settings for the virtual machine, remove the hard drive (currently it will be pointing to a snapshot file), point it to the base disk instead, and power on the virtual machine.

What that will do is essentially bypass the data (and hopefully corruption) in the snapshot files and allow you to access the base image. This will only work if the data in the base image is intact.

Regards

a

bob3
Contributor

Thank you, I know I am able to power on pointing the .vmx file to the base disk and bypassing the snapshots altogether. I "recovered" one of the other machines I was having issues with this way. Guess I was looking for something like a "diskchk" utility that could be run on a virtual disk that is having problems. My fault for keeping snapshots around. Luckily this is not a critical machine. I can mount a virtual disk from another VM and run utilities that way, but not a delta disk file, at least I have not been able to.

Thanks for the suggestions.

continuum
Immortal

bob@sag wrote:

My fault for keeping snapshots around.

Snapshots are one of the key features of VMware - so don't blame yourself for using them.
The same can happen with flat.vmdks as well.

I would rather blame the filesystem: if one small error in the grain table of a snapshot, or one bad block on the hard disk, means that the user has to discard months of work (in the worst case), then there really should be a checkdisk tool.

Anyway - there are a few more things you can try ...

- add another snapshot to the VM and then use Converter and try to clone the VM
- use the VDDK toolkit and mount the vmdk from a Linux or Windows host
- read the VMFS-volume with vmfs-fuse from a Linux LiveCD
- clone the VMFS-volume with ddrescue so that it zeroes out bad blocks


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

Ytsejamer1
Enthusiast

If you can find it, the SnapVMX script was always helpful to me for straightening out snapshot chain issues.  VMware support hooked me up with that one, and they used it on our issues, which kept recurring due to some less than stellar backup software.

Also worth mentioning - some of my fellow engineers relied a little too much on snapshots, making my datastores a complete mess.  I ended up retrofitting a PowerCLI script to go through vCenter and identify which VMs had snapshots older than a couple of days.  That script ran as a scheduled task every day and emailed me a report...so I could yell at said engineers who had stuff kickin' around.  It might be worth spending the hour or so getting that going.
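The core of such a report is just an age filter. The sketch below is Python rather than PowerCLI and is not the script mentioned above; it assumes the snapshot inventory has already been exported from vCenter by some other means, and the record fields (`vm`, `name`, `created`) are hypothetical.

```python
from datetime import datetime, timedelta


def stale_snapshots(snapshots, max_age_days=2, now=None):
    """Return snapshot records older than max_age_days, oldest first."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_age_days)
    stale = [snap for snap in snapshots if snap["created"] < cutoff]
    return sorted(stale, key=lambda snap: snap["created"])


# Made-up inventory, for illustration only.
inventory = [
    {"vm": "vm-a", "name": "before patching", "created": datetime(2013, 2, 1, 9, 0)},
    {"vm": "vm-b", "name": "quick test", "created": datetime(2013, 2, 19, 8, 0)},
]

for snap in stale_snapshots(inventory, max_age_days=2, now=datetime(2013, 2, 20)):
    print(f"{snap['vm']}: '{snap['name']}' created {snap['created']:%Y-%m-%d}")
```

Mailing the result and scheduling the run are left out; those parts depend on the environment.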

Edit: See attached

The script won't necessarily do anything that those other commands can't.  It's just a handy utility to present your chain and identify any errors.  I had used that in my old 4.1 U2+ environment without issue.  I would always go into the properties of each VMDK in the chain and make sure the parent/child relationships were valid, making changes where they weren't set right.  Most times, once that was validated, I'd create a new snapshot, then delete all of them.

I don't have access to the environment where I used these, but I've also included my troubleshooting method.  Please forgive any newb-ness in the instructions or application thereof.  I don't fancy myself an expert and just documented what I'd do to resolve the issues I'd encounter.

Message was edited by: Ytsejamer1 - Added snapshot script txt file and added notation.

bob3
Contributor

Thank you for the hint, Ytsejamer1. I have Googled SnapVMX and found a lot about it, but not the script itself yet. I will keep looking and see if it can help me out. At this point it is more of a frustration than it is a roadblock...I have found I can usually boot the machine from the base disk just fine, but other tools I run point to some real corruption in the delta files. I don't know if this tool will treat that any differently than the vmkfstools commands I ran, but it is worth a try.

Bob

Ytsejamer1
Enthusiast

Hi Bob,

See my original post.  I'm not entirely sure it's even applicable to your issue, but thought I'd throw ideas your way regarding my snapshot issues from the past.  Just an FYI.

Thanks!
