ESXi 6.7 Bad hard drive; Need help saving data

Fancyland · 07-10-2022

Hey y'all,

I have this hobby ESXi 6.7 server with a single 2TB WD Blue set up as RAID 0. Because who needs RAID 1 when it's a hobby, right?
Now this second-hand drive has run fine for the last 4 years, but I noticed that a VM was acting up and realized the drive might be dying. And it turns out it has quite a few bad blocks.

The 2 Ubuntu VMs I want to save do still start, although they're very slow. I've tried to copy the data off the drive to the new datastore, but some of the snapshots won't copy and keep failing (completed with status failure). I tried to remove the snapshots and consolidate the disk, but that also spits out one of two errors:

1st VM; State: Failed - Detected an invalid snapshot configuration.
Errors: An error occurred while consolidating disks: A required file was not found.
2nd VM fails at 80%; State: Failed - An error occurred while consolidating disks: 5 (Input/output error).
Errors: An error occurred while consolidating disks: 5 (Input/output error).

What can I do to either repair the bad blocks so I can copy the data off the bad hard drive, or make a backup of the VMs so I can restore them?
Is there a way to enable disk health notifications to prevent this mess in the future?

I do already have 2 drives standing by to at least start running this in RAID 1.

continuum · 07-10-2022

The VMs still boot with snapshots attached ?
Very good.
Add new vmdk to the VM, store the vmdk on the other datastore.
Boot VM into Linux LiveCD and run
ddrescue -F /dev/sda /dev/sdb /tmp/copy.log

That will run for a while - and can be restarted if necessary, as long as you make sure to keep the copy.log
When it is done the resulting /dev/sdb is a healthy clone with all snapshots merged into the base disk.

Ulli

________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

Fancyland · 07-10-2022

Thanks! I'm trying the first one now.

- first check disk names 
sudo add-apt-repository universe
sudo apt update
sudo apt install gddrescue
sudo ddrescue -f /dev/sda /dev/sdb /tmp/copy.log

I used -f instead of -F, as -F is --fill-mode=<types>; fill blocks of given types with data (?*/-*l)
So --force would be more logical 🙂

continuum · 07-10-2022

yup - right. The parameter you need is "force" otherwise it will not write to a harddisk.
Hope your /dev/sdb is the new disk ... but I guess you understood the idea.
This approach often works when vmdks are partially damaged and the damaged area is outside of the partitioned / or used range of the vmdk as seen by the guest.
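
As a sketch of the corrected command: the wrapper below is an illustration only and not part of ddrescue. `clone_disk` and `DRY_RUN` are hypothetical names; `/dev/sda`, `/dev/sdb` and `/tmp/copy.log` are the names used in this thread and must be checked on your own system. Splitting the copy into two passes (`-n` to skip the scraping phase first, then `-r3` to retry the bad areas) is a common ddrescue pattern and an assumption here, not something the single command above requires. Both passes reuse the same mapfile, which is what makes the run restartable.

```shell
#!/bin/sh
# Sketch of a guarded, resumable ddrescue run. Device names and the mapfile
# path are assumptions taken from the commands in this thread.

# Print (when DRY_RUN is set) or run the two ddrescue passes.
clone_disk() {
    src=$1; dst=$2; map=$3
    # Refuse the obvious mistake of cloning a disk onto itself.
    if [ "$src" = "$dst" ]; then
        echo "refusing: source and target are both $src" >&2
        return 1
    fi
    # With DRY_RUN set, each command is echoed instead of executed.
    run=${DRY_RUN:+echo}
    # Pass 1: copy everything that reads cleanly, skip the slow scraping phase.
    $run ddrescue -f -n "$src" "$dst" "$map" || return 1
    # Pass 2: go back to the bad areas, retrying them up to 3 times.
    $run ddrescue -f -r3 "$src" "$dst" "$map"
}
```

Setting `DRY_RUN=1` before calling `clone_disk /dev/sda /dev/sdb /tmp/copy.log` only prints the two commands, which is a cheap way to double-check source and target. On a LiveCD, /tmp usually lives in RAM, so a mapfile stored there is likely lost on reboot; keeping it on persistent storage may be safer if the run has to survive a restart.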

Ulli


Fancyland · 07-12-2022

I did, and first checked that the names were correct and that it didn't skip to sdc or whatever.

So the smaller, less important VM went great and works.

But I screwed up my second, main VM when trying to copy the hard drive before. I deleted all the snapshots and tried to consolidate the disk so I would only have to copy one drive. But this VM had the most bad sectors, which made the consolidation fail.
I then tried to restore the snapshots by copying back the -00000x.vmdk parts, but that gave me more errors. So I think that VM is a lost cause and I just have to rebuild based on the 2-year-old .vmdk that I was able to copy out.

Errors when trying to start the main VM:

File system specific implementation of Ioctl[file] failed
File system specific implementation of Ioctl[file] failed
File system specific implementation of Ioctl[file] failed
File system specific implementation of Ioctl[file] failed
File system specific implementation of Ioctl[file] failed
File system specific implementation of Ioctl[file] failed
File system specific implementation of Ioctl[file] failed
File system specific implementation of Ioctl[file] failed
File system specific implementation of Ioctl[file] failed
The parent virtual disk has been modified since the child was created. The content ID of the parent virtual disk does not match the corresponding parent content ID in the child
Cannot open the disk '/vmfs/volumes/5d44916e-b77ab5d8-6917-60a44c52891c/Ubuntu main server/Ubuntu main server-000009.vmdk' or one of the snapshot disks it depends on.
Module 'Disk' power on failed.
Failed to start the virtual machine.
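
The content ID error above refers to the `CID` and `parentCID` fields in the small text descriptor `.vmdk` files. A minimal read-only sketch for checking one link of the snapshot chain, assuming the descriptors have been copied somewhere readable; `get_field` and `check_link` are hypothetical helper names, and the script only reports a mismatch, it does not repair anything.

```shell
#!/bin/sh
# Read-only check of one VMDK snapshot chain link: does the child's parentCID
# match the parent's CID? Helper names are hypothetical.

# Print the value of a "key=value" line from a descriptor file.
get_field() {
    # $1 = key (CID or parentCID), $2 = descriptor file
    sed -n "s/^$1=\(.*\)$/\1/p" "$2" | head -n 1 | tr -d '\r'
}

# Compare one child descriptor against its parent descriptor.
check_link() {
    # $1 = parent descriptor, $2 = child descriptor
    parent_cid=$(get_field CID "$1")
    child_pcid=$(get_field parentCID "$2")
    if [ "$parent_cid" = "$child_pcid" ]; then
        echo "OK $2 -> $1 ($parent_cid)"
    else
        echo "MISMATCH $2 expects $child_pcid, $1 has $parent_cid"
        return 1
    fi
}
```

Called as `check_link parent.vmdk child.vmdk` for each pair in the chain, this shows where the chain breaks. A matching CID only says the descriptors agree; it says nothing about whether the underlying data on the failing disk is still readable.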

continuum · 07-12-2022

If you have I/O errors in a VM with lots of snapshots, the procedure is quite different.
Deleting snapshots and trying consolidations is not a good idea.
Next time, better to ask before you try operations that rely on ESXi snapshot functions: there are better options.

Ulli

