I have a hobby ESXi 6.7 server with a single 2 TB WD Blue in RAID 0. Because who needs RAID 1 when it's a hobby, right?
Now this second-hand drive has run fine for the last 4 years, but recently a VM was acting up and I realized the drive might be dying. It turns out it has quite a few bad blocks.
The 2 Ubuntu VMs I want to save do still start, although they're very slow. I've tried to copy the data off the drive to the new datastore, but some of the snapshots won't copy and keep failing ("completed with status failure"). I also tried to remove the snapshots and consolidate the disks, but that throws one of two errors:
What can I do to either repair the bad blocks so I can copy the data off the failing drive, or make a backup of the VMs so I can restore them?
Is there a way to enable disk health notifications to prevent this mess in the future?
I do already have 2 drives standing by so I can at least start running this in RAID 1.
The VMs still boot with snapshots attached?
Add a new vmdk to the VM and store that vmdk on the other datastore.
Boot the VM from a Linux live CD and run:
ddrescue -F /dev/sda /dev/sdb /tmp/copy.log
That will run for a while, and it can be restarted if necessary as long as you keep the copy.log map file.
When it is done, the resulting /dev/sdb is a healthy clone with all snapshots merged into the base disk.
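For the first step, a rough sketch of preparing the target disk from the ESXi shell, assuming the healthy datastore is called datastore2 and a thin-provisioned 2 TB disk is wanted (the names and size are placeholders, not from this thread):

# create a folder and a thin-provisioned 2 TB vmdk on the healthy datastore
mkdir -p /vmfs/volumes/datastore2/rescue
vmkfstools -c 2048G -d thin /vmfs/volumes/datastore2/rescue/rescue.vmdk
# then attach rescue.vmdk to the VM as a second disk in the host client,
# so it shows up as /dev/sdb inside the live CD (verify with lsblk before copying)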
Thanks! I'm trying the first one now.
- first check disk names
sudo add-apt-repository universe
sudo apt update
sudo apt install gddrescue
sudo ddrescue -f /dev/sda /dev/sdb /tmp/copy.log
I used -f instead of -F, as -F is --fill-mode=<types> ("fill blocks of given types with data (?*/-*l)").
So --force would be more logical 🙂
Yup, right. The parameter you need is "force", otherwise ddrescue will not write to a hard disk.
Hope your /dev/sdb is the new disk... but I guess you understood the idea.
This approach often works when vmdks are partially damaged and the damaged area is outside of the partitioned or used range of the vmdk as seen by the guest.
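As a side note, the map file is what makes this resumable; a rough follow-up for a second pass over just the unreadable areas, assuming the same device names and map file as above (the extra flags are standard ddrescue options, not something specific to this setup):

# retry only the sectors the map file recorded as bad, with direct disc access
sudo ddrescue -f -d -r3 /dev/sda /dev/sdb /tmp/copy.log
# -d bypasses the kernel cache on the input device, -r3 retries bad sectors up to 3 times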
I did, and I first checked that the device names were correct and that it hadn't skipped to sdc or whatever.
So the smaller, less important VM went great and works.
But I screwed up my second, main VM when trying to copy the hard drive earlier. I deleted all the snapshots and tried to consolidate the disk so I would only have to copy one drive, but this VM had the most bad sectors, which meant the disk could not be consolidated.
I then tried to restore the snapshots by copying back the -00000x.vmdk parts, but that gave me more errors. So I think that VM is a lost cause and I just have to rebuild based on the 2-year-old .vmdk that I was able to copy out.
These are the errors when trying to start the main VM:
If you have I/O errors in a VM with lots of snapshots, the procedure is quite different.
Deleting snapshots and trying to consolidate is not a good idea.
Next time, better to ask before you try operations that rely on ESXi snapshot functions: there are better options.
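For example, one safe first step is to record the snapshot chain while it is still readable, without modifying anything; a minimal, read-only sketch, assuming the VM lives in /vmfs/volumes/datastore1/myvm (placeholder path):

# print the CID / parent links from each small text descriptor, skipping the big extent files
for f in /vmfs/volumes/datastore1/myvm/*.vmdk; do
  case "$f" in *-flat.vmdk|*-delta.vmdk|*-sesparse.vmdk) continue ;; esac
  echo "== $f"
  grep -E 'CID|parentFileNameHint' "$f"
done

Each -00000x.vmdk descriptor records its parentCID and parentFileNameHint, which shows how the chain fits together before any consolidation is attempted.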