breakaway9000
Enthusiast

Disk with bad sectors -- how to get data out?

I've got a disk that's on its way out in my home lab. It has some VMs on it that I'd love to recover. The SMART status is showing a few hundred bad sectors. I'm using ESXi 5.1 build 799733.

The VMs appear to run but when I attempt to copy them to another disk through the vSphere client I get a generic error and it crashes out. vmkernel.log shows I/O fail issues and 'unexpected sense'.

I'm wondering if:

(1) there is a way to get ESXi to re-attempt the copy more than once, as I believe the copy may succeed after multiple retries (the VMs run OK but I can't copy them)?

(2) it is possible to run something similar to chkdsk on the VMFS volume to see if the filesystem can recover the bad blocks?

(3) as a last resort, it is possible to get ESXi to continue with the copy despite I/O errors? Perhaps then I can run chkdsk inside the guest OS and that will heal it.

(4) what tools (if any) can I use to clone the dying disk to a known good disk? Will dd work?

Accepted Solutions
continuum
Immortal

I agree with the suggestion to try gddrescue.
With the problem VM powered off, start a Linux VM from a different datastore and mount the datastore with the bad blocks via vmfs-fuse and sshfs.
Then copy the flat.vmdk out with ddrescue.
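
A minimal sketch of the mount-and-copy steps described above, not a tested procedure. It assumes sshfs, vmfs-fuse (from vmfs-tools) and GNU ddrescue are installed in the Linux VM, that SSH is enabled on the ESXi host, and that the commands are run as root. The host name, the naa ID, the `:1` partition suffix, the mount points and the file names are all placeholders. `RUN` is a hypothetical helper variable: set `RUN=echo` to print the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: every host name, device ID and path here is a placeholder.

mount_bad_datastore() {
    esxi_host=$1   # ESXi host reachable over SSH (hypothetical name)
    naa_id=$2      # device ID of the failing disk as listed under /dev/disks
    $RUN mkdir -p /esxi /vmfs-in
    # Expose the ESXi file tree, including /dev/disks, read-only over SSH.
    $RUN sshfs -o ro "root@$esxi_host:/" /esxi
    # Let vmfs-fuse read the VMFS partition through that mount.
    # ":1" assumes the datastore lives on the first partition.
    $RUN vmfs-fuse "/esxi/dev/disks/$naa_id:1" /vmfs-in
}

copy_flat_vmdk() {
    src=$1; dst=$2; map=$3
    # ddrescue records which areas it could not read in the map file.
    $RUN ddrescue "$src" "$dst" "$map"
}

# Example (hypothetical names):
# mount_bad_datastore esxi.example.lan naa.50012345678
# copy_flat_vmdk /vmfs-in/myvm/myvm-flat.vmdk /mnt/good/myvm-flat.vmdk /mnt/good/myvm.map
```

Keeping the map file on the good datastore means an interrupted copy can be resumed by running the same ddrescue command again.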

Sometimes it also helps to manually create a snapshot for the bad VM on another datastore. That puts the original vmdk into read-only mode.
Once you have the snapshot, you can clone the new, empty snapshot together with the damaged base disk via vmkfstools -i.

Do you need support with a recovery problem ? - send a message via skype "sanbarrow"

7 Replies
virtualkitten
Enthusiast

On Linux you could try to recover the damaged disk with "ddrescue" and "testdisk".

Then you could mount the image using "qemu" and access it from a Linux live CD. Whatever you decide to do, get a raw dump of the disk before you experiment. Once you have the VMs on local storage you control, you can try to upload them again.

I suppose there is a "VMware" way of doing this... maybe others have more information.

gregsn
Enthusiast

If you are using a single SATA disk, one option may be to get another disk of the same or larger size and clone the original disk to it using a tool that skips over bad sectors, such as HDDGURU: HDD Raw Copy Tool.

Then run chkdsk inside the virtual machines to fix any file system errors.  If you had files sitting on top of the bad sectors, they will most likely be damaged to some degree.
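
If no dedicated cloning tool is at hand, plain dd can also be told to keep going past read errors. A rough sketch, assuming GNU dd; the function name and the device names in the usage comment are placeholders. `conv=noerror` continues after a failed read and `sync` pads the failed block with zeros so that later data stays at the right offset. A small block size limits how much gets zeroed around each bad sector. ddrescue is usually the better choice for a failing disk because it keeps a map of bad areas and can retry them.

```shell
#!/bin/sh
# clone_skip_errors SRC DST [BLOCKSIZE]
# Copies SRC to DST and continues past unreadable blocks (they become zeros).
clone_skip_errors() {
    src=$1; dst=$2; bs=${3:-4096}
    dd if="$src" of="$dst" bs="$bs" conv=noerror,sync
}

# Example (hypothetical device names, double-check them before running):
# clone_skip_errors /dev/sdX /dev/sdY
```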

breakaway9000
Enthusiast

Thanks for the assistance and ideas friends.

I have spun up an Ubuntu VM on a new datastore, mounted the VMFS volume using vmfs-fuse over sshfs, and with just the 'cp' command I have managed to get one machine off the broken disk and onto the good datastore.

This disk is truly circling the drain (SMART shows many bad & pending sectors) so I'll be happy with anything I can get!

Thanks again.

virtualkitten
Enthusiast

I would recommend using ddrescue (a GUI may exist for it). For example:

ddrescue --direct --retrim --max-retries=3 /dev/hda1 imagefile logfile

But read the documentation first. You will need a destination with at least as much free space as the disk you are cloning.
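
A common way to run ddrescue on a dying disk is in two passes, along the lines of the example in the GNU ddrescue manual. The sketch below uses the short options `-n`, `-d` and `-r` because the long option names have changed between ddrescue releases (newer versions use `--idirect` and `--retry-passes`). The function name and device names are placeholders, and `RUN` is a hypothetical helper variable: set `RUN=echo` to preview the commands.

```shell
#!/bin/sh
# rescue_two_pass SRC IMAGE MAPFILE
rescue_two_pass() {
    src=$1; img=$2; map=$3
    # Pass 1: copy everything that reads cleanly and skip the slow scraping phase.
    $RUN ddrescue -n "$src" "$img" "$map" || return 1
    # Pass 2: direct disc access, retry the remaining bad areas up to 3 times.
    $RUN ddrescue -d -r3 "$src" "$img" "$map"
}

# Example (hypothetical names):
# rescue_two_pass /dev/sdX /mnt/good/disk.img /mnt/good/disk.map
```

Both passes share the same map file, so the second pass only touches areas the first one could not read.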

asmme
Contributor

Hello continuum, please could you help me?

I find myself in a situation where a VM starts, but it is not possible to back it up or copy it elsewhere.

I tried the convert cd 3.0 and 4.0, trilead backup, veeam and vcenter, I have tried in copying files and browsing the datastore, I tried to clone from the shell, but I always received the same error: the files cannot be copied or the sectors are damaged.

Please show me how you "mount the datastore with the bad blocks via vmfs-fuse and sshfs" from a virtual machine?

Thank you

continuum
Immortal

Hi
> I tried the convert cd 3.0 and 4.0, trilead backup, veeam and vcenter, I have tried in copying files and browsing the datastore, I tried to clone from the shell ...
That's the problem.
All those attempts come from tools that request writable access to the files, so ESXi may deny access at the file level.
In this case it helps to use read-only access at the block level.

Whenever possible during a remote support session, I create a small Linux VM with direct SSH access to the ESXi host in question.

I mount the whole ESXi file tree read-only.
The next step is to find out where the "bad.vmdk" is actually located on the physical storage.
If you are lucky you get one fragment starting at offset x with the full 2 TB following.
If you are paying the price for thin provisioning and snapshots, you may instead get 1,000,000 individual fragments scattered all over the place.
Anyway - once you know how to read the volume in 1 MB pieces you can start to work around the failing locations.

Just an example ...
IF=/esxi/dev/disks/naa.50012345678
OF=/vmfs-out/mailserver-flat.vmdk
# Quote the assignment so the whole dd command line is stored in CMD.
CMD="dd if=$IF of=$OF bs=1M conv=notrunc"

# seek = offset in the output file, skip = offset on the source disk (both in 1 MB blocks)
$CMD seek=185379  count=1  skip=226434 # fragment=19510
$CMD seek=185395  count=19  skip=226435 # fragment=19515
$CMD seek=185415  count=1  skip=226454 # fragment=19517
$CMD seek=185417  count=2  skip=226455 # fragment=19519
$CMD seek=185421  count=1  skip=226457 # fragment=19522
$CMD seek=185378  count=1  skip=226458 # fragment=19509
$CMD seek=185423  count=1  skip=226459 # fragment=19524
$CMD seek=185394  count=1  skip=226460 # fragment=19514

What was uncopyable before is now a bunch of pieces, and if cmd1 does not work you have a small list of alternative commands.
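
With thousands of fragments, the same dd call can be driven from a list instead of being typed by hand. A sketch only, assuming GNU dd and a text file with one `seek count skip` triple per line in units of the block size; how that list is produced from the VMFS metadata is not covered here. The function name `copy_fragments` and the file names are hypothetical. Fragments whose dd call fails are printed to stdout so they can be collected and retried with a different approach.

```shell
#!/bin/sh
# copy_fragments SRC DST [BLOCKSIZE] < fragment-list
# Each input line: "<seek> <count> <skip>", anything after the third field is ignored.
copy_fragments() {
    src=$1; dst=$2; bs=${3:-1M}
    while read -r seek count skip _rest; do
        [ -z "$skip" ] && continue          # ignore blank or incomplete lines
        if ! dd if="$src" of="$dst" bs="$bs" conv=notrunc \
                seek="$seek" count="$count" skip="$skip" 2>/dev/null; then
            echo "$seek $count $skip"       # report the fragment that failed
        fi
    done
}

# Example (hypothetical file names):
# copy_fragments /esxi/dev/disks/naa.50012345678 /vmfs-out/mailserver-flat.vmdk \
#     < fragments.txt > failed-fragments.txt
```

Because one unreadable fragment no longer aborts the whole copy, the list of failures stays short and can be worked through separately.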

I am quite busy at the moment - if you have any urgent issues call via skype

Ulli
