We recently experienced data loss due to bad blocks on one LUN on our SAN, which was confirmed by our SAN vendor. As a result, when I attempt to download a vmdk from the affected datastore in the datastore browser I receive the following error: Expected FILE_DATA message. Got: SESSION_COMPLETE.
The vmdk begins the copy process, then fails shortly thereafter. In addition, a Storage vMotion also fails with more specific error codes, which basically confirm what was stated above regarding bad blocks on the LUN.
Does anyone have any recommended tools or processes to retrieve the vmdk and avoid the bad blocks? I simply want to download the vmdk from the damaged datastore and upload it to a healthy one.
Hi,
If you are able to power on the VM, then collect the data at the filesystem level. I don't think you will be able to download the vmdk if a block is bad or corrupt. Even a clone from the command line will not work.
Regards
Emaad
What memaad said. Also, if you can boot the VM (which is now doubtful: with bad blocks the VM probably will not boot), you can leave the VM running, treat it as a physical machine, and use VMware Converter to make a new P2V copy of it.
Voila, you have a copy of your VM.
In such a case the last resort is often to read the volume with vmfs-tools from Linux and use ddrescue to copy the flat.vmdk.
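As a rough sketch of what that last-resort approach could look like from a Linux machine that can see the LUN: the device name, mount point, VM folder and target paths below are all hypothetical placeholders, and the script only prints the commands unless RUN=1 is set. It assumes vmfs-tools (vmfs-fuse) and GNU ddrescue are installed; it is not a tested recovery procedure.

```shell
#!/bin/sh
# Sketch only: every device name and path here is a hypothetical placeholder.
VMFS_DEV="${VMFS_DEV:-/dev/sdX1}"          # partition holding the damaged VMFS volume (assumed)
VMFS_MNT="${VMFS_MNT:-/mnt/vmfs}"          # mount point for vmfs-fuse
SRC="$VMFS_MNT/myvm/myvm-flat.vmdk"        # hypothetical VM folder and disk name
DST="${DST:-/mnt/healthy/myvm-flat.vmdk}"  # target file on healthy storage
MAP="${MAP:-/mnt/healthy/myvm-flat.map}"   # ddrescue mapfile, lets an interrupted copy resume

# Print each command; execute it only when RUN=1 is set.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

run mkdir -p "$VMFS_MNT"
run vmfs-fuse "$VMFS_DEV" "$VMFS_MNT"
# Pass 1: copy everything that reads cleanly and skip the slow scraping phase.
run ddrescue -n "$SRC" "$DST" "$MAP"
# Pass 2: return to the areas marked bad and retry them up to three times.
run ddrescue -r3 "$SRC" "$DST" "$MAP"
```

Running ddrescue in two passes with the same mapfile gets the readable data off the LUN first and stresses the failing areas only afterwards.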
Just to update this post:
I worked with VMware tech support regarding this issue, and the engineer did in fact use ddrescue in an attempt to recover the vmdk file. Unfortunately for us, the file was unrecoverable as a result of block-level corruption on the SAN. ddrescue was able to copy the bits over, but the end-of-file marker for the vmdk was apparently corrupt, as the file grew well beyond its actual size when copied.
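One way to sanity-check a copied flat file against its intended size is to read the extent line from the small text descriptor (the .vmdk next to the -flat.vmdk), which records the disk size in 512-byte sectors. The sketch below uses a made-up descriptor and a scratch file rather than real VM files, and it assumes a single-extent disk; the function name is purely illustrative.

```shell
#!/bin/sh
# Expected size in bytes of a single flat extent, taken from the VMDK descriptor.
# Extent lines look like:  RW <sectors> VMFS "name-flat.vmdk"   (sector = 512 bytes)
expected_flat_bytes() {
    sectors=$(awk '$1 == "RW" && $3 == "VMFS" { print $2; exit }' "$1")
    echo $((sectors * 512))
}

# Demo with a made-up descriptor (2048 sectors = 1 MiB) and an oversized copy.
demo_dir=$(mktemp -d)
printf 'RW 2048 VMFS "demo-flat.vmdk"\n' > "$demo_dir/demo.vmdk"
dd if=/dev/zero of="$demo_dir/demo-flat.vmdk" bs=1048576 count=2 2>/dev/null

expected=$(expected_flat_bytes "$demo_dir/demo.vmdk")
actual=$(( $(wc -c < "$demo_dir/demo-flat.vmdk") ))
echo "expected $expected bytes, copied file has $actual bytes"
if [ "$actual" -gt "$expected" ]; then
    echo "copy is larger than the descriptor says it should be"
fi
rm -rf "$demo_dir"
```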
The fix in the end was to mount the vmdk in question inside a guest OS and utilize xcopy to move the data to a fresh vmdk file onto a clean LUN.
Do you remember whether that engineer used ddrescue based on Linux or based on ESXi?
Can you clarify the question? I'm not sure how to answer that, since I don't understand the distinction. I thought the ESXi kernel was based on Linux.
All I can remember is that we established a remote session and he logged into ESXi and proceeded with ddtools.
ESX kernel was based on Linux when it first appeared but since then they changed it a lot.
I am not aware of a ddrescue version that runs on ESXi but have to admit that I never tried.
You mention ddtools. Was this something you had to download first?
Sorry, I meant to type ddrescue. I kept calling it ddtools by mistake when the engineer first mentioned it. I don't recall whether the engineer downloaded and extracted it or whether it already existed. Forgive me, I had several things going on that day, so my attention was divided.
Don't worry, your answers were very useful for me.
Thanks a lot
Ulli
Hi Fiffteencenter,
You are right, the support engineer might have used the "dd" command to dump raw data from the affected datastore to a new datastore to recover data.
Here is command to do it.
For Example :
dd if=/vmfs/devices/disks/naa.6006016045502500ea8c of=/tmp/naa.6006016045502500ea8c.dd bs=1M count=1024
By running this command I am collecting 1 GB of data.
Then I can dump this data into new datastore to recover data, using similar command.
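To make the bs and count arithmetic concrete, here is a small demo that runs the same kind of bounded dd copy against a scratch file instead of a real LUN. The file names come from mktemp and the sizes are scaled down (4 blocks of 1 MiB rather than 1024); the block size is written in bytes so it behaves the same across dd implementations.

```shell
#!/bin/sh
# Scratch-file demo only; no real device is read or written.
BS=1048576   # 1 MiB in bytes, the same block size as bs=1M
COUNT=4      # the command above uses 1024, i.e. 1024 x 1 MiB = 1 GiB

src=$(mktemp)   # stands in for the source device
dst=$(mktemp)   # stands in for the dump file

dd if=/dev/zero of="$src" bs="$BS" count=8 2>/dev/null      # 8 MiB of source data
dd if="$src" of="$dst" bs="$BS" count="$COUNT" 2>/dev/null  # copy only the first COUNT blocks

copied=$(( $(wc -c < "$dst") ))
echo "copied $copied bytes, bs x count = $((BS * COUNT))"
rm -f "$src" "$dst"
```

dd stops after count blocks, so the dump is always bs x count bytes as long as the source is at least that large.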
Regards
Mohammed
Hi,
This command is not meant for recovering data when there is a bad sector or bad block.
It is for getting the LVM information from a new datastore and dumping it into the affected datastore (whose LVM information is missing).
Regards
Mohammed Emaad
