Disk recovery, help!

zhanglin999 · ‎04-22-2019

Hi,

After a power failure, one datastore became disconnected on our ESXi host. It is a RAID5 array consisting of 3 disks. I booted from an Ubuntu live CD and got the following information:

ubuntu@ubuntu:~$ sudo fdisk -l /dev/sde
Disk /dev/sde: 3.3 TiB, 3599451029504 bytes, 7030177792 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 543D936A-AE82-4ABE-A2FB-ED296E861BC1

Device     Start        End    Sectors  Size Type
/dev/sde1   2048 7030177758 7030175711  3.3T VMware VMFS

But when I try to mount it with vmfs-fuse, it prints the following error:

ubuntu@ubuntu:~$ sudo vmfs-fuse /dev/sde1 /mnt/vmfs
VMFS VolInfo: invalid magic number 0x00000000
VMFS: Unable to read volume information
Trying to find partitions
Unable to open device/file "/dev/sde1".
Unable to open filesystem

Is it possible to get my data back? continuum, I see you have helped many people with similar issues, so I am mentioning you here to see if you can help 🙂

continuum · ‎04-22-2019

Hi,
Do you still have the system booted into Linux?
If so, please dump the first 1536 MB (1.5 GB) of sde to a file:
dd if=/dev/sde bs=1M count=1536 of=zhanglin.1536
Compress that file and provide a download link.
The dump will show me whether your RAID5 is still healthy.
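
For anyone repeating this step, here is a minimal sketch of the dump-and-compress workflow wrapped in a shell function. The function name dump_head, the default of 1536 MiB, and the .gz/.sha256 file naming are illustrative assumptions and not part of the instructions above. It only reads from the source and assumes GNU dd, gzip and sha256sum as found on an Ubuntu live CD.

```shell
# dump_head SOURCE OUTFILE [MIB]
# Copy the first MIB mebibytes (default 1536) of SOURCE into OUTFILE,
# then write a gzip-compressed copy and a SHA-256 checksum next to it.
# SOURCE is only ever opened for reading.
dump_head() {
    dh_src="$1"
    dh_out="$2"
    dh_mib="${3:-1536}"

    # Same dd invocation as in the post, with the count made configurable.
    dd if="$dh_src" of="$dh_out" bs=1M count="$dh_mib" || return 1

    # Keep the raw dump and write the compressed copy alongside it.
    gzip -c "$dh_out" > "$dh_out.gz" || return 1

    # The checksum lets the recipient verify the download.
    sha256sum "$dh_out.gz" > "$dh_out.gz.sha256"
}
```

Run it as root, for example `dump_head /dev/sde zhanglin.1536`, and write the output to a different disk than the one being examined.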

________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

zhanglin999 · ‎04-22-2019

Here is the OneDrive link:

https://onevmw-my.sharepoint.com/:u:/g/personal/zzhou_vmware_com/ERNRFZsbmCxOmOyUtX9BhIoBcvbWXDjrKYO...

I dumped /dev/sde1 into this file. If a dump of /dev/sde is required instead, I will create it and upload it again.

Many thanks!

continuum · ‎04-23-2019

To anybody reading this post ....
This is a typical example of why I tell my customers that the combination:
- local storage +
- Raid5 +
- VMFS +
- unreliable power supply
------------------------------------
= unacceptable risk.

@ zhanglin
In the current state the VMFS is garbage.
The VMFS magic number is not present at the expected offset.
The pointers to the hidden .sf files are not at the expected offset.
Apparently the raid-controller could not handle the power failure correctly.
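
The magic-number check can be reproduced by hand. The sketch below assumes the layout used by the open-source vmfs-tools as far as I know it: the volume-info magic 0xc001d00d sits 0x100000 bytes (1 MiB) into the partition and is stored little-endian, so the bytes on disk read 0d d0 01 c0. The function name has_vmfs_magic and its optional second argument (the partition start in bytes, for whole-disk dumps) are illustrative assumptions.

```shell
# has_vmfs_magic IMAGE [PART_START_BYTES]
# Print the 4 bytes found at partition start + 1 MiB and return 0 only
# if they match the assumed VMFS volume-info magic (0xc001d00d, little-endian).
has_vmfs_magic() {
    hm_img="$1"
    hm_start="${2:-0}"
    hm_off=$(( hm_start + 1048576 ))   # assumed volume-info offset: 0x100000

    # Read 4 bytes at that offset and render them as plain hex digits.
    hm_found=$(dd if="$hm_img" bs=1 skip="$hm_off" count=4 2>/dev/null \
        | od -An -tx1 | tr -d ' \n')

    echo "bytes at offset $hm_off: ${hm_found:-none}"
    [ "$hm_found" = "0dd001c0" ]
}
```

For a dump of /dev/sde1 call it with the image name only. For a dump of the whole disk, pass the partition start as well: according to the fdisk output above that is 2048 sectors of 512 bytes, so `has_vmfs_magic zhanglin.1536 1048576`.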
The only solution I can suggest is to use the disks without the Raid-controller.
Then you either use a commercial tool to build a virtual Raid-array or use Linux to set up a software Raid.
If you are lucky, and if you can query the existing Raid-controller for the necessary parameters such as stripe size, it may be possible to re-assemble the array.
At first sight the data seems to be still there: the vmdk descriptor files, for example, are present and can be extracted as expected.
The vmdk descriptor files appear to be smaller than the stripe size in use, so they come in one piece.
VMX-files - larger than vmdk descriptors - are already incomplete, which is a sign that the Raid no longer has the correct structure.
The best you can get from this array in its current state is files with a size of a few kb.
Everything larger than that is unusable.
So the next thing to do - if the data is important - is to use the raid-disks one by one and try to build a virtual raid which hopefully is healthy enough to create dd-scripts for the vmdk-files.
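
To illustrate why only files smaller than one stripe chunk survive, here is a small sketch that maps a logical byte offset of a Raid5 volume to a member disk. It assumes the left-symmetric layout that Linux md uses by default; the real controller may use a different layout, disk order or chunk size. The function name raid5_locate and the 64 KiB chunk size in the example are assumptions for illustration only.

```shell
# raid5_locate OFFSET CHUNK_BYTES NDISKS
# For a logical byte OFFSET in a Raid5 volume, print the member disk
# (0-based), the byte offset on that disk, and the disk holding parity
# for that stripe. Assumes the left-symmetric layout.
raid5_locate() {
    rl_off="$1"
    rl_chunk="$2"
    rl_n="$3"

    rl_data=$(( rl_n - 1 ))                # data chunks per stripe
    rl_chunk_no=$(( rl_off / rl_chunk ))   # logical chunk number
    rl_stripe=$(( rl_chunk_no / rl_data )) # stripe (row) number
    rl_idx=$(( rl_chunk_no % rl_data ))    # data chunk index within the stripe

    # Left-symmetric: parity rotates backwards from the last disk and
    # data chunks start on the disk right after the parity disk.
    rl_parity=$(( rl_data - (rl_stripe % rl_n) ))
    rl_disk=$(( (rl_parity + 1 + rl_idx) % rl_n ))
    rl_disk_off=$(( rl_stripe * rl_chunk + rl_off % rl_chunk ))

    echo "disk=$rl_disk offset=$rl_disk_off parity_disk=$rl_parity"
}
```

With 3 disks and 64 KiB chunks, `raid5_locate 0 65536 3` prints `disk=0 offset=0 parity_disk=2`, `raid5_locate 65536 65536 3` prints `disk=1 offset=0 parity_disk=2`, and `raid5_locate 131072 65536 3` prints `disk=2 offset=65536 parity_disk=1`. Consecutive chunks land on different disks, so anything larger than one chunk only reads back correctly when disk order, chunk size and layout are all right, while a file of a few kb sits inside a single chunk on a single disk.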


gregsn · ‎04-23-2019

In my experience, VMFS can be used safely as local storage when:

The RAID controller MUST have a protected write cache (preferably capacitor-based).
If the RAID controller does not have protected write caching ability, write caching MUST BE DISABLED.
Caching at the disk level MUST be disabled (usually set at the controller level).
Use top-tier RAID controller brands (e.g. Adaptec/Microsemi or LSI) that can properly implement the above.

I can't vouch for other brands since I've only had first hand experience with Adaptec or LSI controllers and have never suffered VMFS loss after a power outage (or any other unexpected/dirty shutdown) in the above configuration.

I also recommend (not a hard requirement to protect VMFS in a dirty shutdown scenario) RAID6 or 60 as a minimum RAID configuration (i.e. the ability to survive at least a dual-disk failure for any data stored on the array).

It would be interesting to know what configuration the OP had that led to this failure, so it can be avoided in the future.

PS:

The only times I've lost a VMFS in the above configuration were not during a power outage but due to either a disk sending faulty data back to the controller (e.g. a "lower brand" SSD returning garbage data) or data corruption during a RAID 6 rebuild (a bug I personally confirmed on older Adaptec firmware that corrupts a RAID 6 volume during a rebuild with >2TB disks). With that said, I now only use Intel brand SSDs, which have capacitor-backed protected write caching, and have had zero issues since.

zhanglin999 · ‎04-24-2019

This is really sad news 😞

We started trying DiskInternals today, but the scan has not completed yet. If there is any progress, I will post an update here.

Anyway, I do appreciate your help!

--zhanglin

continuum · ‎04-24-2019

I would not expect results with DiskInternals or UFS Explorer in the current state.
Both tools can do a good job with a slightly corrupted VMFS-volume - but at the moment the content of the area reserved for the metadata does not qualify as valid VMFS.
So you are basically starting a raw scan looking for file signatures.
I expect that you may even find some promising pieces, but I doubt that you will find any usable file larger than a few hundred KB.
Anyway - please keep us updated.

Ulli


jclave · ‎12-20-2020

Good morning,

Can you help us recover these VMs?

My email: gatmuha@gmail.com

Thanks

continuum · ‎12-20-2020

After looking into your question, the appropriate reply would be an email with the analysis "maybe".

If you have a serious request for a housecall
then come prepared.

skype-message:
Hi Ulli
my dump is here
this happened
can you recover videos-of-marriage-flat.vmdk ?
my wife will roast me alive if I don't have them back before her parents arrive ...
please let me know if there is a chance ....

Then I look into your dump ... and if you have none, or were too lazy to even try to create one,
then the chances for a successful interaction are too low to get started anyway.

First contact by a "me too" approach is not very convincing ...

Ulli

