VMware Cloud Community
ArrowSIVAC
Enthusiast
Jump to solution

Repair / Recover VMFS Volume

I have a situation where someone has an older vSphere 4.x cluster. The VMs were hosted off of an iSCSI SAN presented to the nodes in the cluster. The volume was 260 GB originally when I made it a few years back. Subsequently they added enough VMs that they needed to expand. They expanded the LUN capacity to 3 TB, but were never able to expand the partition. With VMFS3 this is an issue.

It seems that if you expand the LUN, EVEN IF you NEVER expand the partition, any host that unmounts the volume can never mount it again.

This apparently went on for some time, with more and more hosts losing access to the LUN, and after a power outage no host could access the dozens of VMs.

Now I need to find a way to mount that 3 TB iSCSI LUN, even if read-only, long enough to copy the 260 GB of VMs off of it to an NFS share I have ready.

Goal: map the iSCSI LUN to a Windows / Linux system. Mount the volume read-only. Copy the data to an NFS export. Destroy and rebuild the data on a new iSCSI VMFS5 LUN.

I can map the LUN to a CentOS system, and parted shows the 3 TB LUN with the primary partition at only 260 GB, so I see the data as expected. I then tried to compile vmfs-tools-0.2.5 but get compiler errors, nothing of any help or note. I also tried Ubuntu and CentOS 4.8, and the errors were no more helpful.
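
For reference, checking the LUN-versus-partition mismatch from the CentOS side can be done with something like the following (the device name is only an example):

# example only - substitute the actual iSCSI device
parted /dev/sdb unit GB print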

Is there another, simpler way to get this accomplished?

Thanks,

37 Replies
continuum
Immortal
Jump to solution

Hi
For the latest command-line-only version see http://vm-sickbay.com
For an earlier version with a graphical interface: http://sanbarrow.com/iscsiworkshop/moa64dvd.iso

Please don't use any older versions - I update them every now and then for good reasons.

Ulli


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

0 Kudos
xamboozi
Contributor
Jump to solution

I know this is an old thread, but I used it a lot to fix my own issue. If you are getting similar symptoms and the "magic number" error, please check to make sure you haven't taken a snapshot on your SAN. I'm running ZFS and took a snapshot of the zvol containing my datastore. This is the output I got in the vmkernel log when trying to rescan the storage adapter:

[root@ESXI:~] tail /var/log/vmkernel.log

2015-08-28T02:35:57.160Z cpu4:33159)NMP: nmp_ResetDeviceLogThrottling:3345: last error status from device mpx.vmhba32:C0:T0:L0 repeated 19 times

2015-08-28T02:37:40.649Z cpu6:32797)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x9e (0x439d806e13c0, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2015-08-28T02:37:40.717Z cpu0:43827)LVM: 10060: Device naa.6589cfc000000903f337f92db5f049d3:1 detected to be a snapshot:

2015-08-28T02:37:40.717Z cpu0:43827)LVM: 10067:   queried disk ID: <type 1, len 21, lun 0, devType 0, scsi 0, h(id) 13996138677875135599>

2015-08-28T02:37:40.717Z cpu0:43827)LVM: 10074:   on-disk disk ID: <type 1, len 21, lun 0, devType 0, scsi 0, h(id) 1294898510287430916>

2015-08-28T02:37:40.822Z cpu6:32797)NMP: nmp_ThrottleLogForDevice:3130: last error status from device mpx.vmhba32:C0:T0:L0 repeated 10 times

2015-08-28T02:37:40.922Z cpu0:43827)FSS: 5327: No FS driver claimed device 'control': No filesystem on the device

2015-08-28T02:37:40.923Z cpu0:43827)VC: 3551: Device rescan time 77 msec (total number of devices 6)

2015-08-28T02:37:40.923Z cpu0:43827)VC: 3554: Filesystem probe time 205 msec (devices probed 5 of 6)

2015-08-28T02:37:40.923Z cpu0:43827)VC: 3556: Refresh open volume time 1 msec

If you see this, you need to resignature the datastore. I didn't lose anything when I did the resig using the "Keep existing signature" option under "Add storage".

(screenshot attachment: Capture.PNG)
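
The same choice can also be made from the host's command line with esxcli (the datastore label below is just an example):

esxcli storage vmfs snapshot list                        # show VMFS volumes detected as snapshots
esxcli storage vmfs snapshot mount -l datastore1         # mount and keep the existing signature
esxcli storage vmfs snapshot resignature -l datastore1   # or write a new signature instead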

If you took a snapshot and are not seeing the "detected to be a snapshot" message, let me know. There were a couple of other settings I messed with on the FreeNAS extent, like the compatibility setting, but I'm not sure whether they had an effect. Also, I'm in the middle of a recovery from a messed-up pool configuration, so the proper fix is probably to delete the snapshots; I just need to access them again to back them up.
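
If deleting the snapshot does turn out to be the proper fix, on the ZFS side that would look roughly like this (pool/zvol and snapshot names are just examples - only destroy snapshots you no longer need):

zfs list -t snapshot                 # list all snapshots
zfs destroy tank/vmstore@mysnap      # remove the snapshot of the zvol backing the datastore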

0 Kudos
jagdish_rana
Enthusiast
Jump to solution

Hi,

Thanks for feedback.

0 Kudos
jhickss
Contributor
Jump to solution

Hi, moa64dvd.iso doesn't boot.

help please

0 Kudos
continuum
Immortal
Jump to solution

Hi Jose
no need to reply here - just wanted to update this post for other users. In case someone finds this thread via Google: do not try to download any of those old versions that are mentioned here.
Even the latest vmfs-tools builds are outdated nowadays.
vmfs-fuse is still very, very useful - but with ESXi 5.5 and later it has problems.
In some conditions the max size of a vmdk that you can extract with vmfs-fuse is 256 GB.
Unfortunately I don't expect to see an updated version of the vmfs-tools - the project seems to have been dead for a good while.
That does not mean that we are running out of options.
On the contrary - nowadays I usually get better results than 2 or 3 years ago.
But it was certainly easier in the days of ESXi 5.1 and earlier.
You will find the current ISO files I use on http://vm-sickbay.com
If they don't boot - please complain and call me on Skype.
Ulli


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

0 Kudos
marekivan
Contributor
Jump to solution

Hi.

I'm having a similar problem. After a failed RAID member and a RAID rebuild, the datastore is no longer visible. The volume shows up under devices, but nothing appears under datastores.

I tried rebuilding the partition with partedUtil and vmfs-tools - it failed to mount because of an invalid magic number 0x00000000. I also tried cloning the volume from the RAID to a single disk and recovering the data with UFS Explorer. UFS Explorer identifies the partition as VMFS (as does parted). I can see all the files and managed to copy all of them except for the flat.vmdk and the swap file. In the kernel log I see "no FS driver claimed this partition". It has no label. The partition is VMFS5 and I'm using ESXi 5.1. I tried VOMA in ESXi 6 with the fix option, but it failed as well. Any help appreciated.
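
For reference, the VOMA check mentioned above is run on the ESXi host roughly like this (the device name is a placeholder - point it at the VMFS partition, not the whole disk):

voma -m vmfs -f check -d /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx:1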

0 Kudos
continuum
Immortal
Jump to solution

Hi Marek
if you create a dump of the VMFS metadata - that's the first 1536 MB of the VMFS partition - and provide a download, then I can assist.
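A dump like that can be created with dd along these lines (device and output paths are just examples):

# first 1536 MB of the VMFS partition
dd if=/dev/sdb1 of=/tmp/vmfs-meta.bin bs=1M count=1536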
Call me via skype - see my signature.
Ulli


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

0 Kudos
marekivan
Contributor
Jump to solution

Hi.

Thanks for the offer, I sent you the link to the dump via Skype.

Rgds

Marek

0 Kudos
continuum
Immortal
Jump to solution

We looked at the case together and agreed to give up.
Diagnosis: RAID 5 rebuild gone terribly wrong - this is a case for Ontrack :smileycry:
Ulli


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

0 Kudos
marekivan
Contributor
Jump to solution

Hi Ulli.


Thank you very much for your time, help and valuable tips today. I'm really grateful that you tried to help me recover the data and showed me some very interesting techniques. Let me know if you find something in the dump I sent you earlier.

Marek

0 Kudos
es3t
Contributor
Jump to solution

How do you copy out the vmdk file after vmfs-fuse mount with ddrescue?

0 Kudos
continuum
Immortal
Jump to solution

using the MOA-iso from vm-sickbay.com

sudo su
mkdir /esxi
mkdir /vmfs-out
mkdir /vmfs-in

# mount the damaged VMFS volume via vmfs-fuse (read-only access to the files)
vmfs-fuse <DEVICE> /vmfs-in

# expose the ESXi host's filesystem read-only via sshfs (gives access to /esxi/dev/disks/*)
sshfs -o ro root@esxi1:/ /esxi

# mount the target datastore directory writeable via sshfs
sshfs root@esxi2:/vmfs/volumes/datastoreRecoveryOUT /vmfs-out

cd /vmfs-out
mkdir out

# copy the flat.vmdk with ddrescue so read errors do not abort the copy
ddrescue /vmfs-in/directory/name-flat.vmdk out/name-flat.vmdk out/name-flat.vmdk.log

DEVICE may be something like /dev/sdc1 or /esxi/dev/disks/naa.*:1

esxi1 may be the same host as esxi2 - but it can also be another host

By the way ....
With ESXi 5.5 or 6 I've noticed that I can only rarely use vmfs-fuse.
Most of the time it boils down to using scripts with a bunch of dd commands per flat.vmdk.
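
One of those dd extractions looks roughly like the sketch below - the skip and count values are placeholders that have to be worked out from the VMFS metadata for each flat.vmdk, and it only works this simply if the file is stored contiguously:

# placeholders only - offset/length come from the VMFS metadata of that particular flat.vmdk
dd if=/dev/sdc1 of=/vmfs-out/out/name-flat.vmdk bs=1M skip=20480 count=40960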

Ulli


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

es3t
Contributor
Jump to solution

> sshfs -o ro root@esxi1:/ /esxi
> sshfs root@esxi2:/vmfs/volumes/datastoreRecoveryOUT /vmfs-out

I presume the above is for when you want to 'chroot' into the environment?

I was trying to ddrescue a file from one datastore to another datastore and it failed. By the way, when I used vmfs-fuse to mount the datastore that was the destination path, it was mounted only RO and I couldn't save the file to it.

I did:

1. vmfs-fuse /dev/sda1 (broken drive with the important vmdk file) /mnt/in

2. vmfs-fuse /dev/sdb1 (destination datastore) /mnt/out

and after that: ddrescue /mnt/in/file.vmdk /mnt/out/file.vmdk


And received an I/O error.


So now I'm trying the recovery again, but with -d and -r1 to a raw file; if it works I will be able to mount the raw file on a loop device to pull out the vmdk file.
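
For anyone trying the same thing, that raw-image route looks roughly like this (paths are just examples):

ddrescue -d -r1 /dev/sda1 /mnt/out/datastore.raw /mnt/out/datastore.log   # clone with direct reads, one retry pass
losetup -f --show /mnt/out/datastore.raw                                  # prints e.g. /dev/loop0
vmfs-fuse /dev/loop0 /mnt/restore                                         # mount the cloned image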

So basically, pulling file to file doesn't work all the time.


Update:

After the ddrescue finished it had just cloned the disk to a file, with the same error. So when I run fsck.vmfs on the datastore.raw file I get lots of:


Pointer Block 0x20025843 is lost.

Pointer Block 0x30025843 is lost.

Pointer Block 0x40025843 is lost.

Pointer Block 0x50025843 is lost.

Pointer Block 0x60025843 is lost.

with

File Block 0x02c6f4c1 is lost.

File Block 0x02c6f501 is lost.

File Block 0x02c6f541 is lost.

File Block 0x02c6f581 is lost.

File Block 0x02c6f5c1 is lost.

File Block 0x02c6f601 is lost.

And when I mount that file with vmfs-fuse it mounts with no problem, but when I want to rsync the vmdk file from that clone image to another location, I get at 99%:


receiving incremental file list

./

file_3-flat.vmdk

536,763,109,088  99%   53.57MB/s    0:00:01  ad

rsync: read errors mapping "/mnt/restore/file/file_3-flat.vmdk": Input/output error (5)

536,870,912,000  99%   50.27MB/s    2:49:45 (xfr#1, to-chk=1/3)

WARNING: file_3-flat.vmdk failed verification -- update discarded (will try again).

file_3.vmdk

536,870,912,502 100%   50.26MB/s    2:49:46 (xfr#2, to-chk=0/3)

0 Kudos
continuum
Immortal
Jump to solution

> I presume the above is for when you want to 'chroot' into the environment?

Not sure what you mean by that ...
I usually work remotely - I ask my customer to create a 64-bit Linux VM that has network access to the ESXi network.
Then I can do my work even while the ESXi host is still in production.
And I use vmfs-fuse really rarely these days - the more terabyte-size vmdks you work with, the less useful it is.
I have also noticed that it is very useful to access VMFS volumes in read-only mode - it helps to work around I/O errors or stale locks.
Writing to VMFS via vmfs-fuse is something I never do - instead I just mount one directory of a VMFS volume in writeable mode via sshfs.
This way I never get complaints about bad writes from a dangerously outdated vmfs-fuse tool.
The only advantage of using vmfs-fuse and directly booting the affected ESXi host into Linux is the higher read rate that you can get that way.

> So when I tried to fsck.vmfs datastore.raw file I get lots


> Pointer Block 0x20025843 is lost.
> Pointer Block 0x30025843 is lost.
> Pointer Block 0x40025843 is lost.
> Pointer Block 0x50025843 is lost.

Which vmfs-fuse version do you use?
Can you read flat-vmdks larger than 256 GB?
If not, you can expect errors like that.

> So basically, pulling file to file doesn't work all the time.

Of course not.
The key factor for recovery is the corruption rate of the VMFS metadata.

vmfs-fuse only helps with lightly damaged VMFS metadata.
If the hidden .sf files in the first GB of the volume are damaged, overformatted or missing, vmfs-fuse does not help at all.


Ulli



________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

es3t
Contributor
Jump to solution

> Which vmfs-fuse version do you use?
> Can you read flat-vmdks larger than 256 GB?
> If not, you can expect errors like that.

I don't know, but I downloaded the latest MON-iso so it should be the latest.

> I have also noticed that it is very useful to access VMFS volumes in read-only mode - it helps to work around I/O errors or stale locks.
> Writing to VMFS via vmfs-fuse is something I never do - instead I just mount one directory of a VMFS volume in writeable mode via sshfs.
> This way I never get complaints about bad writes from a dangerously outdated vmfs-fuse tool.
> The only advantage of using vmfs-fuse and directly booting the affected ESXi host into Linux is the higher read rate that you can get that way.

Oh ok, so that's why you use sshfs ... I didn't know what it was till now.

I bought a new 2 TB disk and put it into the server.

The damaged one holds the vmdk file. That file is an image of the /dev/sdd disk in the VM, which is part of an LVM VolumeGroup.

ddrescue to an image didn't work and I was furious. The last idea (besides yours) was to try to run the VM again with the faulty disk - maybe it threw an error only once and can still run for a while. Guess what: the VM started. So that's good. I decided to do some operations inside the guest OS; luckily, like I said, it's LVM. From the new disk I added a 550 GB virtual disk to the VM. In the guest OS I added this disk as a physical volume and extended the VolumeGroup. After that I ran pvmove from /dev/sdd to the new disk /dev/sde (about 400 GB to move); it's running slowly but it works like a RAID 1 sync, so I think it will work. After that I will be able to remove /dev/sdd from the VolumeGroup, then from the physical volumes, and finally disconnect that disk from the server.
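
For reference, the LVM steps described above boil down to something like this inside the guest (device and volume group names are just examples):

pvcreate /dev/sde            # initialise the new 550 GB disk as a physical volume
vgextend myvg /dev/sde       # add it to the existing volume group
pvmove /dev/sdd /dev/sde     # migrate all extents off the failing disk
vgreduce myvg /dev/sdd       # remove the old disk from the volume group
pvremove /dev/sdd            # wipe its PV label so the disk can be detached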

0 Kudos
continuum
Immortal
Jump to solution

> I don't know, but I downloaded the latest MON-iso so it should be the latest.

I don't know any MON-iso - if you mean the latest MOA-iso from vm-sickbay.com: that iso has the latest official Ubuntu build of vmfs-tools, which is NOT good enough for working with VMFS from ESXi 5.5 or later.
Call me via Skype - I don't want to post how to use unofficial builds of vmfs-tools in public.
The risk of damaging VMFS volumes using the writeable flag is too big for my taste.
Ulli


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

0 Kudos
PieterHezemans
Contributor
Jump to solution

I have a similar problem with ESXi 5, caused by several things.

It started with a failure of the UPS, which caused an HDD failure in a RAID 1 configuration and damaged my externally attached backup device.

Before I received a spare HDD to rebuild the RAID 1, the UPS failed again and somehow corrupted the last good working HDD, with the result that ESXi 5 doesn't see the VMFS store anymore.

Is there any chance to retrieve the VMs which were stored on the missing VMFS volume?

0 Kudos
mshayan
Contributor
Jump to solution

Dear support,

I have the same problem: invalid magic number 0x00000000 VMFS. Please help.

0 Kudos