VMware Cloud Community
3apa3a_b_ta3e
Enthusiast

VMFS partition table corruption? Need help to recover data

My home ESXi 5.1 host ran on an HP MicroServer G7, with the hypervisor on a USB flash drive and the datastore on an Adaptec 3405.

After a power problem the MicroServer refused to start, so I decided to build a new server on a MicroServer Gen8. I moved the Adaptec and the disks into the new box, installed a fresh ESXi 5.5 U1 (HP image) on a USB flash drive, and after the reboot I didn't see the datastore :-(.

After some investigation I found that ESXi does not see any partition table on this RAID array:


partedUtil getptbl /vmfs/devices/disks/mpx.vmhba2:C0:T0:L0

unknown

728422 255 63 11702108160

BUT! I booted the server from a Linux live CD with vmfs-tools 0.2.5 installed, mounted the partition with vmfs-fuse, and was able to see the data! However, I cannot copy all of the important VMDKs (see below).

debugvmfs also shows some useful information:

Volume Version: 14

Version: 54

Label: STORE

Mode: public

UUID: 4f2ac538-58c181cf-aeff-441ea13ee615

Creation time: 2012-02-02 23:17:44

Block size: 1 MiB

Subblock size: 8 KiB

FDC Header size: 64 KiB

FDC Bitmap count: 64

and this is very strange:

debugvmfs /dev/sda1 show lvm.extent[0]

Device: /dev/sda1

UUID: 4f2ac4da-8de4b166-848a-441ea13ee615

LUN: 0

Version: 5

Name:

Size: 459.99 GiB

Num. Segments: 22319

First Segment: 0

Last Segment: 22318


because I'm sure that I didn't create any extents.


Also, there are two VMDK files on the datastore larger than 256 GB, and I can't copy them because of a vmfs-tools limitation :-(.


I tried some magic (like this: partedUtil setptbl  "/vmfs/devices/disks/mpx.vmhba2:C0:T0:L0" gpt "1 2048 11702099429 AA31E02A400F11DB9590000C2911D1B8 0") but without any success ;-(.


Any suggestions, guys?

Accepted Solutions
3apa3a_b_ta3e
Enthusiast

Finally!

I found that "Intel VT-d" was enabled in the BIOS. After disabling it, everything works.

It seems the Adaptec 3405 works badly with it. I will try replacing it with a P222.

continuum
Immortal

Nothing strange here - /dev/sda1 is the first and only extent for that datastore. You can't create a datastore without an extent ...

Offset 2048 is the most probable value, as long as you use sectors as the display unit.
The open question is the size - where did you find the value 11702099429?


partedUtil has a function to display the last usable sector - what does that say?

Read the ESXi kernel log - there may be complaints about a bad volume size - that's what you want to know next.

> Also, there are two VMDK files on the datastore larger than 256 GB, and I can't copy them because of a vmfs-tools limitation :-(.
Please explain ..






________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

3apa3a_b_ta3e
Enthusiast

It's strange because /dev/sda1 is a big volume, about 5.7TB in size (4x2TB HDD in RAID5), and df on the mounted partition shows the full size.


11702099429 - I calculated it from the formula (C * H * S) - 1, that is 728422 * 255 * 63 - 1. Is this wrong?
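The multiplication can be checked quickly. A minimal sketch, assuming the first three numbers in the partedUtil geometry line are cylinders, heads and sectors per track and the fourth is the total sector count; the variable names are only illustrative.

```python
# Geometry line printed by partedUtil getptbl: "728422 255 63 11702108160"
cylinders, heads, sectors_per_track, total_sectors = 728422, 255, 63, 11702108160

# Last sector addressable through the CHS geometry (sectors are numbered from 0)
chs_last_sector = cylinders * heads * sectors_per_track - 1

print(chs_last_sector)                  # 11702099429
print(chs_last_sector < total_sectors)  # True
```

So the arithmetic itself is right and the result lies inside the disk; whether it is the correct end sector for the partition is a separate question.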

When I run:

partedUtil getUsableSectors /vmfs/devices/disks/mpx.vmhba2:C0:T0:L0

I get:

Unknown partition table on disk /vmfs/devices/disks/mpx.vmhba2:C0:T0:L0

vmkernel.log says this when I run vmkfstools -V:

2014-04-24T09:07:20.930Z cpu4:258177)Partition: 857: MBR table with partition type '0xee' detected on mpx.vmhba2:C0:T0:L0. However, it does not have a valid GPT table. Is this a valid MBR disk?

2014-04-24T09:07:21.018Z cpu4:258177)Vol3: 714: Couldn't read volume header from control: Not supported

2014-04-24T09:07:21.018Z cpu4:258177)FSS: 5092: No FS driver claimed device 'control': Not supported

About the vmfs-tools limits - here is almost my situation. I have two flat VMDKs on the datastore, each 2TB in size. I cannot do anything with either of them ("input/output error"), but the smaller VMDKs (25GB, 40GB, etc.) are OK.

Are there any other tools (paid or free) to rescue the data? It's just a home server with photos, videos and the like (and I don't have another 6TB of storage for backups), but my wife will kill me - there was a lot of family data :-(.

continuum
Immortal

OK - the GPT table is not OK - maybe it first has to be wiped blank to clean it up.

I think I can help with the input/output errors - call me via Skype "sanbarrow" and I will have a look.


3apa3a_b_ta3e
Enthusiast

Dear continuum, your investigation gave me an idea and I checked it out.

I booted the MicroServer from your moa64dvd.iso, exported /dev/sda over iSCSI, connected to this target from a temporary ESXi (nested, to be clear, running in a VM on my laptop) and did some testing.

The datastore still cannot be mounted, but some things have changed:


voma:

voma -m vmfs -d /vmfs/devices/disks/t10.9454450000000000EE8FCFE0BB7DC8AD5ECB14B22BC42643:1

Checking if device is actively used by other hosts

Running VMFS Checker version 1.0 in default mode

Initializing LVM metadata, Basic Checks will be done

ON-DISK ERROR: Invalid device Size 5991473859584, should be 5991472811520

Phase 1: Checking VMFS header and resource files

   Detected VMFS file system (labeled:'STORE') with UUID:4f2ac538-58c181cf-aeff-441ea13ee615, Version 5:54

Phase 2: Checking VMFS heartbeat region

Phase 3: Checking all file descriptors.

   <FD c479 r13> : Wrong Epoch Block Count 2 should be 4

   <FD c479 r107> : Wrong Epoch Block Count 275590 should be 1629684

   <FD c479 r124> : Wrong Epoch Block Count 65 should be 23019

   <FD c479 r183> : Wrong Epoch Block Count 15495 should be 34798

Phase 4: Checking pathname and connectivity.

Phase 5: Checking resource reference counts.

Total Errors Found:           1


vmkernel.log:

WARNING: LVM: 2900: [t10.9454450000000000EE8FCFE0BB7DC8AD5ECB14B22BC42643:1] Device shrank (actual size 11702095335 blocks, stored size 11702097382 blocks)

partedUtil:

partedUtil getptbl /vmfs/devices/disks/t10.9454450000000000EE8FCFE0BB7DC8AD5ECB14B22BC42643

gpt

728422 255 63 11702108160

1 2048 11702097382 AA31E02A400F11DB9590000C2911D1B8 vmfs 0


hexdump:

hexdump -C /vmfs/devices/disks/t10.9454450000000000EE8FCFE0BB7DC8AD5ECB14B22BC42643 | grep -m 1 "0d d0 01 c0"

00200000  0d d0 01 c0 05 00 00 00  15 00 00 00 00 0a 00 00  |................|
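The offset in that hexdump line can be turned into a partition start sector. A minimal sketch, assuming 512-byte sectors and that the VMFS LVM header (the `0d d0 01 c0` magic) sits 0x100000 bytes into the partition; both constants and the function name are assumptions made for illustration.

```python
SECTOR_SIZE = 512             # assumed logical sector size in bytes
LVM_HEADER_OFFSET = 0x100000  # assumed distance from partition start to the magic


def start_sector_from_magic(magic_offset):
    """Return the partition start sector implied by the byte offset of the magic."""
    partition_start_bytes = magic_offset - LVM_HEADER_OFFSET
    if partition_start_bytes < 0 or partition_start_bytes % SECTOR_SIZE:
        raise ValueError("offset does not map to a whole sector")
    return partition_start_bytes // SECTOR_SIZE


print(start_sector_from_magic(0x00200000))  # 2048
```

Under those assumptions the magic at 00200000 agrees with the start sector 2048 shown in the partition table.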


Does this mean I need to use the actual size of 11702095335 blocks in the partedUtil setptbl command? Or could that be dangerous?
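The numbers from voma, vmkernel.log and partedUtil can be cross-checked with plain arithmetic. A minimal sketch, assuming 512-byte blocks; it only compares the values quoted above and is not a recommendation for which end sector to write.

```python
SECTOR_SIZE = 512  # assumed block size in bytes

start, end = 2048, 11702097382  # partition entry from partedUtil getptbl
actual_blocks = 11702095335     # "actual size" from vmkernel.log
stored_blocks = 11702097382     # "stored size" from vmkernel.log

# Length of the partition as currently described by the GPT entry
print(end - start + 1 == actual_blocks)              # True

# The two sizes reported by voma are the same values expressed in bytes
print(stored_blocks * SECTOR_SIZE == 5991473859584)  # True
print(actual_blocks * SECTOR_SIZE == 5991472811520)  # True

# End sector that a partition starting at 2048 would need to hold stored_blocks
needed_end = start + stored_blocks - 1
print(needed_end)                                    # 11702099429
```

On these numbers the current entry is 2047 blocks shorter than the stored size, and the end sector that would match the stored size is the same 11702099429 used in the earlier setptbl attempt.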

3apa3a_b_ta3e
Enthusiast

Yahooo!!!!!
VMware KB 2046610 was very helpful. After that the datastore still did not mount ("Device t10.9454450000000000EE8FCFE0BB7DC8AD5ECB14B22BC42643:1 detected to be a snapshot"), but KB 1011387 helped me.

Now I have access to the VMDKs; I will try to recover the data and recreate VMFS on the RAID array.

Thank you, sanbarrow, for your help!

3apa3a_b_ta3e
Enthusiast

As I mentioned earlier in the Skype chat with sanbarrow, it seems that under ESXi the device /vmfs/devices/disks/mpx.vmhba2:C0:T0:L0 returns some contents of ... memory or a temporary file? ... instead of the proper partition table (under Linux everything is OK).

I booted ESXi again and ran hexdump -C  /vmfs/devices/disks/mpx.vmhba2:C0:T0:L0 | less several times - each time the output was different! Parts of ELF files, config files, etc.

I tried plain ESXi 5.5 (without customization) and installed new Adaptec drivers - same result, except that the device was mpx.vmhba1:C0:T0:L0.
How is this possible? And how can I prevent it?