Create a VMFS-Header-dump using an ESXi-Host in production

How to create a VMFS-Header-dump using an ESXi-Host in production

Why should I do this?

- you accidentally deleted a VM or a VMDK from a VMFS datastore and want to ask an expert whether the VM or VMDK can be recovered.

- after a power failure your datastore appears to be wiped blank (no VMs or directories are listed anymore)

- after a RAID rebuild a datastore can no longer be mounted

- the physical disk or RAID array has lost its partition table

 

A VMFS header-dump may be requested in the forum when you ask for help troubleshooting a corrupted datastore.

It will also help when important VMDKs are locked or report I/O errors.

 

Is this procedure ideal for best results?

No. Especially if you use a VMFS volume in a cluster and more than one host has access to it, this procedure is not optimal.

But the results are good enough in most cases, and especially if the affected datastore is in active production you do not want to disconnect all hosts and unmount the datastore first.

 

Is this procedure safe, and does it affect production?

Yes, it is safe: creating a dump does not do any harm, and production is not affected as long as you store the dump in /tmp or on an unaffected datastore.

 

What is contained in such a header-dump?

A dump as described here contains the hidden .sf metadata files that are usually located in the first 2 GB of a datastore.

Without the data stored in this area a large VMDK file would be just a large pile of fragments.
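
If you are curious, you can see these hidden metadata files on any healthy, mounted datastore. This is just an illustrative check with a made-up datastore name; the exact set of .sf files differs between VMFS versions:

ls -la /vmfs/volumes/MY-HEALTHY-DATASTORE/ | grep '\.sf'

Typical entries are .fbb.sf, .fdc.sf, .pbc.sf, .sbc.sf and .vh.sf - exactly the metadata that the header-dump preserves.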

 

 

 


 

1. Required: root access to an ESXi host via SSH
2. Identify the device that corresponds to the affected datastore:

log in with the root account, then:
cd /dev/disks
ls -lisa | grep -v vml     # hides the duplicate vml.* alias links

In many cases you can identify the correct device by inspecting the referenced file size, typically several hundred GB or several TB.
If several datastores have the same size, use
esxcfg-scsidevs -m
for a more detailed description of the available devices.

To create a dump you need to know the Device and the partNum

So if you figured out that the corrupted datastore appears in /dev/disks as

naa.1234567812345678:1 (just an example)

then Device is naa.1234567812345678 and partNum is 1
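
Staying with that made-up example device, the dd commands below would then read from if=/dev/disks/naa.1234567812345678:1. A quick ls -l lets you double-check that the partition size is plausible for the affected datastore:

ls -l /dev/disks/naa.1234567812345678:1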

For all VMFS versions the procedure is the same, but note that the size of the dump differs between versions.

 

Case A: you have another unaffected datastore that can be used to store the dump.

In this case you can store the dump in this location:

/vmfs/volumes/ANOTHER-UNAFFECTED-DATASTORE/
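
Before running the dd command for your VMFS version it does no harm to confirm that this datastore has a few GB of free space. df is available in the ESXi shell; the datastore name is of course a placeholder:

df -h /vmfs/volumes/ANOTHER-UNAFFECTED-DATASTORE/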

VMFS 3 and 5

dd if=/dev/disks/Device:partNum bs=1M count=1500 of=/vmfs/volumes/ANOTHER-UNAFFECTED-DATASTORE/vmfs-header-dump.1500

 

VMFS 6

dd if=/dev/disks/Device:partNum bs=1M count=2000 of=/vmfs/volumes/ANOTHER-UNAFFECTED-DATASTORE/vmfs-header-dump.2000

 

VMFS 6 used by ESXi 7

dd if=/dev/disks/Device:partNum bs=1M count=2500 of=/vmfs/volumes/ANOTHER-UNAFFECTED-DATASTORE/vmfs-header-dump.2500

 

Case B: you do NOT have another unaffected datastore and have to use /tmp


Carefully watch the command line: if you see a message "short write ...", the dump is incomplete - in that case use the procedure in Case C.

VMFS 3 and 5

dd if=/dev/disks/Device:partNum bs=1M count=1500 | gzip -c >  /tmp/vmfs-header-dump.1500.gz

 

VMFS 6

dd if=/dev/disks/Device:partNum bs=1M count=2000 | gzip -c >  /tmp/vmfs-header-dump.2000.gz

 

VMFS 6 used by ESXi 7

dd if=/dev/disks/Device:partNum bs=1M count=2500 | gzip -c >  /tmp/vmfs-header-dump.2500.gz
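
To get the compressed dump from /tmp onto your admin machine you can use WinSCP, or plain scp from a Linux/macOS shell. Hostname and filename in this sketch are placeholders, adjust them to your environment:

scp root@YOUR-ESXI-HOST:/tmp/vmfs-header-dump.2500.gz .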

 

Case C: you do NOT have another unaffected datastore and only very little free space in /tmp


In this case you have to split the dump into several pieces so that each one of them fits into /tmp.

After each command, use WinSCP or any other SCP client to download the piece you just created.

Once you have downloaded a dump part, clean up /tmp and run the next command.

 

dd if=/dev/disks/Device:partNum bs=1M count=500 skip=0 | gzip -c >  /tmp/split-vmfs-header-dump.0.gz

download /tmp/split-vmfs-header-dump.0.gz and clean up /tmp

dd if=/dev/disks/Device:partNum bs=1M count=500 skip=500 | gzip -c >  /tmp/split-vmfs-header-dump.500.gz

download /tmp/split-vmfs-header-dump.500.gz and clean up /tmp

dd if=/dev/disks/Device:partNum bs=1M count=500 skip=1000 | gzip -c >  /tmp/split-vmfs-header-dump.1000.gz

download /tmp/split-vmfs-header-dump.1000.gz and clean up /tmp

dd if=/dev/disks/Device:partNum bs=1M count=500 skip=1500 | gzip -c >  /tmp/split-vmfs-header-dump.1500.gz

download /tmp/split-vmfs-header-dump.1500.gz and clean up /tmp

dd if=/dev/disks/Device:partNum bs=1M count=500 skip=2000 | gzip -c >  /tmp/split-vmfs-header-dump.2000.gz

download /tmp/split-vmfs-header-dump.2000.gz and clean up /tmp

For VMFS 3 and 5 you need the first 3 parts.

For VMFS 6 you need the first 4 parts.

For VMFS 6 used by ESXi 7 you need all 5 parts.
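
Whoever analyses the dump may prefer one contiguous file. Because the parts were read in order with increasing skip= offsets, they can simply be decompressed and concatenated on a Linux/macOS admin host - a sketch for VMFS 3 and 5 (the first 3 parts):

gunzip split-vmfs-header-dump.0.gz split-vmfs-header-dump.500.gz split-vmfs-header-dump.1000.gz
cat split-vmfs-header-dump.0 split-vmfs-header-dump.500 split-vmfs-header-dump.1000 > vmfs-header-dump.1500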

 

For all VMFS versions also dump the first MB of the device, which contains the MBR or GPT partition table.

dd if=/dev/disks/Device bs=1M count=1 skip=0 of=/tmp/mbr-gpt.bin
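
If you suspect that the partition table itself is damaged, you can also print it directly on the host with partedUtil. This is a read-only check, shown with the made-up example device from above:

partedUtil getptbl /dev/disks/naa.1234567812345678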

 

If you had to use gzip while creating the dump, unpack the .gz files on your admin host and verify the size of the dump.

A one-piece dump should have a size of 1500 MB for VMFS 3 or 5, 2000 MB for VMFS 6 and 2500 MB for VMFS 6 used by ESXi 7.

All parts of a split dump should have a size of 500 MB.
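
On a Linux/macOS admin host the check could look like this, assuming the one-piece VMFS 6 / ESXi 7 dump from Case B:

gunzip vmfs-header-dump.2500.gz
ls -l vmfs-header-dump.2500

dd with bs=1M counts in MiB, so expect 1572864000 bytes for a 1500 MB dump, 2097152000 bytes for 2000 MB, 2621440000 bytes for 2500 MB and 524288000 bytes for each 500 MB split part.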

 

If you want to send the dump you just collected to someone to look into it, please also create a readme.txt with some basic information:

- a short summary of the history of the datastore

- a short summary of the events that caused the corruption

- if you accidentally deleted a VMDK, please add its size and the guest OS

- your contact information

 

Then create a new directory with a meaningful name like "yourName-datastore-name", move the collected files into it, and compress the complete directory with an effective packer like 7zip or Rar (a packing example follows after the content lists below).

The directory should contain:

VMFS 5 and earlier:

mbr-gpt.bin

vmfs-header-dump.1500 (or the 3 files split-vmfs-header-dump.0 to split-vmfs-header-dump.1000)

readme.txt

 

VMFS 6:

mbr-gpt.bin

vmfs-header-dump.2000 (or the 4 files split-vmfs-header-dump.0 to split-vmfs-header-dump.1500)

readme.txt

 

VMFS 6 used by ESXi 7:

mbr-gpt.bin

vmfs-header-dump.2500 (or the 5 files split-vmfs-header-dump.0 to split-vmfs-header-dump.2000)

readme.txt
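
With the 7-Zip command line tool installed on a Windows admin host, packing the directory could look like this; directory and archive names are placeholders:

7z a yourName-datastore-name.7z yourName-datastore-name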

 

The typical size of a VMFS header dump compressed with 7zip or Rar is somewhere between 10 MB and 800 MB.

 

You may want to check whether the dump contains any confidential data that you are not allowed to share.

To evaluate which data is contained in a VMFS header dump, download the tool strings.exe from

https://technet.microsoft.com/en-us/sysinternals/strings.aspx

After the download, unzip strings.exe and copy it into the directory that contains your dump file.

Open a cmd-box and execute

strings.exe dump-file  > strings.txt

Search through strings.txt.
The dump contains vmx files and log files, which may include client names and other sensitive data.
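
To scan the extracted text for obviously sensitive terms you can, for example, use findstr in the same cmd box. The search terms here are only placeholders:

findstr /i "password hostname yourcompany" strings.txt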

 

 

 


 

There is a VMware Knowledge Base article that discusses the same topic, see
https://kb.vmware.com/kb/1020645

Unfortunately that KB article is outdated and has not been updated to cover VMFS 6 used by ESXi 7.

 

Disclaimer: the procedure for VMFS 6 used by ESXi 7 should work in most cases.

But as ESXi 7 does not write all metadata as a single block to the start of the volume, this procedure may not be enough.

In some cases it will be necessary to also add a copy of the hidden file .sbc.sf.

 

Ulli Hankeln

#########################################################################
contact:
skype : sanbarrow

#########################################################################

Comments

Important:
Header-dumps for VMFS 6 should be at least 3000 MB in size.
The 2000 MB I recommended previously is just too small too often.

DO NOT USE RAR PLEASE

ALWAYS INCLUDE THE MBR/GPT FILE

Document the command you used to create the dump.
