VMware Cloud Community
ic2018
Contributor

ESXi VMFS-6 Datastore Corruption After Host Reboot

Hey guys, hopefully somebody can help with this or point me in the right direction. I lost one of my datastores after rebooting an ESXi 6.7.0 host (the VMs were shut down and the host was in maintenance mode), and it no longer shows up in the storage/datastore tab of ESXi.

However, the VMFS partition is still displayed when viewing the storage device structure. VOMA shows the output below; I assume the ON-DISK ERROR is the culprit. Manually mounting the volume by UUID doesn't work, and VOMA doesn't have a fix option for VMFS-6 yet, so I'm not sure where to go from here. Thanks in advance.

Phase 1: Checking VMFS header and resource files
   Detected VMFS-6 file system (labeled:'Primary') with UUID:5b0440a2-7dbb4c4b-de69-a0369fe03066, Version 6:82
   Found stale lock [type 10c00003 offset 286449664 v 2, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 37
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00003 offset 15070576640 v 2, hb offset 3833856
         gen 103, mode 1, owner 5bab9ade-3cf65242-a144-a0369fe03066 mtime 429
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00008 offset 16195584 v 6, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 81
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00002 offset 9928704 v 6, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 35
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00002 offset 16392192 v 6, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 29
         num 0 gblnum 0 gblgen 0 gblbrk 0]
  Cluster 785 unmap lock set while no pending unmaps, stale lock
ON-DISK ERROR: Cluster 785 free locked for unmap 457 should be 224
   Found stale lock [type 10c00002 offset 16465920 v 4, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 32
         num 0 gblnum 0 gblgen 0 gblbrk 0]
Phase 2: Checking VMFS heartbeat region
Marking Journal addr (14, 0) in use
Phase 3: Checking all file descriptors.
Phase 4: Checking pathname and connectivity.
Phase 5: Checking resource reference counts.
Total Errors Found:           1

The vmkernel log also shows this warning several times:

2018-09-26T17:13:18.685Z cpu2:2097320)WARNING: Vol3: 3102: Primary/5b0440a2-7dbb4c4b-de69-a0369fe03066: Invalid physDiskBlockSize 512
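
A long VOMA report like the one above is easier to read once the stale locks are grouped by owner. The following is a minimal sketch in Python, not a VMware tool: the function name `summarize_voma` and the regular expressions are illustrative assumptions based only on the line format shown in this post, and other VOMA versions may print these records differently.

```python
import re
from collections import Counter

# First line of a stale-lock record as printed in this thread, e.g.
#   Found stale lock [type 10c00003 offset 286449664 v 2, hb offset 3837952
LOCK_RE = re.compile(r"Found stale lock \[type (\w+) offset (\d+) v \d+, hb offset (\d+)")
# Owner field on the continuation line of the same record.
OWNER_RE = re.compile(r"owner ([0-9a-f-]+)")


def summarize_voma(text):
    """Count stale locks, group them by owner, and collect ON-DISK ERROR lines."""
    owners = Counter()
    errors = []
    stale_locks = 0
    for raw in text.splitlines():
        line = raw.strip()
        if LOCK_RE.search(line):
            stale_locks += 1
        owner = OWNER_RE.search(line)
        if owner:
            owners[owner.group(1)] += 1
        if line.startswith("ON-DISK ERROR:"):
            errors.append(line[len("ON-DISK ERROR:"):].strip())
    return {"stale_locks": stale_locks, "owners": dict(owners), "errors": errors}


# Short excerpt copied from the VOMA output in this post.
SAMPLE = """\
   Found stale lock [type 10c00003 offset 286449664 v 2, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 37
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00003 offset 15070576640 v 2, hb offset 3833856
         gen 103, mode 1, owner 5bab9ade-3cf65242-a144-a0369fe03066 mtime 429
         num 0 gblnum 0 gblgen 0 gblbrk 0]
ON-DISK ERROR: Cluster 785 free locked for unmap 457 should be 224
"""

if __name__ == "__main__":
    print(summarize_voma(SAMPLE))
```

Counting by hand in the full Phase 1 output above gives six stale locks: five held by owner 5baba25d-063a88f4-62a5-a0369fe03066 and one by 5bab9ade-3cf65242-a144-a0369fe03066, plus the single ON-DISK ERROR for cluster 785.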

9 Replies
continuum
Immortal

Hello

Have a look at "Locked files with VMFS 6 | VM-Sickbay".
If you want me to take a closer look, create a VMFS header dump; see
"Create a VMFS-Header-dump using an ESXi-Host in production | VM-Sickbay".
Ulli


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

ic2018
Contributor

I've made a header dump, uploaded it here, and attached it. Replacing the heartbeat section with a clean one did not resolve the issue; this header dump was taken before I overwrote the corrupted partition's heartbeat section. Thanks for your help so far.

Edit: here's a new VOMA output as well:

Checking if device is actively used by other hosts
Scanning for VMFS-6 host activity (4096 bytes/HB, 1024 HBs).
Running VMFS Checker version 2.1 in default mode
Initializing LVM metadata, Basic Checks will be done
Phase 1: Checking VMFS header and resource files
   Detected VMFS-6 file system (labeled:'Primary') with UUID:5b0440a2-7dbb4c4b-de69-a0369fe03066, Version 6:82
   Found stale lock [type 10c00003 offset 286449664 v 2, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 37
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00003 offset 15070576640 v 2, hb offset 3833856
         gen 103, mode 1, owner 5bab9ade-3cf65242-a144-a0369fe03066 mtime 429
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00008 offset 16195584 v 6, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 81
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00002 offset 9928704 v 6, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 35
         num 0 gblnum 0 gblgen 0 gblbrk 0]
   Found stale lock [type 10c00002 offset 16392192 v 6, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 29
         num 0 gblnum 0 gblgen 0 gblbrk 0]
  Cluster 785 unmap lock set while no pending unmaps, stale lock
ON-DISK ERROR: Cluster 785 free locked for unmap 457 should be 224
   Found stale lock [type 10c00002 offset 16465920 v 4, hb offset 3837952
         gen 1, mode 1, owner 5baba25d-063a88f4-62a5-a0369fe03066 mtime 32
         num 0 gblnum 0 gblgen 0 gblbrk 0]
Phase 2: Checking VMFS heartbeat region
Phase 3: Checking all file descriptors.
Phase 4: Checking pathname and connectivity.
Phase 5: Checking resource reference counts.
ON-DISK ERROR: JBC inconsistency found: (14,0) allocated in bitmap, but never used
Total Errors Found:           2

continuum
Immortal

Just downloaded the dump ...
This is a tough one ...
OSF-Windows-Server-2016 seems readable, but OSF-CentOS-Plesk has a problem.
I will definitely need more time for this.
Ulli



ic2018
Contributor

The Plesk VM is not strictly necessary; I have a fairly recent complete backup of it.

continuum
Immortal

Please run the command
dd if=/dev/disks/device bs=1M count=10 skip=278540 of=/tmp/test.bin
Here, device is the same device you used to create the VMFS header dump.
Download /tmp/test.bin.
Compress the file and attach it to your next reply.
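
To make clear which part of the disk that dd command reads: with bs=1M, skip=278540 and count=10 it copies 10 MiB starting 278540 MiB into the device. Below is a minimal sketch of the arithmetic in Python, assuming dd treats 1M as 1,048,576 bytes; the helper name `dd_window` is made up for illustration.

```python
MIB = 1024 * 1024  # assumption: dd interprets bs=1M as 1,048,576 bytes


def dd_window(skip_blocks, count_blocks, block_size=MIB):
    """Return (start_byte, end_byte_exclusive, length_bytes) of the region dd reads."""
    start = skip_blocks * block_size
    length = count_blocks * block_size
    return start, start + length, length


if __name__ == "__main__":
    # Values taken from the dd command in this post.
    start, end, length = dd_window(skip_blocks=278540, count_blocks=10)
    print(f"start : {start} bytes")
    print(f"end   : {end} bytes (exclusive)")
    print(f"length: {length} bytes")
```

With these values the read starts at byte 292,070,359,040 of the device and the resulting test.bin should be exactly 10,485,760 bytes, which is a quick way to check that the copy completed before uploading it.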



ic2018
Contributor

Here you go. Thanks again.

continuum
Immortal

Please look at this partition table. Is this the Windows boot disk you need?
ic2018.png

If yes, install AnyDesk and call me or send a message via Skype.
Ulli



continuum
Immortal

Please let me know if you are still interested.
The success rate of such operations is much better if there is no unnecessary delay between the steps.



ic2018
Contributor

Yes, I am. The partition table looks about right for the Windows disk. I'll contact you on Skype shortly.
