VMware Cloud Community
blueadept
Contributor

ESXi 6.5 - Cannot find/mount VMFS from a LUN that exists and seems to have all the data

Hopefully someone can help me with a fairly frustrating situation. I have searched quite a bit and read a number of articles here and elsewhere, but none of them seems to match my situation precisely.

The problem arose with an "inelegant" reboot of my ESXi server. This is one of a few free installations using ScaleIO to map LUNs for the datastores. The server appeared to be hung, so I tried a controlled reboot (which did not seem to take) and then power-cycled it. When it came back up, the VM running on a local datastore came back fine, but the datastores on the ScaleIO LUNs were not showing up. Everything seems to say there is no filesystem, but that doesn't make sense for every LUN in question, and when I look at the disks with dd I see data I would expect.

What I see:

* The LUNs themselves are discovered along with the partitions on them. When I look at the devices in the GUI or the CLI, I see them without any problem.

[root@vm-host101:/var/log] ls /vmfs/devices/disks/eui*

/vmfs/devices/disks/eui.0f60934767ed6d2731f35a0d00000005

/vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000

/vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000:1

[root@vm-host101:/var/log] esxcli storage core device list | grep eui.

eui.0f60934767ed6d273d0a632d00000000

   Display Name: EMC Fibre Channel Disk (eui.0f60934767ed6d273d0a632d00000000)

   Devfs Path: /vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000

eui.0f60934767ed6d2731f35a0d00000005

   Display Name: EMC Fibre Channel Disk (eui.0f60934767ed6d2731f35a0d00000005)

   Devfs Path: /vmfs/devices/disks/eui.0f60934767ed6d2731f35a0d00000005

* partedUtil and voma show the GPT partitions are still there.

[root@vm-host101:/var/log] partedUtil getptbl /dev/disks/eui.0f60934767ed6d273d0a632d00000000

gpt

35507 255 63 570425344

1 128 570425304 AA31E02A400F11DB9590000C2911D1B8 vmfs 0

[root@vm-host101:/var/log] voma -m ptbl -f check -d /vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000

Running Partition table checker version 0.1 in check mode   

Phase 1: Checking device for valid primary GPT              

   Detected valid GPT signatures                            

   Number    Start          End                Type         

   1         128            570425304          vmfs         

                                                            

Found a valid partition table on the device

                                                            

Total Errors Found:           0

* esxcli shows no filesystems on those LUNs

[root@vm-host101:/var/log] esxcli storage filesystem list

Mount Point                                        Volume Name  UUID                                 Mounted  Type            Size          Free

-------------------------------------------------  -----------  -----------------------------------  -------  ------  ------------  ------------

/vmfs/volumes/598ccaea-8afb9c77-80a4-001517d9a462  datastore1   598ccaea-8afb9c77-80a4-001517d9a462     true  VMFS-6  241055039488  239537750016

/vmfs/volumes/dcaf3470-159653b7-59a1-55193a835180               dcaf3470-159653b7-59a1-55193a835180     true  vfat       261853184     110673920

/vmfs/volumes/2044b777-2642e6ce-6fc4-7407f0f24801               2044b777-2642e6ce-6fc4-7407f0f24801     true  vfat       261853184     110866432

/vmfs/volumes/598ccaf4-06170a72-d2b8-001517d9a462               598ccaf4-06170a72-d2b8-001517d9a462     true  vfat      4293591040    4285333504

/vmfs/volumes/598ccadb-9d7c923c-6aa1-001517d9a462               598ccadb-9d7c923c-6aa1-001517d9a462     true  vfat       299712512      83927040

* esxcfg-volume shows no snapshots

[root@vm-host101:/var/log] esxcfg-volume -l

[root@vm-host101:/var/log]

* voma similarly shows an issue with the file system

[root@vm-host101:/var/log] voma -m vmfs -f check -d /vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000:1

Checking if device is actively used by other hosts          

Running VMFS Checker version 2.1 in check mode              

Initializing LVM metadata, Basic Checks will be done        

Initializing LVM metadata..-                                

LVM magic not found at expected Offset,

It might take long time to search in rest of the disk.

VMware ESX Question:

Do you want to continue (Y/N)?

0) _Yes

1) _No

Select a number from 0-1: 0

         ERROR: LVM Major or Minor version Mismatch, Not supported

         ERROR: Failed to Initialize LVM Metadata           

   VOMA failed to check device : Not Supported              

                                                            

Total Errors Found:           0

   Kindly Consult VMware Support for further assistance

* dd shows data exists and even shows the datastore label I would expect.

[root@vm-host101:/var/log] dd if=/vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000:1 | od -c | head

0000000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0

*

114000000   ^ 0N1 0E3   / 030  \0  \0  \0   Q   e   S 0J5   X   1  \0 0O1

114000020 0D4     0K4 0J0   P 0C1 0H1   i 0O1 026  \0  \0  \0   V   M   -

114000040   W   i   n   d   o   w   s   -   7   -   x   6   4   -   P   r

114000060   o  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0

114000100  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0

*

114000220  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 002  \0

114000240  \0  \0  \0 020  \0  \0  \0  \0  \0   e   S 0J5   X 001  \0  \0
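
The numbers in the partedUtil and od output above can be turned into byte offsets with a little arithmetic. The following is only a sketch of that calculation, not something run on the host: the helper names are hypothetical, 512-byte sectors are assumed, the end sector printed by partedUtil is assumed to be inclusive, and the od offset column is assumed to be octal (the od default, which matches the way the offsets above step from 060 to 100).

```python
from typing import Tuple

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors on this LUN


def partition_byte_range(start_sector: int, end_sector: int,
                         sector_bytes: int = SECTOR_BYTES) -> Tuple[int, int]:
    """Return (byte offset of the partition on the LUN, partition size in bytes).

    Assumes the end sector reported by partedUtil is inclusive.
    """
    start = start_sector * sector_bytes
    size = (end_sector - start_sector + 1) * sector_bytes
    return start, size


def od_offset_to_bytes(octal_offset: str) -> int:
    """Convert a value from od's offset column (octal by default) to bytes."""
    return int(octal_offset, 8)


# Values copied from the partedUtil line "1 128 570425304 ... vmfs 0".
part_start, part_size = partition_byte_range(128, 570425304)

# First offset in the od output that is not part of the leading run of zeros.
first_data = od_offset_to_bytes("114000000")

print(part_start)                        # 65536
print(part_size)                         # 292057690624
print(first_data, hex(first_data))       # 19922944 0x1300000
print(first_data // (1024 * 1024))       # 19 (MiB into the partition)
print(part_start + first_data)           # same position counted from the start of the LUN
```

Under those assumptions, the dump reads as zeros for the first 19 MiB of partition 1, and the first non-zero data (including the "VM-Windows-7-x64-Pro" label) starts at byte 19922944 of the partition. A zeroed start of the partition would at least be consistent with voma reporting that the LVM magic was not found at the expected offset.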

I once saw what I thought was a similar issue, which I rectified by mapping the LUN to a different server, but that hasn't worked in this case. I even built a new server and added it to the ScaleIO network, and I still cannot see the filesystem.

In a "typical" Unix environment I would expect I could do an fsck, but I'm not seeing such an opportunity here.

Any help would be greatly appreciated.


Accepted Solutions
continuum
Immortal

> In a "typical" Unix environment I would expect I could do an fsck, but I'm not seeing such an opportunity here.
The concept of having a tool like fsck or chkdsk is so old school :)

If you have customers who believe that buying redundant hardware plus additional software licenses is way cooler than command-line repair work, it would be counterproductive to supply a solid fsck tool. And it actually works out: companies that think as big as VMware suggests do not need recovery.
Seriously now: read "Create a VMFS-Header-dump using an ESXi-Host in production" on VM-Sickbay.
Create a dump as explained there and provide a download link.
In most cases I can then give you a solid recovery prognosis in about an hour.
Contact me via Skype before you send any data.
If creating the dump fails due to I/O errors, let me know and I will create one myself.
Ulli
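
The linked article is not reproduced in this thread, so the exact command and dump size it asks for are not shown here. As a rough illustration of what a header dump amounts to (copying the first part of the VMFS partition into a regular file, typically done with dd on the host), a minimal sketch follows. The function name, the paths and the size in the usage comment are placeholders, not values taken from the article.

```python
MIB = 1024 * 1024


def dump_header(device_path: str, out_path: str, mib: int,
                chunk_bytes: int = MIB) -> int:
    """Copy the first `mib` MiB of `device_path` into `out_path`.

    Returns the number of bytes actually written, which is smaller than
    requested if the source ends early. An I/O error while reading is
    raised to the caller as OSError rather than being skipped.
    """
    remaining = mib * MIB
    written = 0
    with open(device_path, "rb") as src, open(out_path, "wb") as dst:
        while remaining > 0:
            buf = src.read(min(chunk_bytes, remaining))
            if not buf:  # source is shorter than the requested size
                break
            dst.write(buf)
            written += len(buf)
            remaining -= len(buf)
    return written


# Hypothetical usage; both paths and the size are placeholders:
# dump_header("/vmfs/devices/disks/<device>:1", "/vmfs/volumes/datastore1/header.bin", 100)
```

The dump should be written to a datastore that is still healthy (the local datastore1 in this thread), never to the affected LUN, and the size should be whatever the article specifies.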


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

a_p_
Leadership

Try to contact continuum via Skype (details in his profile), he might be able to help.

André

Finikiez
Champion

According to the outputs, the VMFS LVM is corrupted. Can you fix this?

This could happen due to a storage outage, a firmware issue, or the storage array being powered down and unplugged from power while its batteries were expired or broken.

continuum
Immortal

> Can you fix this?
I do not see any sense in attempting to fix a VMFS that misbehaves after a power failure. I only recommend that my customers evacuate the datastore ASAP, wipe the LUN, and build a new volume from scratch.
The question here is whether the VMFS metadata still allows files to be extracted, and I can tell you more after I have seen the dump.
Rule of thumb: the older the VMFS version and the more thick provisioning was used, the better the chances.

