4 Replies Latest reply on Aug 11, 2017 5:59 PM by continuum

    ESXi 6.5 - Cannot find/mount VMFS from a LUN that exists and seems to have all the data

    blueadept Lurker

      Hopefully someone can help me with a fairly frustrating situation. I have done quite a few searches and read a number of articles here and elsewhere, but none of them seems to match my situation precisely.

       

      The problem arose with an "inelegant" reboot of my ESXi server. This is one of a few free installations using ScaleIO to map LUNs for the datastores. The server appeared to be hung, so I tried a controlled reboot (which did not seem to take) and then power cycled it. When it came back up, the VM running on a local datastore came back fine, but the datastores on the ScaleIO LUNs were not showing up. Every tool seems to say there is no filesystem, but that doesn't make sense for every LUN in question, and when I look at the disks with dd I see the data I would expect.

       

      What I see:

      * The LUNs themselves are discovered along with the partitions on them. When I look at the devices in the GUI or CLI, I see them without a problem.

       

      [root@vm-host101:/var/log] ls /vmfs/devices/disks/eui*

      /vmfs/devices/disks/eui.0f60934767ed6d2731f35a0d00000005

      /vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000

      /vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000:1

       

      [root@vm-host101:/var/log] esxcli storage core device list | grep eui.

      eui.0f60934767ed6d273d0a632d00000000

         Display Name: EMC Fibre Channel Disk (eui.0f60934767ed6d273d0a632d00000000)

         Devfs Path: /vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000

      eui.0f60934767ed6d2731f35a0d00000005

         Display Name: EMC Fibre Channel Disk (eui.0f60934767ed6d2731f35a0d00000005)

         Devfs Path: /vmfs/devices/disks/eui.0f60934767ed6d2731f35a0d00000005

       

      * partedUtil and voma show the GPT partitions still there.

       

      [root@vm-host101:/var/log] partedUtil getptbl /dev/disks/eui.0f60934767ed6d273d0a632d00000000

      gpt

      35507 255 63 570425344

      1 128 570425304 AA31E02A400F11DB9590000C2911D1B8 vmfs 0

      [root@vm-host101:/var/log] voma -m ptbl -f check -d /vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000

      Running Partition table checker version 0.1 in check mode   

      Phase 1: Checking device for valid primary GPT              

         Detected valid GPT signatures                            

         Number    Start          End                Type         

         1         128            570425304          vmfs         

                                                                  

      Found a valid partition table on the device

                                                                  

      Total Errors Found:           0
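
      For reference, the sector counts in the partition table above can be turned into sizes with a little shell arithmetic. This is only a sketch: it assumes 512-byte logical sectors, which the output does not state, and the variable names are mine.

```shell
# Values copied from the partedUtil getptbl output above.
total_sectors=570425344   # last field of the geometry line
part_start=128            # first sector of partition 1
part_end=570425304        # last sector of partition 1
sector_bytes=512          # ASSUMPTION: 512-byte logical sectors

# Sector ranges are inclusive, hence the +1.
part_sectors=$((part_end - part_start + 1))

disk_gib=$((total_sectors * sector_bytes / 1073741824))
part_offset_bytes=$((part_start * sector_bytes))
part_size_bytes=$((part_sectors * sector_bytes))

echo "disk size:        ${disk_gib} GiB"
echo "partition offset: ${part_offset_bytes} bytes"
echo "partition size:   ${part_size_bytes} bytes"
```

      Under that assumption the LUN comes out at 272 GiB and the VMFS partition starts 64 KiB into the device, so the partition layout itself looks sane.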

       

      * esxcli shows no filesystems on those LUNs

       

      [root@vm-host101:/var/log] esxcli storage filesystem list

      Mount Point                                        Volume Name  UUID                                 Mounted  Type            Size          Free

      -------------------------------------------------  -----------  -----------------------------------  -------  ------  ------------  ------------

      /vmfs/volumes/598ccaea-8afb9c77-80a4-001517d9a462  datastore1   598ccaea-8afb9c77-80a4-001517d9a462     true  VMFS-6  241055039488  239537750016

      /vmfs/volumes/dcaf3470-159653b7-59a1-55193a835180               dcaf3470-159653b7-59a1-55193a835180     true  vfat       261853184     110673920

      /vmfs/volumes/2044b777-2642e6ce-6fc4-7407f0f24801               2044b777-2642e6ce-6fc4-7407f0f24801     true  vfat       261853184     110866432

      /vmfs/volumes/598ccaf4-06170a72-d2b8-001517d9a462               598ccaf4-06170a72-d2b8-001517d9a462     true  vfat      4293591040    4285333504

      /vmfs/volumes/598ccadb-9d7c923c-6aa1-001517d9a462               598ccadb-9d7c923c-6aa1-001517d9a462     true  vfat       299712512      83927040

       

       

      * esxcfg-volume shows no snapshots

       

      [root@vm-host101:/var/log] esxcfg-volume -l

      [root@vm-host101:/var/log]

       

      * voma similarly shows an issue with the file system

       

      [root@vm-host101:/var/log] voma -m vmfs -f check -d /vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000:1

      Checking if device is actively used by other hosts          

      Running VMFS Checker version 2.1 in check mode              

      Initializing LVM metadata, Basic Checks will be done        

      Initializing LVM metadata..-                                

      LVM magic not found at expected Offset,

      It might take long time to search in rest of the disk.

       

      VMware ESX Question:

      Do you want to continue (Y/N)?

       

      0) _Yes

      1) _No

       

      Select a number from 0-1: 0

       

               ERROR: LVM Major or Minor version Mismatch, Not supported

               ERROR: Failed to Initialize LVM Metadata           

         VOMA failed to check device : Not Supported              

                                                                  

      Total Errors Found:           0

         Kindly Consult VMware Support for further assistance

       

      * dd shows data exists and even shows the datastore label I would expect.

       

      [root@vm-host101:/var/log] dd if=/vmfs/devices/disks/eui.0f60934767ed6d273d0a632d00000000:1  | od -c | head

      0000000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0

      *

      114000000   ^ 0N1 0E3   / 030  \0  \0  \0   Q   e   S 0J5   X   1  \0 0O1

      114000020 0D4     0K4 0J0   P 0C1 0H1   i 0O1 026  \0  \0  \0   V   M   -

      114000040   W   i   n   d   o   w   s   -   7   -   x   6   4   -   P   r

      114000060   o  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0

      114000100  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0

      *

      114000220  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 002  \0

      114000240  \0  \0  \0 020  \0  \0  \0  \0  \0   e   S 0J5   X 001  \0  \0
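
      A note on reading that dump: od prints offsets in octal by default, so the 114000000 above is not a decimal byte count. The sketch below converts it and, optionally, re-reads just that region instead of streaming the whole partition. DEVICE is a hypothetical variable that I would set to the partition path by hand; nothing is read if it is unset.

```shell
# Offset of the first non-zero line in the od output above (octal).
od_offset_octal=114000000

# A leading 0 makes shell arithmetic parse the literal as octal.
offset_bytes=$((0$od_offset_octal))
offset_mib=$((offset_bytes / 1048576))

printf 'offset: %d bytes (0x%x), %d MiB into the partition\n' \
    "$offset_bytes" "$offset_bytes" "$offset_mib"

# Optional: dump only the 1 MiB block starting at that offset.
# DEVICE is hypothetical, e.g. the ...:1 partition path used above.
if [ -n "${DEVICE:-}" ]; then
    dd if="$DEVICE" bs=1M skip="$offset_mib" count=1 | od -c | head
fi
```

      So the first non-zero data, including the label, sits 19 MiB (0x1300000) into the partition, and everything before it reads as zeros.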

       

      I once saw what I thought was a similar issue, which I fixed by mapping the LUN to a different server. That has not worked in this case. I even built a new server and added it to the ScaleIO network, and I still cannot see the filesystem.

       

      In a "typical" Unix environment I would expect I could do an fsck, but I'm not seeing such an opportunity here.

       

      Any help would be greatly appreciated.