VMware Cloud Community
PSimpson2
Contributor

Missing datastore

Hi All,

First post here but I am tearing my hair out trying to sort this out.

A server I look after suffered a failure on one of the drives in a RAID 6 array.

The first we knew of it (the server is at a remote site) was that the VMs would not start. Both show their .vmx files as inaccessible; they are stored on a datastore called Data_Store.

It's a local disk on the ESXi server.

vsphere_config_inaccessible.png

I replaced the failed drive in the array and let it rebuild, but the VMs still failed to load.
They still show the same as in the image above.

I checked the storage adapters, and the disk with the datastore is there and mounted.

disks.png

I trawled the net and the VMware communities and tried a lot of different things, such as rescans and various esxcli commands. They show the datastore is there, but none of them has brought my datastore back.

I tried Add Storage from the Configuration tab.

It can see the disk, but the only option is to format it, not to keep the existing data.

11-02-2016 2-48-30 PM.png

11-02-2016 2-49-22 PM.png

Here are some more screenshots showing the volume and such.

11-02-2016 2-42-14 PM.png

11-02-2016 2-45-00 PM.png

11-02-2016 2-51-09 PM.png

I am really sorry for the long-winded post and images, but I am extremely frustrated and have had this dumped in my lap.
If anyone can help it would be greatly appreciated.

Cheers,

Paul

12 Replies
Nick_Andreev
Expert

If the vSphere Client offers to format the datastore, that means it cannot see a VMFS partition on the presented LUN. This seems to be a case of a corrupted LUN. I would suggest formatting the LUN and restoring from backup.

---
If you found my answers helpful please consider marking them as helpful or correct.
VCIX-DCV, VCIX-NV, VCAP-CMA | vExpert '16, '17, '18
Blog: http://niktips.wordpress.com | Twitter: @nick_andreev_au
MKguy
Virtuoso

The device is still kind of there, but currently inaccessible, as indicated by the IO error. A reformat will probably not even work in this state.

Please check the /var/log/vmkernel.log, vmkwarning.log and syslog.log files for errors.

Also provide some more info with the following commands:

# esxcli storage filesystem list

# esxcli storage vmfs snapshot list

# esxcli storage core adapter list

# esxcli storage core device detached list

# esxcli storage core device partition list

# esxcli storage core path list -d [ID]

# esxcli storage nmp device list -d [ID]

# esxcli storage nmp satp generic deviceconfig get -d [ID]

# cat /proc/scsi/*/*

# vmkfstools --queryfs -h /vmfs/devices/disks/[ID]

# partedUtil get /vmfs/devices/disks/[ID]

# partedUtil getptbl /vmfs/devices/disks/[ID]

# partedUtil getUsableSectors /vmfs/devices/disks/[ID]

# fdisk -l /vmfs/devices/disks/[ID]
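
To gather the per-device outputs from the list above in one go, something like the following could be used. This is only a sketch: the function names device_diag_commands and run_and_log are hypothetical, [ID] has to be replaced with the real device identifier, and the commands themselves are taken unchanged from the list above.

```shell
#!/bin/sh
# device_diag_commands: hypothetical helper that expands the per-device
# commands from the list above for one device ID, one command per line.
device_diag_commands() {
    id="$1"
    disk="/vmfs/devices/disks/$id"
    cat <<EOF
esxcli storage core path list -d $id
esxcli storage nmp device list -d $id
esxcli storage nmp satp generic deviceconfig get -d $id
vmkfstools --queryfs -h $disk
partedUtil get $disk
partedUtil getptbl $disk
partedUtil getUsableSectors $disk
EOF
}

# run_and_log: hypothetical helper. Runs each command read from stdin and
# appends its output, preceded by a header line, to the file named in $1.
run_and_log() {
    log="$1"
    while read -r cmd; do
        echo "### $cmd" >> "$log"
        # Unquoted on purpose so the line splits into command and arguments;
        # stdin is redirected so the command cannot swallow the command list.
        $cmd </dev/null >> "$log" 2>&1 || echo "(command failed)" >> "$log"
    done
}
```

Usage would be along the lines of: device_diag_commands [ID] | run_and_log /tmp/diag.txt, after which /tmp/diag.txt can be attached to a reply.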

You could try to detach and re-attach the storage device (# esxcli storage core device set -d [ID] --state off/on)

Have you rebooted the server already as well? If you don't have backups and before you actively fiddle around too much, you should try to make an image backup of the entire volume with a Linux live CD or something (dd is your friend).
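
A rough sketch of what such an image backup could look like from a Linux live environment. The function name image_device is hypothetical, /dev/sdX and the target path are placeholders, and the target must sit on a different disk with enough free space. conv=noerror,sync makes dd carry on after a read error and pad the affected block with zeros, so offsets inside the image stay aligned with the source.

```shell
#!/bin/sh
# image_device: hypothetical wrapper around dd for a raw copy of a volume.
#   $1 = source block device (for example /dev/sdX, a placeholder)
#   $2 = destination image file on a different disk
image_device() {
    src="$1"
    dst="$2"
    # Refuse to overwrite an existing image by accident.
    if [ -e "$dst" ]; then
        echo "refusing to overwrite $dst" >&2
        return 1
    fi
    # noerror: continue after read errors; sync: pad short reads to a full block.
    dd if="$src" of="$dst" bs=64k conv=noerror,sync
}
```

For example: image_device /dev/sdX /mnt/backup/datastore.img. Double-check which device is the source before running it, since swapping the two arguments on a raw dd call would overwrite the volume.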

-- http://alpacapowered.wordpress.com
continuum
Immortal

Hi,
download this ISO: http://sanbarrow.com/livecds/moa64-nogui/MOA64-nogui-incl-src-111014-efi.iso

and call me if you are interested. In a few minutes I can then tell you if there is a chance or if you are screwed.
Ulli


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

thibaudpeter
Enthusiast

Maybe the partition table is gone!
Errors during a RAID reconstruction do happen.

What's the result of # ls -l /vmfs/devices/disks ?

PSimpson2
Contributor

This doesn't look good:

2016-02-14T21:46:05.795Z cpu1:34051)ScsiDeviceIO: 2646: Cmd(0x439d809e3240) 0x28, CmdSN 0xd from world 34572 to dev "mpx.vmhba1:C0:T1:L0" failed H:0x0 D:0x2 P:0x0   Valid sense data: 0x4 0x44 0x0.

2016-02-14T21:46:05.801Z cpu0:34572 opID=f9b8cbdc)World: 15446: VC opID 63E9CDFF-00000138-88fb maps to vmkernel opID f9b8cbdc

2016-02-14T21:46:05.801Z cpu0:34572 opID=f9b8cbdc)FSS: 5327: No FS driver claimed device '550a3e0f-e5592108-fc82-00215e570140': No filesystem on the device

2016-02-14T21:46:05.801Z cpu0:34572 opID=f9b8cbdc)LVM: 13824: Failed to mount 550a3e0f-e5592108-fc82-00215e570140 (now umounting): Not supported
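
For reference, the D: field and the three sense bytes in the ScsiDeviceIO line can be decoded by hand. The sketch below does this for a single log line; decode_scsi_status is a hypothetical name, and its lookup table deliberately covers only the codes that appear in this thread (meanings as listed in the standard SCSI sense key and ASC/ASCQ tables). Anything else is reported as "unknown".

```shell
#!/bin/sh
# decode_scsi_status: hypothetical helper, not part of ESXi.
# Reads one vmkernel log line on stdin and prints a short decode of the
# device status (D:) and the sense key / ASC / ASCQ bytes it contains.
decode_scsi_status() {
    line=$(cat)
    dstat=$(printf '%s\n' "$line" | sed -n 's/.* D:\(0x[0-9a-fA-F]*\) .*/\1/p')
    sense=$(printf '%s\n' "$line" | sed -n 's/.*sense data: \(0x[0-9a-fA-F]*\) \(0x[0-9a-fA-F]*\) \(0x[0-9a-fA-F]*\).*/\1 \2 \3/p')
    case "$dstat" in
        0x0) dtext="GOOD" ;;
        0x2) dtext="CHECK CONDITION" ;;
        *)   dtext="unknown" ;;
    esac
    # Split the three sense bytes into $1 (key), $2 (ASC), $3 (ASCQ).
    set -- $sense
    case "$1" in
        0x4) key="HARDWARE ERROR" ;;
        0x5) key="ILLEGAL REQUEST" ;;
        *)   key="unknown" ;;
    esac
    case "$2/$3" in
        0x44/0x0) asc="INTERNAL TARGET FAILURE" ;;
        0x20/0x0) asc="INVALID COMMAND OPERATION CODE" ;;
        *)        asc="unknown" ;;
    esac
    printf 'device status %s (%s); sense key %s (%s); ASC/ASCQ %s/%s (%s)\n' \
        "$dstat" "$dtext" "$1" "$key" "$2" "$3" "$asc"
}
```

Fed the ScsiDeviceIO line above, this reports CHECK CONDITION with sense key HARDWARE ERROR and INTERNAL TARGET FAILURE, which suggests the read (Cmd 0x28 is READ(10)) is being rejected by the storage device itself rather than by the VMFS layer.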

hussainbte
Expert

Hi Paul,

As MKguy said, please share the partition layout of the identified device ID.

Since it recognizes that there is a VMFS label, it is possible that the partition table is the issue.

If you found my answers useful please consider marking them as Correct OR Helpful Regards, Hussain https://virtualcubes.wordpress.com/
PSimpson2
Contributor

vmkfstools --queryfs -h /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0

devfs-1.00 file system spanning 0 partitions.

File system label (if any):

Mode: private

Capacity 512 bytes, 512 bytes available, file block size 512 bytes, max supported file size 0 bytes

UUID: 00000000-00000000-0000-000000000000

Partitions spanned (on "notDCS"):

Is Native Snapshot Capable: NO

partedUtil getptbl /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0

gpt

145839 255 63 2342903808

1 2048 2342903774 AA31E02A400F11DB9590000C2911D1B8 vmfs 0

partedUtil get /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0

145839 255 63 2342903808

1 2048 2342903774 0 0

partedUtil getUsableSectors /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0

34 2342903774


fdisk -l /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0

***

*** The fdisk command is deprecated: fdisk does not handle GPT partitions.  Please use partedUtil

***

Found valid GPT with protective MBR; using GPT

Disk /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0: 2342903808 sectors, 2234M

Logical sector size: 512

Disk identifier (GUID): 66a6f2ac-9ca9-4710-8a8b-77d490df4896

Partition table holds up to 128 entries

First usable sector is 34, last usable sector is 2342903774

Number  Start (sector)    End (sector)  Size       Code  Name

   1            2048      2342903774       2234M   0700
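
The sector numbers in these outputs can be cross-checked with a little shell arithmetic. The sketch below assumes 512-byte logical sectors (as fdisk reports) and a shell with 64-bit arithmetic; the function names are hypothetical. It reads the output of partedUtil getptbl and, for each partition line (number, start sector, end sector, type GUID, type name, attribute), prints the size.

```shell
#!/bin/sh
# sectors_to_gib: convert a count of 512-byte sectors to whole GiB, rounded down.
sectors_to_gib() {
    echo $(( $1 * 512 / 1024 / 1024 / 1024 ))
}

# partition_summary: hypothetical helper. Reads `partedUtil getptbl` output
# on stdin and prints one summary line per partition entry.
partition_summary() {
    while read -r num start end guid name attr; do
        # Partition entries have six fields; the label line ("gpt"), the
        # geometry line (four fields) and blank lines do not.
        [ -n "$attr" ] || continue
        sectors=$(( $end - $start + 1 ))
        echo "partition $num type $name: $sectors sectors, $(sectors_to_gib "$sectors") GiB"
    done
}
```

For the table posted above this gives a single vmfs partition of 2342901727 sectors, roughly 1117 GiB (about 1.09 TiB). It starts at sector 2048 and ends at 2342903774, which is also the last usable sector, so the partition entry itself appears to be intact.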

hussainbte
Expert

Hi Paul,

As I understand it, this RAID 6 array holds (or held) only the VMFS datastore.

It is not a local drive that is also used for the ESXi install and therefore has multiple partitions.

If a local drive is used for the ESXi install, the installer usually creates a VMFS partition on the remaining space of the LUN.

Also, please provide the latest vmkernel log after running the commands below:


vmkfstools -V  (to refresh the VMFS mounts)


voma -m vmfs -d /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0:1 -s /tmp/analysis.txt


Check the linked KB for help


VMware KB: Using vSphere On-disk Metadata Analyzer (VOMA) to check VMFS metadata consistency


Regards,

Mudasser

HawkieMan
Enthusiast

I have a strange feeling that the local disk array was set to RAID 0 and not to any other recoverable RAID level. Remember RAID 0 = JBOD more or less, and a single disk failure will mean you lose data.

hussainbte
Expert

Was there a disk failure?

Kindly confirm the RAID level as well.

PSimpson2
Contributor

Hi hussainbte

It was RAID level 6, and there was a physical disk failure.

PSimpson2
Contributor

Hi hussainbte,

Here is the content of the analysis.txt file from the /tmp directory:

Checking if device is actively used by other hosts

ERROR: Failed to check for heartbeating hosts on device'/vmfs/devices/disks/mpx.vmhba1:C0:T1:L0:1'

And the last 10 lines from the vmkernel log:

2016-03-02T10:51:20.887Z cpu2:32793)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x9e (0x439d80932f80, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2016-03-02T10:51:20.888Z cpu3:521981)<3>ata2.00: bad CDB len=16, scsi_op=0x9e, max=12

2016-03-02T10:51:20.888Z cpu2:32793)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x9e (0x439d80932f80, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2016-03-02T10:51:20.889Z cpu3:521981)FSS: 5327: No FS driver claimed device 'mpx.vmhba32:C0:T0:L0': No filesystem on the device

2016-03-02T10:51:20.928Z cpu3:521981)VC: 3551: Device rescan time 32 msec (total number of devices 7)

2016-03-02T10:51:20.928Z cpu3:521981)VC: 3554: Filesystem probe time 58 msec (devices probed 6 of 7)

2016-03-02T10:51:20.928Z cpu3:521981)VC: 3556: Refresh open volume time 0 msec

2016-03-02T10:51:33.068Z cpu5:32796)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x28 (0x439d80953540, 521984) to dev "mpx.vmhba1:C0:T1:L0" on path "vmhba1:C0:T1:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0. Act:NONE

2016-03-02T10:51:33.068Z cpu5:32796)ScsiDeviceIO: 2646: Cmd(0x439d80953540) 0x28, CmdSN 0x2 from world 521984 to dev "mpx.vmhba1:C0:T1:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0.

2016-03-02T10:53:54.167Z cpu6:32797)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x1a (0x439d809f4600, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
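
This excerpt mixes two devices, mpx.vmhba32:C0:T0:L0 and the datastore device mpx.vmhba1:C0:T1:L0, so it helps to separate them. A sketch for counting the failed commands of one device per opcode and sense triple; failed_cmds_for_device is a hypothetical name, and it only looks at the nmp_ThrottleLogForDevice lines in the format shown above.

```shell
#!/bin/sh
# failed_cmds_for_device: hypothetical helper.
#   $1 = device name exactly as it appears in the log
# Reads vmkernel log lines on stdin, keeps the NMP failure lines for that
# device, and prints "<count> <cmd opcode> <sense key> <ASC> <ASCQ>" for
# each distinct combination.
failed_cmds_for_device() {
    grep -F "to dev \"$1\"" |
    grep -F 'nmp_ThrottleLogForDevice' |
    sed -n 's/.*Cmd \(0x[0-9a-fA-F]*\) .*sense data: \(0x[0-9a-fA-F]* 0x[0-9a-fA-F]* 0x[0-9a-fA-F]*\).*/\1 \2/p' |
    sort | uniq -c | sed 's/^ *//'
}
```

Usage would be something like: failed_cmds_for_device mpx.vmhba1:C0:T1:L0 < /var/log/vmkernel.log. On the lines posted above, the datastore device shows one failed 0x28 (READ(10)) with sense 0x4 0x44 0x0, while all of the 0x5 0x20 0x0 entries belong to mpx.vmhba32:C0:T0:L0.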
