Missing datastore

PSimpson2 · ‎02-10-2016

Hi All,

First post here but I am tearing my hair out trying to sort this out.

A server I look after suffered a failure on one of the drives in a RAID 6 array.

The first we knew of it (the server is at a remote site) was that the VMs would not start. They both show their .vmx files as inaccessible; they are stored on a datastore called Data_Store.

It's a local disk on the ESXi server.

I replaced the failed drive in the array and let it rebuild, but the VMs still failed to load.
They still showed the same inaccessible state as before.

I checked the storage adapters, and the disk with the datastore is there and mounted.

I trawled the net and the VMware communities and tried a lot of different things, like doing rescans. Various esxcli commands show the datastore is there, but none of them has gotten my datastore back.

I tried Add Storage from the Configuration tab.

It can see the disk, but the only option is to format it, not to keep the existing data.

Here are some more screen shots showing the volume and such.

I am really sorry for the long-winded post and images, but I am extremely frustrated and have had this dumped in my lap.
If anyone can help it would be greatly appreciated.

Cheers,

Paul

Nick_Andreev · ‎02-11-2016

If vSphere Client offers to format the datastore, that means it cannot see a VMFS partition on the presented LUN. This seems to be a case of a corrupted LUN. I would suggest formatting the LUN and restoring from backup.

---
If you found my answers helpful please consider marking them as helpful or correct.
VCIX-DCV, VCIX-NV, VCAP-CMA | vExpert '16, '17, '18
Blog: http://niktips.wordpress.com | Twitter: @nick_andreev_au

MKguy · ‎02-11-2016

The device is still kind of there, but currently inaccessible, as indicated by the IO error. A reformat will probably not even work in this state.

Please check the /var/log/vmkernel.log, vmkwarning.log and syslog.log files for errors.

Also provide some more info with the following commands:

# esxcli storage filesystem list

# esxcli storage vmfs snapshot list

# esxcli storage core adapter list

# esxcli storage core device detached list

# esxcli storage core device partition list

# esxcli storage core path list -d [ID]

# esxcli storage nmp device list -d [ID]

# esxcli storage nmp satp generic deviceconfig get -d [ID]

# cat /proc/scsi/*/*

# vmkfstools --queryfs -h /vmfs/devices/disks/[ID]

# partedUtil get /vmfs/devices/disks/[ID]

# partedUtil getptbl /vmfs/devices/disks/[ID]

# partedUtil getUsableSectors /vmfs/devices/disks/[ID]

# fdisk -l /vmfs/devices/disks/[ID]
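If it saves some typing, the commands above can be collected in one go. This is only a rough sketch: `build_commands` and `collect` are made-up helper names, `{dev}` stands in for [ID], and the `cat /proc/scsi/*/*` and `fdisk` lines are left out for brevity. It assumes a Python interpreter is available on the box where you run it.

```python
import subprocess

# Host-wide commands, copied from the list above.
HOST_COMMANDS = [
    "esxcli storage filesystem list",
    "esxcli storage vmfs snapshot list",
    "esxcli storage core adapter list",
    "esxcli storage core device detached list",
    "esxcli storage core device partition list",
]

# Per-device commands; "{dev}" is a placeholder for the device ID.
DEVICE_COMMANDS = [
    "esxcli storage core path list -d {dev}",
    "esxcli storage nmp device list -d {dev}",
    "esxcli storage nmp satp generic deviceconfig get -d {dev}",
    "vmkfstools --queryfs -h /vmfs/devices/disks/{dev}",
    "partedUtil get /vmfs/devices/disks/{dev}",
    "partedUtil getptbl /vmfs/devices/disks/{dev}",
    "partedUtil getUsableSectors /vmfs/devices/disks/{dev}",
]


def build_commands(device_id):
    """Return the full list of command lines for one device ID."""
    return HOST_COMMANDS + [c.format(dev=device_id) for c in DEVICE_COMMANDS]


def collect(device_id, out_path):
    """Run every command and write its output (stdout and stderr) to one file."""
    with open(out_path, "w") as out:
        for cmd in build_commands(device_id):
            out.write("### " + cmd + "\n")
            result = subprocess.run(
                cmd.split(),
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
            )
            out.write(result.stdout + "\n")
```

Something like `collect("mpx.vmhba1:C0:T1:L0", "/tmp/storage-info.txt")` would then give one file to attach to a post; the output path is just an example.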

You could try to detach and re-attach the storage device (# esxcli storage core device set -d [ID] --state off/on)

Have you rebooted the server already as well? If you don't have backups, then before you fiddle around too much you should try to make an image backup of the entire volume with a Linux live CD or something (dd is your friend).
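For anyone wondering what such an image copy does when the source throws read errors, here is a minimal Python sketch of the idea behind `dd conv=noerror,sync`: copy block by block and zero-fill whatever cannot be read. `image_device` is a hypothetical helper, not part of any tool, and a dedicated rescue tool will do a better job on a failing array.

```python
import os


def image_device(src_path, dst_path, block_size=1024 * 1024):
    """Copy src to dst block by block; unreadable blocks are zero-filled.

    Returns the byte offsets of the blocks that could not be read in full.
    """
    bad_offsets = []
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # total size of the source in bytes
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            try:
                src.seek(offset)
                data = src.read(want)
            except OSError:
                data = b""  # read failed, treat the whole block as bad
            if len(data) < want:
                bad_offsets.append(offset)
                data = data.ljust(want, b"\0")  # pad so offsets stay aligned
            dst.write(data)
            offset += want
    return bad_offsets
```

The returned list tells you which regions of the image are zero-filled rather than real data, which matters when someone later tries to recover VMFS metadata from it.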

-- http://alpacapowered.wordpress.com

continuum · ‎02-12-2016

Hi
download this iso http://sanbarrow.com/livecds/moa64-nogui/MOA64-nogui-incl-src-111014-efi.iso

and call me if you are interested. In a few minutes I can then tell you if there is a chance or if you are screwed.
Ulli

________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

thibaudpeter · ‎02-13-2016

Maybe the partition table is gone!
Errors during a RAID reconstruction do happen.

What's the result of # ls -l /vmfs/devices/disks ?

PSimpson2 · ‎02-14-2016

This doesn't look good:

2016-02-14T21:46:05.795Z cpu1:34051)ScsiDeviceIO: 2646: Cmd(0x439d809e3240) 0x28, CmdSN 0xd from world 34572 to dev "mpx.vmhba1:C0:T1:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0.

2016-02-14T21:46:05.801Z cpu0:34572 opID=f9b8cbdc)World: 15446: VC opID 63E9CDFF-00000138-88fb maps to vmkernel opID f9b8cbdc

2016-02-14T21:46:05.801Z cpu0:34572 opID=f9b8cbdc)FSS: 5327: No FS driver claimed device '550a3e0f-e5592108-fc82-00215e570140': No filesystem on the device

2016-02-14T21:46:05.801Z cpu0:34572 opID=f9b8cbdc)LVM: 13824: Failed to mount 550a3e0f-e5592108-fc82-00215e570140 (now umounting): Not supported
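For readers decoding these lines: the first entry is a failed 0x28 (READ(10)) command, and the three values after "Valid sense data:" are the SCSI sense key, ASC and ASCQ. Below is a small sketch that pulls those fields out of a vmkernel log line. The lookup tables only cover the codes that show up in this thread, and `decode_scsi_failure` is a made-up name.

```python
import re

# Only the codes seen in this thread; the SCSI standards define many more.
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION"}
SENSE_KEYS = {0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {
    (0x44, 0x00): "INTERNAL TARGET FAILURE",
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
}

PATTERN = re.compile(
    r'dev "(?P<dev>[^"]+)".*?'
    r"H:(?P<h>0x[0-9a-f]+) D:(?P<d>0x[0-9a-f]+) P:(?P<p>0x[0-9a-f]+) "
    r"Valid sense data: (?P<key>0x[0-9a-f]+) (?P<asc>0x[0-9a-f]+) (?P<ascq>0x[0-9a-f]+)",
    re.IGNORECASE,
)


def decode_scsi_failure(line):
    """Return a dict describing the failure, or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    status = int(m.group("d"), 16)
    key = int(m.group("key"), 16)
    asc = int(m.group("asc"), 16)
    ascq = int(m.group("ascq"), 16)
    return {
        "device": m.group("dev"),
        "host_status": int(m.group("h"), 16),
        "device_status": DEVICE_STATUS.get(status, hex(status)),
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "additional": ASC_ASCQ.get((asc, ascq), "unknown"),
    }


sample = (
    '2016-02-14T21:46:05.795Z cpu1:34051)ScsiDeviceIO: 2646: Cmd(0x439d809e3240) 0x28, '
    'CmdSN 0xd from world 34572 to dev "mpx.vmhba1:C0:T1:L0" failed '
    "H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0."
)
info = decode_scsi_failure(sample)
print(info["device_status"], "/", info["sense_key"], "/", info["additional"])
```

In other words, the host side is fine (H:0x0) and the device itself answers the read with a hardware error, which points at the RAID controller or array rather than at ESXi.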

hussainbte · ‎02-15-2016

Hi Paul,

As MKguy said, share the partition layout of the identified naa.ID.

Given that it recognizes a VMFS label, it is possible that the partition table is the issue.

If you found my answers useful please consider marking them as Correct OR Helpful Regards, Hussain https://virtualcubes.wordpress.com/

PSimpson2 · ‎02-16-2016

vmkfstools --queryfs -h /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0

devfs-1.00 file system spanning 0 partitions.

File system label (if any):

Mode: private

Capacity 512 bytes, 512 bytes available, file block size 512 bytes, max supported file size 0 bytes

UUID: 00000000-00000000-0000-000000000000

Partitions spanned (on "notDCS"):

Is Native Snapshot Capable: NO

partedUtil getptbl /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0

gpt

145839 255 63 2342903808

1 2048 2342903774 AA31E02A400F11DB9590000C2911D1B8 vmfs 0

partedUtil get /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0

145839 255 63 2342903808

1 2048 2342903774 0 0

partedUtil getUsableSectors /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0

34 2342903774

fdisk -l /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0

***

*** The fdisk command is deprecated: fdisk does not handle GPT partitions. Please use partedUtil

***

Found valid GPT with protective MBR; using GPT

Disk /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0: 2342903808 sectors, 2234M

Logical sector size: 512

Disk identifier (GUID): 66a6f2ac-9ca9-4710-8a8b-77d490df4896

Partition table holds up to 128 entries

First usable sector is 34, last usable sector is 2342903774

Number Start (sector) End (sector) Size Code Name

1 2048 2342903774 2234M 0700
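To sanity-check these numbers, here is a small sketch that parses the `partedUtil getptbl` output above and converts sectors to GiB. It assumes 512-byte sectors (as the fdisk output reports); `parse_getptbl` and `sectors_to_gib` are made-up helper names.

```python
SECTOR_SIZE = 512  # bytes, from "Logical sector size: 512"
VMFS_TYPE_GUID = "AA31E02A400F11DB9590000C2911D1B8"  # type GUID shown next to "vmfs"


def parse_getptbl(text):
    """Parse partedUtil getptbl output into (label, total_sectors, partitions)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    label = rows[0][0]
    total_sectors = int(rows[1][3])  # row 1 is: cylinders heads sectors-per-track total
    partitions = []
    for fields in rows[2:]:
        partitions.append({
            "number": int(fields[0]),
            "start": int(fields[1]),
            "end": int(fields[2]),
            "guid": fields[3],
            "type": fields[4],
        })
    return label, total_sectors, partitions


def sectors_to_gib(sectors):
    """Convert a sector count to GiB."""
    return sectors * SECTOR_SIZE / 2**30


output = """gpt
145839 255 63 2342903808
1 2048 2342903774 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
"""
label, total, parts = parse_getptbl(output)
print(label, round(sectors_to_gib(total), 2), "GiB")
for p in parts:
    print(p["number"], p["type"], p["guid"] == VMFS_TYPE_GUID, p["start"], p["end"])
```

On these values the disk works out to roughly 1117 GiB, and the single partition carries the VMFS type GUID and ends exactly on the last usable sector (2342903774). So the partition table itself looks intact; the "No filesystem on the device" message is about what is inside that partition. Note that the "2234M" figure in the fdisk output matches the sector count divided by 2^20, not a size in megabytes.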

hussainbte · ‎02-23-2016

Hi Paul,

I understand this RAID 6 only has/had the vmfs datastore.

This is not a local drive which is also used for ESXi Install having multiple partitions.

If a local drive is used for ESXi Install, the installation usually creates a vmfs partition on the remaining space of the LUN.

Also provide the latest vmkernel log after running the commands below:

vmkfstools -V (to refresh the VMFS mounts)

voma -m vmfs -d /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0:1 -s /tmp/analysis.txt

Check the linked KB for help

VMware KB: Using vSphere On-disk Metadata Analyzer (VOMA) to check VMFS metadata consistency

Regards,

Mudasser


HawkieMan · ‎02-28-2016

I have a strange feeling that the local disk array was set to RAID 0 and not to a redundant RAID level. Remember that RAID 0, like JBOD, has no redundancy, so a single disk failure means you lose data.

hussainbte · ‎03-01-2016

Was there a disk failure?

Kindly confirm the RAID level as well.


PSimpson2 · ‎03-01-2016

Hi hussainbte

It was RAID level 6, and there was a physical disk failure.

PSimpson2 · ‎03-01-2016

Hi hussainbte,

Here is the content of the analysis.txt file from the /tmp dir:

Checking if device is actively used by other hosts

ERROR: Failed to check for heartbeating hosts on device'/vmfs/devices/disks/mpx.vmhba1:C0:T1:L0:1'

And the last 10 lines from the vmkernel log:

2016-03-02T10:51:20.887Z cpu2:32793)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x9e (0x439d80932f80, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2016-03-02T10:51:20.888Z cpu3:521981)<3>ata2.00: bad CDB len=16, scsi_op=0x9e, max=12

2016-03-02T10:51:20.888Z cpu2:32793)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x9e (0x439d80932f80, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2016-03-02T10:51:20.889Z cpu3:521981)FSS: 5327: No FS driver claimed device 'mpx.vmhba32:C0:T0:L0': No filesystem on the device

2016-03-02T10:51:20.928Z cpu3:521981)VC: 3551: Device rescan time 32 msec (total number of devices 7)

2016-03-02T10:51:20.928Z cpu3:521981)VC: 3554: Filesystem probe time 58 msec (devices probed 6 of 7)

2016-03-02T10:51:20.928Z cpu3:521981)VC: 3556: Refresh open volume time 0 msec

2016-03-02T10:51:33.068Z cpu5:32796)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x28 (0x439d80953540, 521984) to dev "mpx.vmhba1:C0:T1:L0" on path "vmhba1:C0:T1:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0. Act:NONE

2016-03-02T10:51:33.068Z cpu5:32796)ScsiDeviceIO: 2646: Cmd(0x439d80953540) 0x28, CmdSN 0x2 from world 521984 to dev "mpx.vmhba1:C0:T1:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0.

2016-03-02T10:53:54.167Z cpu6:32797)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x1a (0x439d809f4600, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE