VMware Cloud Community
Mavack
Contributor

ESXi 6.0.0 Missing Datastore

I have a home lab server that I use for a mix of different things.

Tonight I was looking it over to see whether it could be upgraded to 6.7.0 and noticed that a datastore had vanished.

Logs from vmkernel.log:

2020-07-12T11:49:39.829Z cpu2:33215)<3>ata5.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x0

2020-07-12T11:49:39.829Z cpu2:33215)<3>ata5.00: irq_stat 0x40000008

2020-07-12T11:49:39.829Z cpu2:33215)<3>ata5.00: cmd 60/00:00:80:d8:00/04:00:00:00:00/40 tag 0 ncq 524288 in

         res 41/40:00:d0:da:00/0e:00:00:00:00/40 Emask 0x409 (media error) <F>

2020-07-12T11:49:39.829Z cpu2:33215)<3>ata5.00: status: { DRDY ERR }

2020-07-12T11:49:39.829Z cpu2:33215)<3>ata5.00: error: { UNC }

2020-07-12T11:49:39.830Z cpu2:33215)<6>ata5.00: configured for UDMA/133

2020-07-12T11:49:39.830Z cpu2:33215)<6>ata5: EH complete

2020-07-12T11:49:39.830Z cpu7:35982)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "t10.ATA_____WDC_WD30EFRX2D68EUZN0_________________________WD2DWCC4N7PP74SZ" state in doubt; requested fast path state update...

2020-07-12T11:49:39.830Z cpu7:35982)ScsiDeviceIO: 2652: Cmd(0x43b580615f40) 0x88, CmdSN 0x8000007b from world 35978 to dev "t10.ATA_____WDC_WD30EFRX2D68EUZN0_________________________WD2DWCC4N7PP74SZ" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0$

2020-07-12T11:49:39.830Z cpu7:32798)NMP: nmp_ThrottleLogForDevice:3248: last error status from device t10.ATA_____WDC_WD30EFRX2D68EUZN0_________________________WD2DWCC4N7PP74SZ repeated 1 times

2020-07-12T11:49:39.830Z cpu7:32798)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x28 (0x43b58060e7c0, 34607) to dev "t10.ATA_____WDC_WD30EFRX2D68EUZN0_________________________WD2DWCC4N7PP74SZ" on path "vmhba36:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense$

2020-07-12T11:49:39.830Z cpu7:32798)ScsiDeviceIO: 2652: Cmd(0x43b58060e7c0) 0x28, CmdSN 0x2db7 from world 34607 to dev "t10.ATA_____WDC_WD30EFRX2D68EUZN0_________________________WD2DWCC4N7PP74SZ" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x4.

2020-07-12T11:49:39.830Z cpu7:32798)NMP: nmp_ThrottleLogForDevice:3298: Cmd 0x2a (0x43b580671d80, 32782) to dev "t10.ATA_____WDC_WD30EFRX2D68EUZN0_________________________WD2DWCC4N7PP74SZ" on path "vmhba36:C0:T0:L0" Failed: H:0x3 D:0x0 P:0x0 Possible se$

2020-07-12T11:49:39.830Z cpu7:32798)ScsiDeviceIO: 2652: Cmd(0x43b580671d80) 0x2a, CmdSN 0x2db8 from world 32782 to dev "t10.ATA_____WDC_WD30EFRX2D68EUZN0_________________________WD2DWCC4N7PP74SZ" failed H:0x3 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
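A minimal sketch, not part of the original post, of how the H:/D:/P: status triple and the sense bytes could be pulled out of ScsiDeviceIO lines like the ones above. The regular expression and the function name parse_scsi_failure are illustrative assumptions; on lines cut off by the pager (ending in `$`) the sense bytes are simply reported as missing.

```python
import re

# Matches the failure summary ESXi logs for a SCSI command, e.g.
#   Cmd(0x43b58060e7c0) 0x28, ... to dev "..." failed H:0x0 D:0x2 P:0x0
#   Valid sense data: 0x3 0x11 0x4.
# The sense part is optional because truncated lines may not carry all three bytes.
FAILURE_RE = re.compile(
    r'Cmd\((?P<handle>0x[0-9a-f]+)\) (?P<opcode>0x[0-9a-f]+),.*?'
    r'to dev "(?P<device>[^"]+)" failed '
    r'H:(?P<host>0x[0-9a-f]+) D:(?P<dev>0x[0-9a-f]+) P:(?P<plugin>0x[0-9a-f]+)'
    r'(?: (?:Valid|Possible) sense data: '
    r'(?P<key>0x[0-9a-f]+) (?P<asc>0x[0-9a-f]+) (?P<ascq>0x[0-9a-f]+))?'
)


def parse_scsi_failure(line):
    """Return a dict of decoded fields, or None if the line is not a failure line."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    result = {"device": fields.pop("device")}
    for name, value in fields.items():
        # Hex strings become integers; missing sense bytes stay None.
        result[name] = int(value, 16) if value is not None else None
    return result
```

Feeding it the vmkernel.log lines above would give one record per failed command, which makes it easier to see that the same device shows both H:0x3 failures and D:0x2 failures with sense data.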

However, the volume is actually mounted and I can read the files off the disk. I even have a running VM that has a disk on this store, and it works fine.

[root@esxi:/vmfs/volumes] ls -al

lrwxr-xr-x    1 root     root            35 Jul 12 12:08 3TB -> 5972c69e-68858108-8ea1-001e67b692d3

drwxr-xr-t    1 root     root          3080 Jul 12 10:03 5972c69e-68858108-8ea1-001e67b692d3

[root@esxi:/vmfs/volumes/5972c69e-68858108-8ea1-001e67b692d3/win10] ls -al

total 824773640

drwxr-xr-x    1 root     root           560 Jul 12 10:03 .

drwxr-xr-t    1 root     root          3080 Jul 12 10:03 ..

-rw-------    1 root     root     1099511627776 Jul 12 11:55 win10_1-flat.vmdk

-rw-------    1 root     root           525 Jul 12 10:03 win10_1.vmdk

[root@esxi:/vmfs/volumes/5972c69e-68858108-8ea1-001e67b692d3/win10] date

Sun Jul 12 12:06:36 UTC 2020

I wrote some files back and forth to check that it was working; note the timestamps.

I found an article about missing datastores and how the partition table might be stuffed, but it looks fine to me:

https://kb.vmware.com/s/article/2046610

[root@esxi:/dev/disks] partedUtil getptbl t10.ATA_____WDC_WD30EFRX2D68EUZN0_________________________WD2DWCC4N7PP74SZ

gpt

364801 255 63 5860533168

1 2048 5860532223 AA31E02A400F11DB9590000C2911D1B8 vmfs 0

/vmfs/devices/disks/t10.ATA_____WDC_WD30EFRX2D68EUZN0_________________________WD2DWCC4N7PP74SZ

gpt

364801 255 63 5860533168

1 2048 5860532223 AA31E02A400F11DB9590000C2911D1B8 vmfs 0

Checking offset found at 2048:

0200000 d00d c001                             

0200004

1400000 f15e 2fab                             

1400004

0140001d  33 54 42 00 00 00 00 00  00 00 00 00 00 00 00 00  |3TB.............|

0140002d  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
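As a rough cross-check of the partition table shown above, here is a small sketch (not from the original post) that parses `partedUtil getptbl` output for a GPT disk and computes the partition size. The function name parse_getptbl is hypothetical, and the 512-byte sector size is an assumption, not something the output states.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def parse_getptbl(text):
    """Parse GPT-style `partedUtil getptbl` output into a dict."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    label = lines[0][0]
    # Second line: cylinders, heads, sectors per track, total sectors.
    cylinders, heads, sectors_per_track, total_sectors = map(int, lines[1])
    partitions = []
    for fields in lines[2:]:
        # Partition line: number, start sector, end sector, type GUID,
        # type name, attributes.
        start, end = int(fields[1]), int(fields[2])
        partitions.append({
            "number": int(fields[0]),
            "start": start,
            "end": end,
            "type_guid": fields[3],
            "type_name": fields[4],
            "attributes": int(fields[5]),
            "size_bytes": (end - start + 1) * SECTOR_BYTES,
        })
    return {
        "label": label,
        "geometry": (cylinders, heads, sectors_per_track),
        "total_sectors": total_sectors,
        "partitions": partitions,
    }
```

Under that sector-size assumption, the table above describes a single vmfs partition of 3,000,591,450,112 bytes (about 2.73 TiB) that starts at sector 2048 and ends before the last sector of the disk, which agrees with the partition table looking intact.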

It's nothing critical, just a disk I use for ISO storage, some on-host backups, and as a scratch drive for things I don't want constantly reading and writing to the SSD datastore, which works fine.

They are just SATA drives on the mainboard SATA controller, and the other disks work fine; only this one doesn't.

5 Replies
Mavack
Contributor

No guts, no glory, and it is just a lab: I upgraded it from 6.0.0 to 6.7.3 U3 via the CLI and still have the same problem.

I think I might manually copy everything off, format the drive, reinitialize it, and re-attach it to my VMs.

continuum
Immortal

This does not look like early warnings of a corrupted VMFS to me.

Partition looks fine as well.

Anyway - IMHO this SATA-disk has lost a good part of its credit.
I would replace it soon.



Jangari
Enthusiast

I suppose this is due to either a driver compatibility problem or a physical disk issue.

SCSI host code 0x3 means a timeout, as opposed to NO_CONNECT or BUS_BUSY. I think ESXi could not access the SATA storage device properly.

Interpreting SCSI sense codes in VMware ESXi and ESX (289902)

https://kb.vmware.com/s/article/289902

------------------------------------------------------------------

SG_ERR_DID_TIME_OUT

[0x03]    TIMED OUT for other reason (often this an unexpected device selection timeout)

------------------------------------------------------------------
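To make the code lookup concrete, here is a minimal sketch (an illustration, not taken from the KB article) that turns the H: and D: values and the first sense byte into names. Only a handful of codes are included: the host codes named in this reply and a few standard SCSI device status and sense key values. The tables and the function name describe are assumptions for this example, and anything else is reported as unknown.

```python
# Host (H:) status codes mentioned in this thread.
HOST_STATUS = {0x0: "OK", 0x1: "NO_CONNECT", 0x2: "BUS_BUSY", 0x3: "TIMEOUT"}

# A few standard SCSI device (D:) status codes.
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY"}

# A few standard SCSI sense keys (first byte of the sense data triple).
SENSE_KEY = {0x0: "NO SENSE", 0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR"}


def _name(table, code):
    """Look up a code, falling back to a hex placeholder for unlisted values."""
    return table.get(code, "unknown (0x%x)" % code)


def describe(host, device, sense_key=None):
    """Summarise one failure, e.g. describe(0x3, 0x0) or describe(0x0, 0x2, 0x3)."""
    parts = ["host=" + _name(HOST_STATUS, host),
             "device=" + _name(DEVICE_STATUS, device)]
    if sense_key is not None:
        parts.append("sense key=" + _name(SENSE_KEY, sense_key))
    return ", ".join(parts)
```

The log in the first post contains both H:0x3 D:0x0 entries and an H:0x0 D:0x2 entry with sense data 0x3 0x11 0x4, so alongside the timeouts the device itself returned a check condition with a medium error sense key, which fits the "media error" and UNC lines from the ATA layer.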

Are you using the "vmw_ahci" driver for that SATA device? To isolate the cause, it may help to switch from the "vmw_ahci" native driver to the "sata-ahci" vmklinux driver.

Enabling and Disabling Native Drivers in ESXi 6.5 (2147565)

https://kb.vmware.com/s/article/2147565

A word of caution: if another SATA device is used as the ESXi boot disk, changing the SATA driver may cause disruptive problems and require reinstalling ESXi, so you should back up the host configuration first.

How to back up ESXi host configuration (2042141)

https://kb.vmware.com/s/article/2042141

Mavack
Contributor

Looks like I'm using the vmw_ahci driver:

[root@esxi:~] esxcli system module list | grep ahci

vmw_ahci                            true        true

[root@esxi:~] esxcli system module get -m vmw_ahci

   Module: vmw_ahci

   Module File: /usr/lib/vmware/vmkmod/vmw_ahci

   License: BSD

   Version: 1.2.8-1vmw.670.3.73.14320388

   Build Type: release

   Provided Namespaces:

   Required Namespaces: com.vmware.vmkapi@v2_5_0_0

   Containing VIB: vmw-ahci

   VIB Acceptance Level: certified

So will just disabling it push it back to the native driver, or do I need to load the native driver?

I just used the out-of-the-box drivers for this install. I'm pretty sure I started out at ESXi 5 or 5.5, upgraded to 6, and upgraded to 6.7.3 yesterday.

The SATA controller is just the mainboard controller, on an Intel S1200RPL. I have 6 SATA drives: 1 SSD, which is the datastore; 1 3TB drive, which is a scratch drive that stores ISOs, some backups, and one disk for a VM that does a lot of writes that don't need to be fast; and 4 drives that are directly mapped using RDM. The SSD and the 4 RDM drives all work fine with the default driver and are running. Even this 3TB disk works fine: the host can read and write to the drive with no problems.

The boot drive is an 8GB USB stick.

Last night I copied everything off the drive with SCP onto another server and tried to delete the partition with partedUtil. That didn't work; something still has hooks in it even with the VMs shut down. Today I'm going to boot a Linux USB stick, wipe the partition, format the drive, run a check on it, and then reinitialize it.

Mavack
Contributor

I tried disabling the vmw_ahci driver to go back to native; it made no difference, so I reverted.

The host booted fine without the disk, which is good, so everything is running except for the stuff that was on that disk. That covers most of my needs.

I loaded a Hiren's Boot CD and scanned the disk. The WD tool reports bad sectors; a random scan works for a while but stalls for 20-30 seconds on bad sectors. I think it went bad in the wrong place, given that most of it works. I ordered a new drive anyway.

With the disk removed, the storage/devices view in the browser crashes the browser, so I can't even try to see the drive at the moment. When I get the new drive I'll see if it works again; otherwise I might back up the config and reinstall ESXi, as it looks like it wants something from that drive to work properly. That drive was the very original datastore drive before I replaced it with an SSD, so there might be some config it wants on there even though it doesn't need it.
