Hi
I'm experimenting with an ESXi 6.5 installation on an Intel NUC6I5SYH for a lab environment. I'm aware that this hardware has no official support, but please hear me out 🙂
As stated, I'm installing ESXi on an Intel NUC which has an Intel 600p NVMe SSD installed. For the most part everything works fine, but twice in the last month the partitions on the SSD have disappeared from ESXi. A simple reboot of the device brings everything back to normal, but while the contents of the SSD are inaccessible, the VMs are (of course) not responding.
I can, however, log on to the ESXi 6.5 web interface, and from there I can see that the SSD is still recognized (the make and model are shown), but its capacity is "0 bytes". If I log on to the ESXi host via SSH and run "df -h", I see two partitions: one of around 4 GB and one of 0 bytes. This makes me think that the SSD is not completely dead.
Even though VMware does not support this setup, I wonder what my next troubleshooting step should be. Does the ESXi installation have a CLI command to read SMART data or to "rescan" the SSD for partitions? Anything that would help me decide whether to RMA the SSD or the NUC, or just give up on ESXi in this setup.
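On the rescan question, here is a minimal sketch, assuming the standard ESXi 6.5 shell over SSH, that rescans the storage adapters, rescans for VMFS volumes, lists mounted filesystems, dumps the partition table and reads SMART data for a single device. The function name rescan_and_check is hypothetical, and whether a rescan can bring a dropped NVMe device back without a reboot is not confirmed anywhere in this thread.

```sh
#!/bin/sh
# Hypothetical helper: rescan storage and inspect one device.
# $1 is the device identifier (e.g. the t10.NVMe____... name reported by
# "esxcli storage core device list").
rescan_and_check() {
  dev="$1"
  # Rescan all storage adapters for new or changed devices and paths.
  esxcli storage core adapter rescan --all
  # Scan the devices for mountable VMFS filesystems.
  esxcli storage filesystem rescan
  # Show which volumes are mounted and their sizes.
  esxcli storage filesystem list
  # Print the device's partition table.
  partedUtil getptbl "/vmfs/devices/disks/$dev"
  # Print SMART data (health status, temperature, error counts, ...).
  esxcli storage core device smart get -d "$dev"
}
```

After pasting the function into the shell, it would be called as, for example, rescan_and_check t10.NVMe____INTEL_SSDPEKKW512G7_____________________BTPY631307NR512F____00000001.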
I don't have any logs of the incident, since ESXi has nowhere to write them while the problem is occurring.
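Because the logs vanish along with the local datastore, one option is to forward them to another machine. A minimal sketch, assuming a syslog collector is reachable from the ESXi host; the address 192.0.2.10 (a documentation-only address) and the function name configure_remote_syslog are placeholders.

```sh
#!/bin/sh
# Placeholder collector address; replace with your own syslog server.
LOGHOST="udp://192.0.2.10:514"

# Hypothetical helper: forward ESXi logs to the remote collector given in $1.
configure_remote_syslog() {
  # Point the ESXi syslog service at the remote host and reload its config.
  esxcli system syslog config set --loghost="$1"
  esxcli system syslog reload
  # Outgoing syslog traffic must also be allowed through the ESXi firewall.
  esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true
  esxcli network firewall refresh
}
```

Usage: configure_remote_syslog "$LOGHOST". With logs stored off the host, the messages leading up to the next drop should survive it.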
Thanks!
Hi,
I have exactly the same problem with my ASRock Beebox with an Intel 600p NVMe SSD. Every time, I find the SSD reporting "0 bytes" and I am forced to reboot ESXi.
I have tried upgrading the Intel 600p SSD firmware, but it did not help.
Has anyone else had the same problem and found a solution?
Hi,
You can list all your storage devices by connecting to the ESXi host over SSH and running:
# esxcli storage core device list
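To compare devices at a glance, the long listing can be reduced to one line per device. A minimal sketch, assuming the usual esxcli layout (device identifier unindented, properties indented beneath it, Size in MB); the function name summarize_devices is hypothetical. A device that has dropped out should show a Status other than "on", for example "not connected".

```sh
#!/bin/sh
# Hypothetical helper: condense "esxcli storage core device list" output
# (read on stdin) to one line per device: <id> size=<MB> status=<Status>.
# Assumes device identifiers start at column 0 and properties are indented.
summarize_devices() {
  awk '
    # An unindented, non-empty line starts a new device block.
    /^[^[:space:]]/ {
      if (id != "") print id, "size=" size, "status=" status
      id = $0; size = ""; status = ""
      next
    }
    /^[[:space:]]+Size:/ { size = $2 }
    /^[[:space:]]+Status:/ { sub(/^[[:space:]]+Status:[[:space:]]*/, ""); status = $0 }
    END { if (id != "") print id, "size=" size, "status=" status }
  '
}
```

Usage: esxcli storage core device list | summarize_devices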
Thank you,
Olivier
Hi,
I wonder whether it is a bug in the "nvme" driver.
Anyway, my "esxcli storage core device list" output is below:
t10.NVMe____INTEL_SSDPEKKW512G7_____________________BTPY631307NR512F____00000001
Display Name: Local NVMe Disk (t10.NVMe____INTEL_SSDPEKKW512G7_____________________BTPY631307NR512F____00000001)
Has Settable Display Name: true
Size: 488386
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path: /vmfs/devices/disks/t10.NVMe____INTEL_SSDPEKKW512G7_____________________BTPY631307NR512F____00000001
Vendor: NVMe
Model: INTEL SSDPEKKW51
Revision: PSF
SCSI Level: 6
Is Pseudo: false
Status: on
Is RDM Capable: false
Is Local: true
Is Removable: false
Is SSD: true
Is VVOL PE: false
Is Offline: false
Is Perennially Reserved: false
Queue Full Sample Size: 0
Queue Full Threshold: 0
Thin Provisioning Status: yes
Attached Filters:
VAAI Status: unknown
Other UIDs: vml.0100000000425450593633313330374e523531324620202020494e54454c20
Is Shared Clusterwide: false
Is Local SAS Device: false
Is SAS: false
Is USB: false
Is Boot USB Device: false
Is Boot Device: true
Device Max Queue Depth: 256
No of outstanding IOs with competing worlds: 32
Drive Type: unknown
RAID Level: unknown
Number of Physical Drives: unknown
Protection Enabled: false
PI Activated: false
PI Type: 0
PI Protection Mask: NO PROTECTION
Supported Guard Types: NO GUARD SUPPORT
DIX Enabled: false
DIX Guard Type: NO GUARD SUPPORT
Emulated DIX/DIF Enabled: false
I have the same problem too.
When I copy a large amount of data from the NVMe drive to an HDD, the partition just drops, and a reboot fixes it.
I wonder if the NVMe drive is overheating and causing the partition drop?
Hi,
I am having exactly the same issue with the Intel 600P and ESXi 6.5 U1 running on a SuperMicro SYS-5028D-TN4T. It works fine until I try to provision a VM, at which point I get an error message saying that the connection to the datastore has been lost. I have updated to the latest Intel 600P firmware. The output of esxcli storage core device list is as follows:
[root@pESXi-01:~] esxcli storage core device list
t10.NVMe____INTEL_SSDPEKKW010T7_____________________BTPY65320GA71P0H____00000001
Display Name: Local NVMe Disk (t10.NVMe____INTEL_SSDPEKKW010T7_____________________BTPY65320GA71P0H____00000001)
Has Settable Display Name: true
Size: 976762
Device Type: Direct-Access
Multipath Plugin: NMP
Devfs Path:
Vendor: NVMe
Model: INTEL SSDPEKKW01
Revision: PSF
SCSI Level: 6
Is Pseudo: false
Status: not connected
Is RDM Capable: false
Is Local: true
Is Removable: false
Is SSD: true
Is VVOL PE: false
Is Offline: false
Is Perennially Reserved: false
Queue Full Sample Size: 0
Queue Full Threshold: 0
Thin Provisioning Status: yes
Attached Filters:
VAAI Status: unsupported
Other UIDs: vml.01000000004254505936353332304741373150304820202020494e54454c20
Is Shared Clusterwide: false
Is Local SAS Device: false
Is SAS: false
Is USB: false
Is Boot USB Device: false
Is Boot Device: false
Device Max Queue Depth: 256
No of outstanding IOs with competing worlds: 32
Drive Type: unknown
RAID Level: unknown
Number of Physical Drives: unknown
Protection Enabled: false
PI Activated: false
PI Type: 0
PI Protection Mask: NO PROTECTION
Supported Guard Types: NO GUARD SUPPORT
DIX Enabled: false
DIX Guard Type: NO GUARD SUPPORT
Emulated DIX/DIF Enabled: false
I would be extremely grateful if someone has found a fix and can share it.
My problem was fixed after I attached a small heatsink to the controller.
I suggest checking your drive temperature by running the following command:
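# For each device, take the ID in parentheses on its "Display Name:" line
# (anchored so "Has Settable Display Name:" is skipped) and print its SMART data.
esxcli storage core device list | grep '^ *Display Name:' | cut -d'(' -f2 | cut -d')' -f1 | while read -r DISK
do
  echo "********** $DISK **********"
  esxcli storage core device smart get -d "$DISK"
done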
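A single SMART snapshot may miss the temperature at the moment the drive drops, so it can help to log readings at intervals. A minimal sketch, assuming the smart get output contains a "Drive Temperature" row whose value is the first column after the name; the function names, the interval and the warning threshold are hypothetical (50 is used below only because it is the figure mentioned in this thread).

```sh
#!/bin/sh
# Sketch: periodically log an NVMe drive's temperature from SMART data.
# Assumes "esxcli storage core device smart get" prints a row such as
#   Drive Temperature   45   N/A   N/A
# where the third whitespace-separated field is the value in degrees C.

# Read smart-get output on stdin and print the Drive Temperature value.
drive_temp() {
  awk '/^[[:space:]]*Drive Temperature[[:space:]]/ { print $3; exit }'
}

# Hypothetical helper: print one timestamped reading for device $1,
# flagging it when it is at or above $2 degrees C.
check_temp_once() {
  dev="$1"; warn="$2"
  t=$(esxcli storage core device smart get -d "$dev" | drive_temp)
  stamp=$(date '+%Y-%m-%d %H:%M:%S')
  case "$t" in
    ''|*[!0-9]*) echo "$stamp $dev temperature unavailable" ;;
    *) if [ "$t" -ge "$warn" ]; then
         echo "$stamp $dev ${t}C WARNING"
       else
         echo "$stamp $dev ${t}C"
       fi ;;
  esac
}

# Hypothetical helper: log_temps DEVICE_ID INTERVAL_SECONDS WARN_AT_C
# (runs until interrupted).
log_temps() {
  while true; do
    check_temp_once "$1" "$3"
    sleep "$2"
  done
}
```

For example, log_temps <device-id> 60 50 > /vmfs/volumes/<other-datastore>/temps.log would record a reading every minute; writing to a datastore other than the NVMe one keeps the log if the drive drops.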
Thanks ivanyeung510,
It is definitely a heat-related issue. I had the fan set to Optimal speed and have had to switch it to Full speed to keep the drive working, which unfortunately is significantly noisier. It looks like I will need to add a heatsink myself so I can go back to the quieter fan setting.
I ran the command that you provided, and the temperature when the issues started was only 45 degrees, which surprised me; I expected the threshold to be higher before the issues appeared.
I have seen it reach 70 degrees.
After installing a small heatsink, the maximum temperature stays below 50 degrees.
Hi,
I have exactly the same problem with my ASRock Beebox with an Intel 600p NVMe SSD. Every time, I find the SSD reporting "0 bytes" and I am forced to reboot ESXi.
I have tried upgrading the Intel 600p SSD firmware, but it did not help.
Have you found any solution to this issue?