mikev80
Contributor

ESXi 6.7U2 not seeing all the NVMe drives, only 2 of 12 - I don't think it's a driver issue

I have 10 x Intel P4500 and 2 x Intel P4600 in my 4-host vSAN cluster.  I only see 2 of them in the GUI with ESXi 6.7U2.

Here's where it gets interesting:

  • esxcli storage nmp device list = sees 1 x P4500 and 1 x P4600

What's weird is that each is assigned a unique ID following the new disk naming convention that starts with eui.xxx, but one of them has 9 "Other IDs" following the older naming convention t10.NVMe_____Intel P4500___<unique serial number>

  • ls /dev/disks = sees 1 x P4500 and 1 x P4600

Same as above: one unique eui.xxx file each, plus symlinks that all point back to the same 2 eui.xxx files.

  • esxcfg-scsidevs -l = sees 2 of them

Same as above.

  • esxcli nvme device list = sees all 12!

  • esxcfg-scsidevs -a = sees all 12!

storage adapter path in the GUI also sees 12 unique paths.
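The disagreement above (12 adapters, but only 2 logical devices) can be made concrete by resolving the /dev/disks symlinks: many per-serial names collapsing to a couple of eui.xxx targets is the signature of this problem. A minimal sketch, using a mock directory since the real /dev/disks only exists on the host (device names here are illustrative):

```shell
#!/bin/sh
# Mock /dev/disks: one real eui.xxx device file plus several per-serial
# t10.NVMe symlinks that all resolve to it (names are illustrative).
d=$(mktemp -d)
touch "$d/eui.0100000001000000e4d25c000073e014"
ln -s "$d/eui.0100000001000000e4d25c000073e014" "$d/t10.NVMe____INTEL_serialA"
ln -s "$d/eui.0100000001000000e4d25c000073e014" "$d/t10.NVMe____INTEL_serialB"

# On a healthy host the number of distinct symlink targets should match the
# number of drives; on the affected hosts it comes out as 2 despite 12 links.
n=$(find "$d" -type l -exec readlink {} \; | sort -u | wc -l | tr -d ' ')
echo "distinct targets: $n"
rm -r "$d"
```

On the host itself the equivalent one-liner would be `find /dev/disks -type l -exec readlink {} \; | sort -u | wc -l`.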

I've opened a ticket already with support but not making much progress.

Is it normal for 6.7 to lump all drives of the same make/model under one unique ID?  My SATA drives show up as unique devices, but the NVMe drives don't.

TheBobkin
VMware Employee

Hello mikev80

Welcome to (posting on :smileygrin:) Communities.

What driver and firmware are you using on these and what are the part numbers for both devices (or vSAN VCG entry URL if you have it)?

Is there any difference in firmware between the devices that can be seen and the ones that cannot?

Them not being seen in /dev/disks means they are not being picked up by ESXi as usable storage devices, not just by vSAN.

Any chance some of them have not been detected/marked as Local storage? (Then again, that wouldn't explain them not being in /dev/disks, so probably not.)

100% positive there's nothing silly like a RAID configuration applied to the devices?

Can you share the output of:

# esxcli storage core device list

# vdq -q

Bob

mikev80
Contributor

Thanks!

When I run esxcfg-scsidevs -a to find the driver, I see it listed as just "nvme" for the 12 NVMe drives.

Any suggestion on how to check firmware of the drives that cannot be seen?

The system does somehow recognize the drives because I see them assigned to a vmhba. 

I also see them as individual symlinks in /dev/disks but not as files.  Each symlink has the naming convention with each drive's serial number in the name and maps back to one of the 2 eui.xxx files.

Naming convention changed with 6.7U2: NVMe Devices with NGUID Device Identifiers

It feels like ESXi thinks there is 1 x P4500 drive with 2 paths and 1 x P4600 drive with 9 paths instead of 12 individual drives.

esxcli storage core device list shows 2 drives, and so does vdq -q.

TheBobkin
VMware Employee

Hello Mike,

"When I run esxcfg-scsidev -a to find the driver, i see it listed as just "nvme" for the 12 NVMe drives."

There are numerous methods of getting the driver version but if you already know the driver family name ('nvme' if that is what esxcfg-scsidevs shows) then use:

# vmkload_mod -s nvme | grep Version

"Any suggestion on how to check firmware of the drives that cannot be seen?"

As I was alluding to above, esxcli storage core device list should show the last 4 characters of the firmware under 'Revision' - if it is not showing there, you are going to have to use out-of-band management/BIOS to determine it. Some 3rd-party plug-ins can also report firmware, but this is below ESXi, so ESXi doesn't have this capability unless something else is informing it.
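For example, pulling just the Revision field out of the device list is a one-line pipeline. The sample line below mimics the format esxcli prints (it's the P4500 line from later in this thread), since esxcli itself only runs on the host:

```shell
#!/bin/sh
# Sample line in the shape "esxcli storage core device list" prints;
# on the host, pipe the live command output instead of echoing this.
line='   Vendor: NVMe  Model: INTEL SSDPE2KX04  Revision: QDV1'
rev=$(echo "$line" | sed -n 's/.*Revision: \([^ ]*\).*/\1/p')
echo "firmware revision: $rev"
```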

Bob

mikev80
Contributor

OK, I'm able to verify the driver version against the HCL for the 2 devices that the system can see. I didn't suspect the driver was the problem, but confirmed it anyway.

Let's take the Intel P4500 capacity drives as an example.  I think my system thinks all 10 drives are the same drive with 10 different adapter paths to it.  Can anyone verify whether the results below are expected?

Results of esxcfg-scsidevs -l.  Note the "Other Names" section.

eui.0100000001000000e4d25c000073e014

Device Type: Direct-Access

Size: 3815447

Display Name: Local NVMe Disk (eui.0100000001000000e4d25c000073e014)

Multipath Plugin: NMP

Console Device: /vmfs/devices/disks/eui.0100000001000000e4d25c000073e014

Devfs Path: /vmfs/devices/disks/eui.0100000001000000e4d25c000073e014

Vendor: NVMe  Model: INTEL SSDPE2KX04  Revision: QDV1

Is RDM Capable: false

Is Local: true Is SSD: true

Other Names:

vml.010000000.....

vml.010000000.....

vml.010000000.....

vml.010000000.....

vml.010000000.....

vml.010000000.....

vml.010000000.....

vml.010000000.....

vml.010000000.....

vml.010000000.....

vml.010000000.....

t10.NVMe______INTEL SSDPE2KX04____<serial number>__00000001

t10.NVMe______INTEL SSDPE2KX04____<serial number>__00000001

t10.NVMe______INTEL SSDPE2KX04____<serial number>__00000001

t10.NVMe______INTEL SSDPE2KX04____<serial number>__00000001

t10.NVMe______INTEL SSDPE2KX04____<serial number>__00000001

t10.NVMe______INTEL SSDPE2KX04____<serial number>__00000001

t10.NVMe______INTEL SSDPE2KX04____<serial number>__00000001

t10.NVMe______INTEL SSDPE2KX04____<serial number>__00000001

t10.NVMe______INTEL SSDPE2KX04____<serial number>__00000001

Results of esxcfg-scsidevs -u. It's a table that makes me think the system is assigning multiple paths to 1 NVMe drive.

Primary UID                                                       Other UID

eui.0100000001000000e4d25c000073e014     vml.010000000.....

eui.0100000001000000e4d25c000073e014     vml.010000000.....

eui.0100000001000000e4d25c000073e014     vml.010000000.....

eui.0100000001000000e4d25c000073e014     vml.010000000.....

...
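One way to quantify the collapse is to count rows per primary UID in that table. A sketch against sample rows shaped like the output above (the vml suffixes are placeholders, since the real ones were truncated in the paste):

```shell
#!/bin/sh
# Sample rows shaped like the esxcfg-scsidevs -u table (vml suffixes are
# placeholders); on the host, replace the sample with the live output.
sample='eui.0100000001000000e4d25c000073e014     vml.0100000001
eui.0100000001000000e4d25c000073e014     vml.0100000002
eui.0100000001000000e4d25c000073e014     vml.0100000003'

# Rows per primary UID -- anything well above 1 for a local NVMe disk
# supports the "many drives collapsed into one ID" theory.
n=$(echo "$sample" | awk '{print $1}' | sort | uniq -c | awk '{print $1}')
echo "rows for this eui: $n"
```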

depping
Leadership

Can you post the details about the drivers? Intel usually recommended the async Intel drivers for the P4500 and P4600 instead of the inbox driver.

russsa
Contributor

I have this exact same problem. Two of the same 1TB NVMe drives in identical PCIe adapters. Only one shows up.

ls /dev/disks shows all my disks but only one NVMe (the bottom one):

[root@stripe:/dev/disks] ls /dev/disks

mpx.vmhba32:C0:T0:L0                                                  vml.0000000000766d68626133323a303a30

mpx.vmhba32:C0:T0:L0:1                                                vml.0000000000766d68626133323a303a30:1

mpx.vmhba32:C0:T0:L0:5                                                vml.0000000000766d68626133323a303a30:5

mpx.vmhba32:C0:T0:L0:6                                                vml.0000000000766d68626133323a303a30:6

mpx.vmhba32:C0:T0:L0:7                                                vml.0000000000766d68626133323a303a30:7

mpx.vmhba32:C0:T0:L0:8                                                vml.0000000000766d68626133323a303a30:8

mpx.vmhba32:C0:T0:L0:9                                                vml.0000000000766d68626133323a303a30:9

naa.5000c5006340ce57                                                  vml.0100000000323032305f333631325f324241375f3739363400504349652053

naa.5000c500635a18c3                                                  vml.02000000005000c5006340ce57535432303030

naa.5000c5008449c0cb                                                  vml.02000000005000c500635a18c3535432303030

naa.5000c5008465415f                                                  vml.02000000005000c5008449c0cb535432303030

naa.5000c5008465a38f                                                  vml.02000000005000c5008465415f535432303030

naa.5000cca028450060                                                  vml.02000000005000c5008465a38f535432303030

naa.5000cca02845b02c                                                  vml.02000000005000cca028450060485553373234

naa.5000cca06d55e520                                                  vml.02000000005000cca02845b02c485553373234

t10.NVMe____PCIe_SSD________________________________202036122BA77964  vml.02000000005000cca06d55e520485553373234

"esxcli storage core device list" only shows one of the nvme drives.

"esxcli nvme device list" shows:

[root@stripe:/dev/disks] esxcli nvme device list

HBA Name  Status  Signature

--------  ------  ---------------------

vmhba1    Online  nvmeMgmt-nvme00050000

vmhba3    Online  nvmeMgmt-nvme00670000

I'm not sure what to do. Any help is appreciated.

KlausKupferschm
Contributor

I can't believe nobody solved it. I have the same issue. Please let me know what you did to solve this.

3 years ago I installed an NVMe PCIe card without any issues (HGST Ultrastar SN150 HUSPR3216AHP301 SSD 1.6 TB 0T00831).

Yesterday, I installed another NVMe PCIe card (HGST Ultrastar SN260 HUSMR7664BHP301 SSD 6.4 TB 0TS1304) into the same ESXi 6.7U3 host.

Unfortunately the new disk shows "Offline" with the following command: esxcli nvme device list

HBA Name  Status   Signature

--------  -------  ----------------------

vmhba5    Online   nvmeMgmt-nvme00030000

vmhba3    Offline  nvmeMgmt-nvme001300000

esxcfg-scsidevs -u

Primary UID                                                     Other UID

mpx.vmhba32:C0:T0:L0                                            vml.0100000000303132333435363738393031496e7465726e

naa.60030d900e0bfb02f17f416dbe09d663                            vml.020007000060030d900e0bfb02f17f416dbe09d663566972747561

naa.60030d90297238074359aa0da1d3312b                            vml.020000000060030d90297238074359aa0da1d3312b566972747561

naa.60030d903e49bd02d469c40e4f9b24c4                            vml.020003000060030d903e49bd02d469c40e4f9b24c4566972747561

naa.60030d90490ba102514406aaa7292e26                            vml.020002000060030d90490ba102514406aaa7292e26566972747561

naa.60030d904b493c06336901de7e71ea46                            vml.02000a000060030d904b493c06336901de7e71ea46566972747561

naa.60030d904e10fb02981a17234f9738f3                            vml.020008000060030d904e10fb02981a17234f9738f3566972747561

naa.60030d907f00fb023ca56e3a61531529                            vml.020005000060030d907f00fb023ca56e3a61531529566972747561

naa.60030d9093f2a3008afa48c2d801d6ec                            vml.020009000060030d9093f2a3008afa48c2d801d6ec566972747561

naa.60030d90bf05fb0221926713872845bd                            vml.020006000060030d90bf05fb0221926713872845bd566972747561

naa.60030d90cdf3fa028df767e60326c201                            vml.020004000060030d90cdf3fa028df767e60326c201566972747561

naa.60030d90ea05a10213c94ea7b6e21e49                            vml.020001000060030d90ea05a10213c94ea7b6e21e49566972747561

naa.600a098000ae567a0000083758eec62d                            vml.0200010000600a098000ae567a0000083758eec62d4d4433387878

naa.600a098000ae567a0000083a58eec6cc                            vml.0200020000600a098000ae567a0000083a58eec6cc4d4433387878

naa.600a098000ae567a0000083d58eec7d9                            vml.0200030000600a098000ae567a0000083d58eec7d94d4433387878

naa.600a098000ae567a0000084058eec84c                            vml.0200040000600a098000ae567a0000084058eec84c4d4433387878

naa.600a098000ae567a0000085558f3e24e                            vml.0200050000600a098000ae567a0000085558f3e24e4d4433387878

naa.61866da099d22800207f944a52c596c9                            vml.020000000061866da099d22800207f944a52c596c9504552432048

naa.61866da099d228002081416a1909b4bc                            vml.020000000061866da099d228002081416a1909b4bc504552432048

naa.6e843b661af9f49d8b93d4dbfd8e76d2                            vml.02000000006e843b661af9f49d8b93d4dbfd8e76d2695343534920

t10.NVMe____HUSPR3216AHP301_________________________81E4186100CA0C00  vml.0100000000383145345f313836315f303043415f3043303000485553505233

To present the new drive to a VM I need the vml UID. I don't know how to find it.
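A note that may help: `esxcfg-scsidevs -u` already pairs each primary UID with its vml alias, so once the device is visible, the vml UID can be grepped out of that table. A sketch against the t10 row from the output above:

```shell
#!/bin/sh
# The t10 row from the esxcfg-scsidevs -u table above; on the host you
# would instead run:  esxcfg-scsidevs -u | grep HUSPR
line='t10.NVMe____HUSPR3216AHP301_________________________81E4186100CA0C00  vml.0100000000383145345f313836315f303043415f3043303000485553505233'
vml=$(echo "$line" | grep -o 'vml\.[0-9a-f]*')
echo "$vml"
```

Caveat: since the new SN260 shows Offline, it may not have a vml alias at all yet; the offline state likely has to be resolved before the device can be mapped to a VM.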

reviewerofstora
Contributor

Seeing this exact same problem well into ESXi 6.7U3 here. It's with bulk groups of identical SSDs (this has happened with more than one brand/controller type). In the current batch, 4 of 10 drives show as online, the rest offline. Identical to the previous commenter: esxcli nvme device list shows the entire group, but /dev/disks shows only 4 in my case.

TheBobkin
VMware Employee

Hello reviewerofstorage,

Can you check the drives using the Intel SSD Data Center Tool and see whether they have a 4k sector size? (e.g. SectorSize: 4096)

# isdct show -a -intelssd

If they are set to 4k, and if possible (e.g. no data on the drives), can you try changing the sector size of the drives to 512 and retesting how they are seen by ESXi/vSAN?

Gain Optimal Performance by Changes to SSD Physical Sector Size
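If you want to script that check across drives: isdct prints per-drive `Key : Value` pairs, so the sector size can be parsed out. The sample output shape below is an assumption for illustration (I'm not on a host to paste real output); on the host you would pipe the real isdct output instead:

```shell
#!/bin/sh
# Assumed "isdct show -a -intelssd" output shape (Key : Value pairs);
# the sample values here are made up for illustration.
sample='DevicePath : /dev/nvme0n1
SectorSize : 4096'
size=$(echo "$sample" | awk -F' : ' '/SectorSize/ {print $2}')
echo "SectorSize: $size"
```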

Bob
