VMware Cloud Community
gcomstar
Contributor

NVMe-oF Datastore Issues

Hello, we are testing NVMe-oF with ESXi 7 and I am having issues getting the device recognized. I am using Mellanox ConnectX-4 cards and am attempting to access an NVMe device as a test. I am able to discover the controller in the vSphere interface.

The Namespaces tab also shows the correct disk size and name (750 GB in this case).

On the Paths tab, the following shows up:

Runtime Name: vmhba67:C0:T1:L0

Target: Blank

Lun: 0

Status: Dead

Below is the test config from the Linux server; /dev/nvme0n1 is a freshly erased NVMe drive. Does anyone have suggestions for next troubleshooting steps?

sudo modprobe nvmet

sudo modprobe nvmet-rdma

sudo /bin/mount -t configfs none /sys/kernel/config/

sudo mkdir /sys/kernel/config/nvmet/subsystems/PSC

cd /sys/kernel/config/nvmet/subsystems/PSC

echo 1 | sudo tee -a attr_allow_any_host > /dev/null

sudo mkdir namespaces/1

cd namespaces/1/

echo -n /dev/nvme0n1 | sudo tee -a device_path > /dev/null

echo 1 | sudo tee -a enable > /dev/null

sudo mkdir /sys/kernel/config/nvmet/ports/1

cd /sys/kernel/config/nvmet/ports/1

echo 10.10.11.1 | sudo tee -a addr_traddr > /dev/null

echo rdma | sudo tee -a addr_trtype > /dev/null

echo 4420 | sudo tee -a addr_trsvcid > /dev/null

echo ipv4 | sudo tee -a addr_adrfam > /dev/null

sudo ln -s /sys/kernel/config/nvmet/subsystems/PSC/ /sys/kernel/config/nvmet/ports/1/subsystems/PSC

sudo mkdir /sys/kernel/config/nvmet/ports/2

cd /sys/kernel/config/nvmet/ports/2

echo 10.10.12.1 | sudo tee -a addr_traddr > /dev/null

echo rdma | sudo tee -a addr_trtype > /dev/null

echo 4420 | sudo tee -a addr_trsvcid > /dev/null

echo ipv4 | sudo tee -a addr_adrfam > /dev/null

sudo ln -s /sys/kernel/config/nvmet/subsystems/PSC/ /sys/kernel/config/nvmet/ports/2/subsystems/PSC
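
For reference, a quick way to sanity check the target from a second Linux box (assuming nvme-cli is installed; these are the standard nvme-cli flags) is:

sudo modprobe nvme-rdma
sudo nvme discover -t rdma -a 10.10.11.1 -s 4420
sudo nvme connect -t rdma -n PSC -a 10.10.11.1 -s 4420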

gcomstar
Contributor

Just a follow-up: I was able to connect it to another Linux system without issue. Is there somewhere on the ESXi host I can check logs? Is there possibly something wrong with my subsystem NQN? Most vendor appliances use long, fully qualified names; the Linux initiator accepted "PSC", but perhaps VMware can't.
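
If the NQN format turns out to matter, recreating the subsystem under a fully qualified name is just a matter of using that name for the configfs directory, e.g. (the NQN here is only an example):

sudo mkdir /sys/kernel/config/nvmet/subsystems/nqn.2020-04.com.example:psc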

gcomstar
Contributor

Found this in the logs. It seems HPP doesn't support the device for some reason (it's a Mellanox ConnectX-4 adapter back to a Linux target). Perhaps the Linux target isn't supported, or perhaps I simply need to do a better job of naming my target's NQN.
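
For anyone else hunting for these, they are in /var/log/vmkernel.log on the host; something like the following pulls out the relevant lines:

grep -i hpp /var/log/vmkernel.log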

2020-04-24T04:01:52.789Z cpu9:2099749 opID=6441911)WARNING: HPP: HppClaimPath:3719: Failed to claim path 'vmhba67:C0:T2:L0': Not supported

2020-04-24T04:01:52.789Z cpu9:2099749 opID=6441911)HPP: HppUnclaimPath:3765: Unclaiming path vmhba67:C0:T2:L0

2020-04-24T04:01:52.789Z cpu9:2099749 opID=6441911)ScsiPath: 8397: Plugin 'HPP' rejected path 'vmhba67:C0:T2:L0'

2020-04-24T04:01:52.789Z cpu9:2099749 opID=6441911)ScsiClaimrule: 1568: Plugin HPP specified by claimrule 65534 was not able to claim path vmhba67:C0:T2:L0: Not supported

2020-04-24T04:01:52.789Z cpu9:2099749 opID=6441911)WARNING: ScsiPath: 8327: NMP cannot claim a path to NVMeOF device vmhba67:C0:T2:L0

2020-04-24T04:01:52.789Z cpu9:2099749 opID=6441911)ScsiClaimrule: 1568: Plugin NMP specified by claimrule 65535 was not able to claim path vmhba67:C0:T2:L0: Not supported

2020-04-24T04:01:52.789Z cpu9:2099749 opID=6441911)ScsiClaimrule: 1872: Error claiming path vmhba67:C0:T2:L0. Not supported.

2020-04-24T04:01:52.809Z cpu9:2099749 opID=6441911)WARNING: HPP: HppClaimPath:3719: Failed to claim path 'vmhba67:C0:T2:L0': Not supported

2020-04-24T04:01:52.809Z cpu9:2099749 opID=6441911)HPP: HppUnclaimPath:3765: Unclaiming path vmhba67:C0:T2:L0
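
For reference, rules 65534/65535 mentioned in the messages appear to be the default catch-all claim rules; the configured rules can be listed with:

esxcli storage core claimrule list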

eric_zl_zhang
Contributor

Hello, we also have this issue. Did you resolve it?

Our storage target is mapped to ESXi with FC-NVMe; we can find the NVMe controller and namespace, but we can't find the storage device.

1. We can find the NVMe controller and namespace:

_______________

[root@localhost:~] esxcli nvme fabrics discover -a vmhba68 -W 0x56c92bf803002760 -w 0x56c92bf8033b2760
Transport Type Address Family Subsystem Type Controller ID Admin Queue Max Size Transport Address Transport Service ID Subsystem NQN Connected
-------------- -------------- -------------- ------------- -------------------- ------------------------------------------- -------------------- ----------------------------------- ---------
FC Fibre Channel NVM 65535 32 nn-0x56c92bf803002760:pn-0x56c92bf8033b2760 none nqn.2004-12.com.inspur:mcs.28827034 true
[root@localhost:~] esxcli nvme controller list
Name Controller Number Adapter Transport Type Is Online
----------------------------------------------------------------------------- ----------------- ------- -------------- ---------
nqn.2004-12.com.inspur:mcs.28827034#vmhba68#56c92bf803002760:56c92bf8033b2760 467 vmhba68 FC true
[root@localhost:~] esxcli nvme namespace list
Name Controller Number Namespace ID Block Size Capacity in MB
------------------------------------ ----------------- ------------ ---------- --------------
eui.d000000000000001005076000a209c06 467 2 512 10240

_______________

2. "esxcli storage core path list" command show the path is dead," esxcli storage core device list" can.t find storage device

_______________

fc.200000109bc18a3f:100000109bc18a3f-fc.56c92bf803002760:56c92bf8033b2760-
UID: fc.200000109bc18a3f:100000109bc18a3f-fc.56c92bf803002760:56c92bf8033b2760-
Runtime Name: vmhba68:C0:T3:L1
Device: No associated device
Device Display Name: No associated device
Adapter: vmhba68
Channel: 0
Target: 3
LUN: 1
Plugin: (unclaimed)
State: dead
Transport: fc
Adapter Identifier: fc.200000109bc18a3f:100000109bc18a3f
Target Identifier: fc.56c92bf803002760:56c92bf8033b2760
Adapter Transport Details: Unavailable or path is unclaimed
Target Transport Details: Unavailable or path is unclaimed
Maximum IO Size: 2097152

_______________

3. Some error log entries:

WARNING: HPP: HppClaimPath:3719: Failed to claim path 'vmhba68:C0:T0:L1': Not supported

_______________

see attachment

_______________

kp3k
Contributor

I have the same issue with an Emulex LPe32000 PCIe Fibre Channel adapter and ESXi 7.0.1.

I think the issue is due to a bad claim rule.

I've tried adding a new one, but it's not working.

I have a feeling ESXi is not prepared for NVMe storage by default.
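
For what it's worth, the rule I tried was along these lines (the rule number and transport were chosen somewhat arbitrarily, so treat this as a sketch rather than a known-good recipe):

esxcli storage core claimrule add -r 102 -t transport -R fc -P HPP
esxcli storage core claimrule load
esxcli storage core claimrule run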

 

eric_zl_zhang
Contributor

We also use the LPe32000 PCIe FC adapter, with ESXi 7.0.0, and we have this issue.

We tested with IBM storage, and it works fine.

kp3k
Contributor

The issue was solved by the storage vendor. They released new firmware that presents the volume with a 512-byte block size instead of 4K, which is what VMware supports.

Although vSphere 7U2 supports 4K devices, external storage like a NetApp EF600 with a 4K volume is not visible in vSphere; only 512-byte block size volumes are.

Controllers, namespaces, and paths are all OK, but a 4K device/volume is not shown in vSphere. Not supported?
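
For anyone comparing their own setup, the block size the host actually sees for each namespace shows up in:

esxcli nvme namespace list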

Peter

selsar
Contributor

I configured an NVMe block device (512-byte block size) using 'nvmetcli' on a CentOS VM and then did an NVMe/TCP connect to that target from ESXi. ESXi is able to see the volume, and it is also listed in 'esxcli nvme namespace list', but the path to the target is shown as DEAD. Any pointers as to why?
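
For context, the target was built with nvmetcli; a trimmed-down JSON for this kind of TCP target (loaded with 'nvmetcli restore nvmet.json') looks roughly like the sketch below. The NQN, address, and port match the controller listed further down; the backing device path is just a placeholder.

{
  "hosts": [],
  "ports": [
    {
      "addr": { "adrfam": "ipv4", "traddr": "15.33.8.5", "trsvcid": "4420", "trtype": "tcp" },
      "portid": 1,
      "referrals": [],
      "subsystems": [ "testnqn" ]
    }
  ],
  "subsystems": [
    {
      "attr": { "allow_any_host": "1" },
      "namespaces": [
        { "device": { "path": "/dev/nvme0n1" }, "enable": 1, "nsid": 1 }
      ],
      "nqn": "testnqn"
    }
  ]
}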

esxcli nvme namespace list

Name                                   Controller Number  Namespace ID  Block Size  Capacity in MB

-------------------------------------  -----------------  ------------  ----------  --------------

eui.343337304d1007610025384500000001                 256             1         512          915715

eui.343337304d1015200025384500000001                 257             1         512          915715

uuid.b8bbea9b8b34471b97b13222a954e43e                328             1         512           20480 <<<<

 

esxcli nvme controller list

Name                                                                                    Controller Number  Adapter  Transport Type  Is Online

--------------------------------------------------------------------------------------  -----------------  -------  --------------  ---------

nqn.2014-08.org.nvmexpress_144d_SAMSUNG_MZQLB960HAJR-00007______________S437NE0M100761                256  vmhba2   PCIe                 true

nqn.2014-08.org.nvmexpress_144d_SAMSUNG_MZQLB960HAJR-00007______________S437NE0M101520                257  vmhba3   PCIe                 true

testnqn#vmhba65#15.33.8.5:4420                                                                        328  vmhba65  TCP                  true

 

esxcli storage core path list -p vmhba65:C0:T0:L0

tcp.vmnic5:3c:fd:fe:c3:93:5d-tcp.unknown-

   UID: tcp.vmnic5:3c:fd:fe:c3:93:5d-tcp.unknown-

   Runtime Name: vmhba65:C0:T0:L0

   Device: No associated device

   Device Display Name: No associated device

   Adapter: vmhba65

   Channel: 0

   Target: 0

   LUN: 0

   Plugin: (unclaimed)

   State: dead <<<<<<<<<<<<<<<<<<<<<<

   Transport: tcp

   Adapter Identifier: tcp.vmnic5:3c:fd:fe:c3:93:5d

   Target Identifier: tcp.unknown

   Adapter Transport Details: Unavailable or path is unclaimed

   Target Transport Details: Unavailable or path is unclaimed

   Maximum IO Size: 1048576

 VMkernel log:

===========

2022-01-21T02:44:37.120Z cpu22:1048893)HPP: HppCreateDevice:3071: Created logical device 'uuid.b8bbea9b8b34471b97b13222a954e43e'.                             

2022-01-21T02:44:37.120Z cpu22:1048893)WARNING: HPP: HppClaimPath:3956: Failed to claim path 'vmhba65:C0:T0:L0': Not supported                                

2022-01-21T02:44:37.120Z cpu22:1048893)HPP: HppUnclaimPath:4002: Unclaiming path vmhba65:C0:T0:L0                                                             

2022-01-21T02:44:37.120Z cpu22:1048893)ScsiPath: 8597: Plugin 'HPP' rejected path 'vmhba65:C0:T0:L0'                                                          

2022-01-21T02:44:37.120Z cpu22:1048893)ScsiClaimrule: 2039: Plugin HPP specified by claimrule 65534 was not able to claim path vmhba65:C0:T0:L0: Not supported

2022-01-21T02:44:37.121Z cpu22:1048893)WARNING: ScsiPath: 8496: NMP cannot claim a path to NVMeOF device vmhba65:C0:T0:L0                                     

2022-01-21T02:44:37.121Z cpu22:1048893)ScsiClaimrule: 2039: Plugin NMP specified by claimrule 65535 was not able to claim path vmhba65:C0:T0:L0: Not supported

2022-01-21T02:44:37.121Z cpu22:1048893)ScsiClaimrule: 2518: Error claiming path vmhba65:C0:T0:L0. Not supported.

jjtw
Contributor

I hit the same issue with a Linux nvmet target and an ESXi host.

Does anyone know how to dig into why HPP rejects it with "failed to claim path"?

I would like to know which document describes how to debug HPP.
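
The generic claiming/HPP esxcli commands (on 7.0 at least) only show the end result rather than why a path was rejected, e.g.:

esxcli storage core claimrule list
esxcli storage hpp path list
esxcli storage hpp device list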

 

If you know, please leave some information for me.

Thanks~

croni
Contributor

See the link below. VMware expects some functionality that is not in Linux kernel 5, so it can't work.

We are also trying to get this working, and I found this link:

https://koutoupis.com/2022/04/22/vmware-lightbits-labs-and-nvme-over-tcp/
