VMware Cloud Community
nvmeof-issue
Contributor

esxcli nvme fabrics discover failed

Hello, we are testing NVMe-oF with ESXi 7. I am attempting to discover an SPDK NVMe-oF target using the "esxcli nvme fabrics discover" command, but it fails:

[root@localhost:~] esxcli nvme fabrics discover -a vmhba32 -i 10.251.32.216 -p 44200
Unable to find transport address

The "dmesg" shows:

2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)World: 12458: VC opID esxcli-3d-4a53 maps to vmkernel opID 791a7c62
2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMEDEV:711 Controller 265 allocated, maximum queues 16
2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:110 Controller 265, connecting using parameters:
2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:112 kato: 0
2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:113 subtype: 1
2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:114 vmkParams.asqsize: 16
2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:115 vmkParams.cntlid: 0xffff
2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:117 vmkParams.hostid: 5fd04702-15a6-343b-3582-6c92bf556abb
2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:118 vmkParams.hostnqn: nqn.2014-08.org.nvmexpress:uuid:5fd04702-15a6-343b-3582-6c92bf556abb
2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:119 vmkParams.subnqn: nqn.2014-08.org.nvmexpress.discovery
2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:121 vmkParams.trType: RDMA
2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:123 vmkParams.trsvcid: 44200
2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:125 vmkParams.traddr: 10.251.32.216
2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)NVMFDEV:127 vmkParams.tsas:
2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)nvmerdma:1020 vmhba32, controller 265
2020-12-17T03:07:21.767Z cpu2:526767 opID=791a7c62)nvmerdma:814 [ctlr 265, queue 0] cqsize 16
2020-12-17T03:07:21.768Z cpu2:526767 opID=791a7c62)nvmerdma:1526 [ctlr 265, queue 0]
2020-12-17T03:07:21.768Z cpu9:527915)nvmerdma:2150 [ctlr 265, queue 0]
2020-12-17T03:07:21.768Z cpu2:526767 opID=791a7c62)nvmerdma:300 [ctlr 265, queue 0]
2020-12-17T03:07:21.768Z cpu2:526767 opID=791a7c62)nvmerdma:1126 [ctlr 265, queue 0]
2020-12-17T03:07:21.768Z cpu5:527916)nvmerdma:693 [ctlr 265, queue 0]
2020-12-17T03:07:21.788Z cpu3:524823)nvmerdma:734 [ctlr 265, queue 0] event 0
2020-12-17T03:07:21.788Z cpu3:524823)nvmerdma:1722 [ctlr 265, queue 0]
2020-12-17T03:07:21.794Z cpu1:524871)nvmerdma:734 [ctlr 265, queue 0] event 2
2020-12-17T03:07:21.794Z cpu1:524871)nvmerdma:1797 [ctlr 265, queue 0]
2020-12-17T03:07:21.806Z cpu4:524833)nvmerdma:734 [ctlr 265, queue 0] event 9
2020-12-17T03:07:21.806Z cpu4:524833)nvmerdma:1839 [ctlr 265, queue 0]
2020-12-17T03:07:21.806Z cpu4:524833)nvmerdma:1843 [ctlr 265, queue 0] connected
2020-12-17T03:07:21.806Z cpu2:526767 opID=791a7c62)nvmerdma:259 [ctlr 265, queue 0] connected
2020-12-17T03:07:21.806Z cpu2:526767 opID=791a7c62)nvmerdma:1113 [ctlr 265] connected successfully

2020-12-17T03:07:21.806Z cpu2:526767 opID=791a7c62)NVMFDEV:1108 controller 265, queue 0
2020-12-17T03:07:21.807Z cpu2:526767 opID=791a7c62)NVMFDEV:1181 Connected to queue 0, successfully
2020-12-17T03:07:21.807Z cpu2:526767 opID=791a7c62)NVMFDEV:1068 Controller 0x4313afc18880, set ctlrID to 1
2020-12-17T03:07:21.807Z cpu2:526767 opID=791a7c62)NVMFDEV:187 Adding new controller nqn.2014-08.org.nvmexpress.discovery to active list
2020-12-17T03:07:21.807Z cpu2:526767 opID=791a7c62)NVMEDEV:2408 disabling controller...
2020-12-17T03:07:21.807Z cpu2:526767 opID=791a7c62)NVMEDEV:2417 enabling controller...
2020-12-17T03:07:21.807Z cpu2:526767 opID=791a7c62)NVMEDEV:590 Controller 265, queue 0, set queue size to 16
2020-12-17T03:07:21.807Z cpu2:526767 opID=791a7c62)NVMEDEV:2425 reading version register...
2020-12-17T03:07:21.807Z cpu2:526767 opID=791a7c62)NVMEDEV:2439 get controller identify data...
2020-12-17T03:07:46.929Z cpu4:524833)nvmerdma:734 [ctlr 265, queue 0] event 10
2020-12-17T03:07:46.929Z cpu4:524833)nvmerdma:1893 [ctlr 265, queue 0]
2020-12-17T03:07:46.929Z cpu4:524833)nvmerdma:1897 [ctlr 265, queue 0] disconnected due to CM event 10
2020-12-17T03:07:46.929Z cpu5:527916)nvmerdma:702 Queue 0 disconnect world wakes up: Success
2020-12-17T03:07:46.929Z cpu5:527916)nvmerdma:542 [ctlr 265, queue 0]
2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c038d0 op 0x0 status 0x5
2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c038e0 op 0x0 status 0x5
2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c038f0 op 0x80 status 0x5
2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03900 op 0x0 status 0x5
2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03910 op 0x0 status 0x5
2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03920 op 0x0 status 0x5
2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03930 op 0x0 status 0x5
2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03940 op 0x0 status 0x5
2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03850 op 0x0 status 0x5
2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03860 op 0x0 status 0x5
2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03870 op 0x0 status 0x5
2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03880 op 0x0 status 0x5
2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c03890 op 0x0 status 0x5
2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c038a0 op 0x0 status 0x5
2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c038b0 op 0x0 status 0x5
2020-12-17T03:07:46.932Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0x430a11c038c0 op 0x0 status 0x5
2020-12-17T03:07:46.933Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0xfffffffffffffff2 op 0x0 status 0x5
2020-12-17T03:07:46.933Z cpu5:527916)nvmerdma:521 [ctlr 265, queue 0] cleanup vmkCmd 0x453a44bffb80[0], status 0x80d
2020-12-17T03:07:46.933Z cpu9:527915)nvmerdma:2106 [ctlr 265, queue 0] Completion failed, wrid 0xfffffffffffffff1 op 0x0 status 0x5
2020-12-17T03:07:46.933Z cpu2:526767 opID=791a7c62)WARNING: NVMEDEV:2446 Failed to get controller identify data, status: Failure
2020-12-17T03:07:46.933Z cpu2:526767 opID=791a7c62)nvmerdma:2582 [ctlr 265, queue 0] cmd 0x453a44bffb80, queue not connected: Failure
2020-12-17T03:07:46.933Z cpu2:526767 opID=791a7c6
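For reference, the log above shows the RDMA queue for the discovery controller being established (CM events 0/2/9, "connected successfully"), ESXi then issuing "get controller identify data...", and about 25 seconds later the queue being disconnected (CM event 10) before the identify data ever arrives, at which point esxcli gives up with "Unable to find transport address". A few basic ESXi-side checks can help confirm which RDMA device and vmkernel port are actually behind vmhba32 (vmk1 below is a placeholder vmknic name, not taken from this host):

# list NVMe-oF adapters and the RDMA devices/uplinks behind them
esxcli nvme adapter list
esxcli rdma device list

# confirm a vmkernel port is bound to the RDMA device used by vmhba32
esxcli rdma device vmknic list

# basic reachability from that vmkernel port to the target
# (vmk1 is a placeholder vmknic name)
vmkping -I vmk1 10.251.32.216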

nvmeof-issue
Contributor

And on a Linux host, the discovery succeeds:

./nvme discover -t rdma -a 10.251.32.216 -s 44200

Discovery Log Number of Records 1, Generation counter 3
=====Discovery Log Entry 0======
trtype: rdma
adrfam: ipv4
subtype: nvme subsystem
treq: not required
portid: 0
trsvcid: 44200
subnqn: nqn.2016-06.io.spdk:cnode
traddr: 10.251.32.216
rdma_prtype: not specified
rdma_qptype: connected
rdma_cms: rdma-cm
rdma_pkey: 0x0000
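For context, this is roughly how an SPDK target that answers a discovery like the one above is typically configured through SPDK's rpc.py; the subsystem NQN (cnode1), the bdev name, and the sizes are illustrative only and not our exact configuration (the subnqn printed above looks truncated), so treat it as a sketch:

# sketch: SPDK NVMe-oF RDMA listener on 10.251.32.216:44200 (names/sizes are examples)
scripts/rpc.py nvmf_create_transport -t RDMA
scripts/rpc.py bdev_malloc_create 64 512 -b Malloc0
scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000001
scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Malloc0
scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t rdma -a 10.251.32.216 -s 44200 -f ipv4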

govindrathod42
Contributor

I am experiencing the same issue.

2021-02-11T09:14:52.635Z cpu8:1051530 opID=860873e4)nvmerdma:1399 vmhba66, controller 298
2021-02-11T09:14:52.656Z cpu23:1049453)nvmerdma:902 [ctlr 298, queue 0] event 0
2021-02-11T09:14:52.660Z cpu20:1049573)nvmerdma:902 [ctlr 298, queue 0] event 2
2021-02-11T09:14:52.664Z cpu20:1049493)nvmerdma:902 [ctlr 298, queue 0] event 9
2021-02-11T09:14:52.664Z cpu20:1049493)nvmerdma:2457 [ctlr 298, queue 0] connected
2021-02-11T09:14:52.664Z cpu8:1051530 opID=860873e4)nvmerdma:1492 [ctlr 298] connected successfully
2021-02-11T09:14:52.664Z cpu8:1051530 opID=860873e4)NVMFDEV:1837 controller 298, queue 0
2021-02-11T09:14:52.665Z cpu8:1051530 opID=860873e4)NVMFDEV:1910 Connected to queue 0, successfully
2021-02-11T09:14:52.665Z cpu8:1051530 opID=860873e4)NVMFDEV:1701 Controller 0x431414a15540, set ctlrID to 32769
2021-02-11T09:14:52.665Z cpu8:1051530 opID=860873e4)NVMFDEV:630 Adding new controller to target active list: nqn.2014-08.org.nvmexpress.discovery
2021-02-11T09:14:52.665Z cpu8:1051530 opID=860873e4)NVMEDEV:2895 disabling controller...
2021-02-11T09:15:00.169Z cpu6:1051530 opID=860873e4)WARNING: NVMEDEV:2391 Controller cannot be disabled, status: Timeout
2021-02-11T09:15:00.169Z cpu6:1051530 opID=860873e4)WARNING: NVMEDEV:2899 Failed to disable controller, status: Timeout
2021-02-11T09:15:00.169Z cpu6:1051530 opID=860873e4)WARNING: NVMEDEV:4609 Failed to initialize controller, status: Timeout.
2021-02-11T09:15:00.169Z cpu6:1051530 opID=860873e4)WARNING: NVMFDEV:846 Failed to register controller 298, status: Timeout
2021-02-11T09:15:00.169Z cpu6:1051530 opID=860873e4)nvmerdma:1533 [ctlr 298]
2021-02-11T09:15:00.171Z cpu20:1049493)nvmerdma:902 [ctlr 298, queue 0] event 10
2021-02-11T09:15:00.171Z cpu0:1286360)nvmerdma:2943 [ctlr 298, queue 0] Beacon completion succeeded, wrid 0xfffffffffffffff2 op 0x80 status 0x5
2021-02-11T09:15:00.196Z cpu1:1286361)nvmerdma:867 [ctlr 298, queue 0] disconnect world dying, exit.
2021-02-11T09:15:00.201Z cpu6:1051530 opID=860873e4)nvmerdma:1568 controller 298 disconnected
2021-02-11T09:15:00.201Z cpu6:1051530 opID=860873e4)NVMEDEV:1095 Ctlr 298 freeing
2021-02-11T09:15:00.201Z cpu6:1051530 opID=860873e4)NVMEDEV:6117 Cancel requests of controller 298, 0 left.
2021-02-11T09:15:00.201Z cpu6:1051530 opID=860873e4)WARNING: NVMFDEV:1300 Failed to connect to controller, status: Timeout
2021-02-11T09:15:00.201Z cpu6:1051530 opID=860873e4)WARNING: NVMFVSI:1074 Failed to discover controllers, status: Timeout
[root@localhost:~]
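In this trace the admin queue connects, but the very first admin-level step ("disabling controller...") times out, so the connection comes up and then no admin traffic completes. One thing worth ruling out on RoCE setups with this symptom is an MTU or flow-control mismatch between the vmkernel port, the switch, and the target; a couple of quick checks (vmk1 and <target-ip> are placeholders, since the command line is not shown in this post):

# compare MTU on the uplinks and the vmkernel ports against the fabric/target
esxcli network nic list
esxcli network ip interface list

# test the configured MTU end to end with don't-fragment set
# (8972 assumes a 9000-byte MTU; adjust for a 1500-byte MTU)
vmkping -I vmk1 -s 8972 -d <target-ip>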

rbharali87
Contributor

Hi, were you able to solve this issue?

I'm facing the same issue with ESXi 7.0U3.

[root@localhost:~] esxcli nvme fabrics discover -a vmhba68 -i 192.168.18.4 -p 8009
Unable to find transport address
[root@localhost:~]

2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)World: 12077: VC opID esxcli-17-75c4 maps to vmkernel opID a38002cf
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMEDEV:1393 Ctlr 312 allocated, maximum queues 16
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:159 Controller 312, connecting using parameters:
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:161 kato: 0
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:162 subtype: 1
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:163 cdc: 0
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:166 target type: NVMe
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:174 vmkParams.asqsize: 32
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:175 vmkParams.cntlid: 0xffff
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:177 vmkParams.hostid: 63d7700b-ee42-c554-5e2a-98be942a2a1a
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:178 vmkParams.hostnqn: nqn.2014-08.org.nvmexpress:uuid:63d7700b-ee42-c554-5e2a-98be942a2a1a
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:179 vmkParams.subnqn: nqn.2014-08.org.nvmexpress.discovery
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:196 vmkParams.trType: TCP
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:198 vmkParams.trsvcid: 8009
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:200 vmkParams.traddr: 192.168.18.4
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFDEV:202 vmkParams.tsas.digest: 0
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)nvmetcp:nt_ConnectController:781 vmhba68, controller 312
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)nvmetcp:nt_ConnectCM:4408 [ctlr 312, queue 0]
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)NVMFNET:151 Uplink: vmnic4, portset: vswitch_nvme_TCP1.
2023-04-18T14:34:10.721Z cpu20:1051704 opID=a38002cf)nvmetcp:nt_ConnectCM:4460 [ctlr 312, queue 0] Using source vmknic vmk1 for socket binding
2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)nvmetcp:nt_SocketConnect:4339 [ctlr 312, queue 0] Failed to connect socket: Failure
2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)nvmetcp:nt_ConnectQueueInt:4129 [ctlr 312, queue 0] failed to connect: Failure
2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)nvmetcp:nt_FreeSubmissionResources:5189 [ctlr 312, queue 0]
2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)nvmetcp:nt_ConnectController:860 Failed to connect admin queue: Failure
2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)WARNING: NVMFDEV:882 Failed to transport connect controller 312, status: Failure
2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)NVMEDEV:1565 Ctlr 312 freeing
2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)NVMEDEV:9057 Cancel requests of controller 312, 0 left.
2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)WARNING: NVMFDEV:1432 Failed to connect to controller, status: Failure
2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)WARNING: NVMFEVT:1773 Failed to discover controllers, status: Failure
2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)WARNING: NVMFEVT:1456 Discover and connect controller failed: Failure
2023-04-18T14:34:10.771Z cpu20:1051704 opID=a38002cf)WARNING: NVMFVSI:1300 Failed to discover controllers: Failure
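Here the failure is earlier than in the RDMA traces above: ESXi cannot even open the TCP socket from vmk1 to 192.168.18.4:8009 ("Failed to connect socket: Failure"), so this points at the network path or the VMkernel port configuration rather than the NVMe layer. A few checks from the ESXi shell (vmk1, the address, and the port are taken from the log above):

# confirm vmhba68 is backed by the expected vmnic/vmknic
esxcli nvme adapter list

# confirm vmk1 has the expected address/netmask and can reach the target
esxcli network ip interface ipv4 get -i vmk1
vmkping -I vmk1 192.168.18.4

# confirm the discovery port itself is reachable over TCP
nc -z 192.168.18.4 8009

It is also worth confirming in the vSphere Client that vmk1 has the "NVMe over TCP" service enabled on the VMkernel adapter, and that no firewall sits between the host and the target.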

However, a discovery from a Linux box is successful:

nvme discover --transport=tcp --traddr=192.168.18.4 --host-traddr=192.168.18.5 --trsvcid=8009

Discovery Log Number of Records 1, Generation counter 18
=====Discovery Log Entry 0======
trtype: tcp
adrfam: ipv4
subtype: nvme subsystem
treq: not specified, sq flow control disable supported
portid: 1
trsvcid: 8009
subnqn: nqn.2014-08.org.nvmexpress:uuid:03000200-0400-0500-0006-000700080009
traddr: 192.168.18.4
sectype: none
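As a further sanity check of the target itself, connecting to the discovered subsystem from the same Linux box should work before going back to ESXi; this is just a sketch using the subnqn reported above:

nvme connect -t tcp -a 192.168.18.4 -s 8009 --host-traddr=192.168.18.5 -n nqn.2014-08.org.nvmexpress:uuid:03000200-0400-0500-0006-000700080009
nvme list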

rbharali87
Contributor

Were you able to solve this issue?
