VMware Cloud Community
chrisLE
Contributor

ESXi can't connect to NFS share with separate vmkernel NFS NIC and stack

Hi there,

I'm setting up two new hosts that will replace our old ones. We use NFS, and in the previous installation the vMotion interface also carried the NFS traffic.

Now we have 10G. I created a separate vSwitch with two physical NICs, created a "custom" TCP/IP stack with no gateway, added two port groups for iSCSI and NFS with their VLANs, and added a vmkernel interface to the NFS port group that uses the custom TCP/IP stack.
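Roughly, the esxcli equivalent of what I configured looks like this (names match the output I post further down; I'm only showing the NFS side):

# create the custom TCP/IP stack and attach a vmkernel NIC to it
esxcli network ip netstack add --netstack=NFS
esxcli network ip interface add --interface-name=vmk2 --portgroup-name=NFS --netstack=NFS
esxcli network ip interface ipv4 set --interface-name=vmk2 --ipv4=192.168.100.41 --netmask=255.255.255.0 --type=static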

I can ping the NFS server from the console:

[root@MP1-U07-ESXi-01:~] esxcli network diag ping -I vmk2 --netstack=NFS -H 192.168.100.61
   Trace:
      Received Bytes: 64
      Host: 192.168.100.61
      ICMP Seq: 0
      TTL: 255
      Round-trip Time: 233 us
      Dup: false
      Detail:

      Received Bytes: 64
      Host: 192.168.100.61
      ICMP Seq: 1
      TTL: 255
      Round-trip Time: 108 us
      Dup: false
      Detail:

      Received Bytes: 64
      Host: 192.168.100.61
      ICMP Seq: 2
      TTL: 255
      Round-trip Time: 231 us
      Dup: false
      Detail:
   Summary:
      Host Addr: 192.168.100.61
      Transmitted: 3
      Recieved: 3
      Duplicated: 0
      Packet Lost: 0
      Round-trip Min: 107 us
      Round-trip Avg: 190 us
      Round-trip Max: 233 us

But when I try to add an NFS datastore, it fails with a "can't connect" error.
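From the shell the attempt is roughly equivalent to this (share path as given further down in the thread; the volume name is just an example):

# NFSv3 mount attempt; the volume name here is only an example
esxcli storage nfs add --host=192.168.100.61 --share=/vm_nfs_sata --volume-name=vm_nfs_sata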

What did I do wrong? The vmkernel NIC IP is in the same subnet as the NFS server.

Kind Regards,

Chris

12 Replies
daphnissov
Immortal

Remove the iSCSI vmkernel from that custom TCP/IP stack as that's not supported, especially with any sort of port binding. Also, show your TCP/IP stacks and their configurations plus your NFS vmkernel interface.
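For example, the output of something along these lines would be enough:

esxcli network ip netstack list
esxcli network ip interface list
esxcfg-vmknic -l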

chrisLE
Contributor

The iSCSI port group is for VMs; there is no vmkernel NIC in it, so no TCP/IP stack is attached to it. I want to use the same physical NICs for NFS and iSCSI; they are in different VLANs.

esxcli network ip netstack get -N NFS
NFS
   Key: NFS
   Name: NFS
   Enabled: true
   Max Connections: 11000
   Current Max Connections: 11000
   Congestion Control Algorithm: newreno
   IPv6 Enabled: true
   Current IPv6 Enabled: true
   State: 4660

esxcli network ip interface list
...
vmk2
   Name: vmk2
   MAC Address: 00:50:56:65:c9:40
   Enabled: true
   Portset: vSwitch1
   Portgroup: NFS
   Netstack Instance: NFS
   VDS Name: N/A
   VDS UUID: N/A
   VDS Port: N/A
   VDS Connection: -1
   Opaque Network ID: N/A
   Opaque Network Type: N/A
   External ID: N/A
   MTU: 1500
   TSO MSS: 65535
   Port ID: 50331654

esxcfg-vmknic -l
Interface  Port Group/DVPort/Opaque Network        IP Family IP Address                              Netmask         Broadcast       MAC Address       MTU     TSO MSS   Enabled Type                NetStack
...
vmk2       NFS                                     IPv4      192.168.100.41                          255.255.255.0   192.168.100.255 00:50:56:65:c9:40 1500    65535     true    STATIC              NFS
vmk2       NFS                                     IPv6      fe80::250:56ff:fe65:c940                64                              00:50:56:65:c9:40 1500    65535     true    STATIC, PREFERRED   NFS

daphnissov
Immortal

Add a gateway to this TCP/IP stack and see if that fixes it. Even if you don't actually have a gateway, it doesn't matter, since L2 traffic won't be sent to it anyway.
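From the CLI that should be something like the following (the gateway address here is just a placeholder for whatever fits your subnet):

# add a default gateway to the custom NFS netstack, then verify
esxcli network ip route ipv4 add --gateway=192.168.100.1 --network=default --netstack=NFS
esxcli network ip route ipv4 list --netstack=NFS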

chrisLE
Contributor

Nope. I added an IP within the subnet as the gateway; no change.

daphnissov
Immortal

What is the error you receive when attempting the connection?

chrisLE
Contributor

The original German error:

NFS-Mount 192.168.100.61:/vm_nfs_sata fehlgeschlagen: Verbindung mit NFS-Server kann nicht hergestellt werden.

Rough translation:

NFS mount 192.168.100.61:/vm_nfs_sata failed: Could not establish a connection with the NFS server.

daphnissov
Immortal

Are you certain that this mount works to begin with, ESXi kernel networking aside? There are other misconfigurations that produce the same error message, many of them on the server side.

chrisLE
Contributor

I checked; the exports are allowed for the whole subnet. And we're using NFSv3, so it can't be an authentication issue either. The current hosts are already connected to these exports, and even using the "mount on additional hosts" option from there doesn't work.

Is there a way to get a more detailed error message? The connection attempt takes a few seconds, so I'm guessing it runs into a timeout. The hosts are brand new, so it must be something on their side.
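I guess the most detail I can get is from the vmkernel log while retrying the mount, something like:

# watch the vmkernel log during the mount attempt; NFS client errors show up here
tail -f /var/log/vmkernel.log | grep -i nfs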

daphnissov
Immortal

What device is providing these NFS exports? Do you have root permissions set correctly (the no_root_squash directive; see the example export line below)? There are a few options, even with NFSv3, that affect whether the connection succeeds. I'd suggest removing complexity while you troubleshoot: create a standard vmkernel interface on the default TCP/IP stack on the same L2 as your NAS, test connectivity, and once that works, switch it over.
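On a plain Linux-style NFS server, for example, the export would need to look roughly like this (path and subnet taken from your posts; the exact syntax depends on what your NAS uses):

# /etc/exports -- allow the ESXi subnet read/write and keep root access (no_root_squash)
/vm_nfs_sata  192.168.100.0/24(rw,sync,no_root_squash)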

chrisLE
Contributor

Like I said, the NFS exports are already being used by the two existing hosts that the new ones are going to replace. The only differences are ESXi 6.5 instead of 6.0 and the custom TCP/IP stack.

I deleted the vmkernel NIC and re-added it on the default TCP/IP stack. This time the datastore connected. Why isn't the custom stack working?
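The esxcli equivalent of what I did is roughly this (the volume name is again just an example):

# move the vmkernel NIC back to the default TCP/IP stack
esxcli network ip interface remove --interface-name=vmk2
esxcli network ip interface add --interface-name=vmk2 --portgroup-name=NFS
esxcli network ip interface ipv4 set --interface-name=vmk2 --ipv4=192.168.100.41 --netmask=255.255.255.0 --type=static
# the same mount now succeeds
esxcli storage nfs add --host=192.168.100.61 --share=/vm_nfs_sata --volume-name=vm_nfs_sata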

daphnissov
Immortal

Without seeing more of your configuration, I cannot really tell.

jacherian
Contributor

As mentioned by ksimonsen on 10-23-2019 01:27 PM in the discussion "NFS storage won't connect over custom TCP/IP stack", they credited https://kb.vmware.com/s/article/50112854 with solving the issue. This also worked for me: I can now mount NFS 4.1 datastores hosted by my Synology NAS on my ESXi 6.7 host using the custom TCP/IP stack tied to two gigabit vmnics dedicated to SAN traffic.
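For anyone finding this later: after applying the KB article, the mount itself is just a regular NFS 4.1 add; something like the following, where the host, share and datastore names are placeholders for my Synology setup:

# NFS 4.1 mount; all names here are placeholders
esxcli storage nfs41 add --hosts=192.168.1.50 --share=/volume1/vmware --volume-name=nfs41_datastore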
