DerekShaw
Enthusiast

New ESXi 6.7.0 host will not connect to the same NFS server as the ESXi 6.0.0 host it is replacing. How to troubleshoot?


I have seen many of the unanswered questions about ESXi mysteriously not mounting NFS shares. This one perhaps provides a chance to track down what is going on.

The situation: I have built a new ESXi 6.7.0 host to move some VMs to while we upgrade the hardware on a host currently running ESXi 6.0.0.

The plan was simple, and I've done this numerous times before (with earlier versions of ESXi): use a manual ghettoVCB run to capture the VMs to the long-running NFS server, then use cp and vmkfstools to move them onto the new box (running ESXi 6.7.0). (ghettoVmotion....)

But the ESXi 6.7.0 host throws the maddeningly vague error

2019-03-07T03:44:12.632Z cpu7:2098551 opID=7457c6f4)NFS: 171: NFS mount 192.168.42.3:/somepool/esxi_backupstore0 failed: Unable to connect to NFS server.

This NFS server is v3, running FreeBSD 11.2 p0. It is used every week by ghettoVCB to make copies of the VMs. It also used to host the Linux home dirs for the machines on the LAN (until we virtualized that to a FreeBSD VM on the ESXi host).

So, off I go and find all the variants of this problem (mostly unsolved). Salient points: there is no firewall on any of the 3 hosts, and I can vmkping the NFS server and netcat to it. I can mount an NFS store on a different FreeBSD server (running 9.1 p0) from the ESXi 6.7.0 box, across the internet via a VPN tunnel. I can mount the FreeBSD 11.2 NFS store from Linux boxes (and obviously from the ESXi 6.0.0 box). NFS is all v3. The exports on both NFS machines are virtually identical (no pun intended).
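For anyone repeating these checks from a machine with Python 3, here is a minimal sketch of the TCP reachability test, roughly what the netcat check does. Ports 111 (rpcbind) and 2049 (nfsd) are the conventional defaults and are an assumption here; mountd usually sits on a dynamic port that this sketch does not discover. The function names are made up for illustration.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe(host: str, ports, timeout: float = 3.0) -> dict:
    """Check several TCP ports and return {port: reachable}."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Example: `probe("192.168.42.3", [111, 2049])` returns a dict of port to True/False. A True result only shows that the TCP handshake completed; it says nothing about whether the RPC service behind the port actually answers calls.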

So, here are the vmkernel log excerpts showing the failing mount and the successful mount. If anybody knows how to make these appear as code blocks, please let me know.

ESXi 6.7.0 failure on 192.168.42.3

2019-03-07T03:43:40.836Z cpu0:2098551 opID=7457c6f4)World: 11942: VC opID b1067550 maps to vmkernel opID 7457c6f4

2019-03-07T03:43:40.836Z cpu0:2098551 opID=7457c6f4)NFS: 160: Command: (mount) Server: (192.168.42.3) IP: (192.168.42.3) Path: (/somepool/esxi_backupstore0) Label: (backupstore0) Options: (None)

2019-03-07T03:43:40.836Z cpu0:2098551 opID=7457c6f4)StorageApdHandler: 977: APD Handle be94838e-79e611ed Created with lock[StorageApd-0x430e2a0d6700]

2019-03-07T03:43:40.836Z cpu0:2098551 opID=7457c6f4)CpuSched: 693: user latency of 2099586 RPC-tx-192.168.42.3.0.111 0 changed by 2098551 hostd-worker -6

2019-03-07T03:43:40.837Z cpu0:2098551 opID=7457c6f4)SunRPC: 1099: Destroying world 0x200982

2019-03-07T03:43:40.837Z cpu0:2098551 opID=7457c6f4)CpuSched: 693: user latency of 2099587 RPC-tx-192.168.42.3.3.245 0 changed by 2098551 hostd-worker -6

2019-03-07T03:43:51.633Z cpu0:2098551 opID=7457c6f4)SunRPC: 3303: Synchronous RPC abort for client 0x4304551ec120 IP 192.168.42.3.3.245 proc 1 xid 0x2907a083 attempt 1 of 3

2019-03-07T03:44:01.633Z cpu2:2098551 opID=7457c6f4)SunRPC: 3303: Synchronous RPC abort for client 0x4304551ec120 IP 192.168.42.3.3.245 proc 1 xid 0x2907a084 attempt 2 of 3

2019-03-07T03:44:12.632Z cpu0:2098551 opID=7457c6f4)SunRPC: 3303: Synchronous RPC abort for client 0x4304551ec120 IP 192.168.42.3.3.245 proc 1 xid 0x2907a085 attempt 3 of 3

2019-03-07T03:44:12.632Z cpu0:2098551 opID=7457c6f4)SunRPC: 1099: Destroying world 0x200983

2019-03-07T03:44:12.632Z cpu7:2098551 opID=7457c6f4)StorageApdHandler: 1063: Freeing APD handle 0x430e2a0d6700 [be94838e-79e611ed]

2019-03-07T03:44:12.632Z cpu7:2098551 opID=7457c6f4)StorageApdHandler: 1147: APD Handle freed!

2019-03-07T03:44:12.632Z cpu7:2098551 opID=7457c6f4)NFS: 171: NFS mount 192.168.42.3:/somepool/esxi_backupstore0 failed: Unable to connect to NFS server.
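One way to read the `RPC-tx-192.168.42.3.0.111` and `RPC-tx-192.168.42.3.3.245` names in the trace above: they look like ONC RPC universal addresses, where the first four numbers are the IP address and the last two are the high and low bytes of the port. That interpretation is an assumption (the log itself does not say so), and `split_rpc_endpoint` is a hypothetical helper name. A small sketch of the arithmetic:

```python
def split_rpc_endpoint(endpoint: str):
    """Split 'a.b.c.d.hi.lo' into (ip, port), assuming universal-address layout."""
    parts = endpoint.split(".")
    if len(parts) != 6:
        raise ValueError(f"expected 6 dot-separated fields, got {len(parts)}")
    octets = [int(p) for p in parts]
    if any(o < 0 or o > 255 for o in octets):
        raise ValueError("every field must be in the range 0..255")
    ip = ".".join(parts[:4])
    # Port is encoded as two bytes: high byte first, then low byte.
    port = octets[4] * 256 + octets[5]
    return ip, port


print(split_rpc_endpoint("192.168.42.3.0.111"))  # ('192.168.42.3', 111)
print(split_rpc_endpoint("192.168.42.3.3.245"))  # ('192.168.42.3', 1013)
```

Under that reading, `.0.111` is port 111 (rpcbind) and `.3.245` is port 1013, so the three aborted calls would be aimed at whatever service rpcbind handed back on port 1013 rather than at rpcbind itself. The log does not name that service.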

ESXi 6.7.0 success on 192.168.254.3

2019-03-07T04:11:56.012Z cpu2:2098619 opID=2a84b29f)World: 11942: VC opID b1067619 maps to vmkernel opID 2a84b29f

2019-03-07T04:11:56.012Z cpu2:2098619 opID=2a84b29f)NFS: 160: Command: (mount) Server: (192.168.254.3) IP: (192.168.254.3) Path: (/otherpool/esxi_backupstore0) Label: (backupstore0) Options: (None)

2019-03-07T04:11:56.012Z cpu2:2098619 opID=2a84b29f)StorageApdHandler: 977: APD Handle c02083a6-ef60d04d Created with lock[StorageApd-0x430e2a0d6700]

2019-03-07T04:11:56.012Z cpu2:2098619 opID=2a84b29f)CpuSched: 693: user latency of 2099848 RPC-tx-192.168.254.3.0.111 0 changed by 2098619 hostd-worker -6

2019-03-07T04:11:56.059Z cpu2:2098619 opID=2a84b29f)SunRPC: 1099: Destroying world 0x200a88

2019-03-07T04:11:56.059Z cpu2:2098619 opID=2a84b29f)CpuSched: 693: user latency of 2099849 RPC-tx-192.168.254.3.2.216 0 changed by 2098619 hostd-worker -6

2019-03-07T04:11:56.101Z cpu2:2098619 opID=2a84b29f)SunRPC: 1099: Destroying world 0x200a89

2019-03-07T04:11:56.101Z cpu2:2098619 opID=2a84b29f)CpuSched: 693: user latency of 2099850 RPC-tx-192.168.254.3.0.111 0 changed by 2098619 hostd-worker -6

2019-03-07T04:11:56.144Z cpu2:2098619 opID=2a84b29f)SunRPC: 1099: Destroying world 0x200a8a

2019-03-07T04:11:56.144Z cpu2:2098619 opID=2a84b29f)CpuSched: 693: user latency of 2099851 RPC-tx-192.168.254.3.8.1 0 changed by 2098619 hostd-worker -6

2019-03-07T04:11:56.212Z cpu2:2098619 opID=2a84b29f)NFS: 346: Restored connection to the server 192.168.254.3 mount point /otherpool/esxi_backupstore0, mounted as c02083a6-ef60d04d-0000-000000000000 ("backupstore0")

2019-03-07T04:11:56.212Z cpu2:2098619 opID=2a84b29f)NFS: 221: NFS mount 192.168.254.3:/otherpool/esxi_backupstore0 status: Success
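To compare the two traces above side by side, here is a rough sketch that groups vmkernel lines by opID and reports the elapsed time, the number of "Synchronous RPC abort" lines, and the final mount result. The regular expression is fitted only to the line shapes quoted in this thread, and `summarise_mounts` is a hypothetical name; other ESXi builds may log differently.

```python
import re
from datetime import datetime

# Matches e.g. "2019-03-07T03:43:40.836Z cpu0:2098551 opID=7457c6f4)NFS: 160: ..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z\s+"
    r"cpu\d+:\d+\s+opID=(?P<opid>[0-9a-fA-F]+)\)(?P<msg>.*)$"
)


def summarise_mounts(lines):
    """Return {opID: {'seconds', 'aborts', 'result'}} for the given log lines."""
    raw = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip blank lines and anything in another format
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
        entry = raw.setdefault(
            m.group("opid"),
            {"first": ts, "last": ts, "aborts": 0, "result": None},
        )
        entry["first"] = min(entry["first"], ts)
        entry["last"] = max(entry["last"], ts)
        msg = m.group("msg")
        if "Synchronous RPC abort" in msg:
            entry["aborts"] += 1
        elif "NFS mount" in msg and "failed" in msg:
            entry["result"] = "failed"
        elif "NFS mount" in msg and "status: Success" in msg:
            entry["result"] = "success"
    return {
        opid: {
            "seconds": (e["last"] - e["first"]).total_seconds(),
            "aborts": e["aborts"],
            "result": e["result"],
        }
        for opid, e in raw.items()
    }
```

Going by the timestamps quoted above, the failing attempt spans roughly 32 seconds with three aborts, against about 0.2 seconds and no aborts for the successful one.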

I have seen and tried the suggestions in these:

VMware Knowledge Base: Troubleshooting connectivity issues to an NFS datastore on ESX and ESXi hosts (1003967)

NFS mount failed: Unable to connect to NFS server.

Strange Issue : NFS datastore can't be mounted only to several ESXi hosts

Unable to add NFS datastore in ESXI 6.5

1 Solution

Accepted Solutions
DerekShaw
Enthusiast

Well, I don't know if it qualifies as an answer, but after upgrading the FreeBSD NFS server from 11.2p0 to 11.2p9, the ESXi 6.7.0 host can now mount the NFS shares that I could previously mount only with ESXi 6.0.0.


5 Replies
ThompsG
Virtuoso

Hi DerekShaw,

Just a question if I may. In the log extracts you posted there are two IP addresses (192.168.42.3, the failure, and 192.168.254.3), and I'm wondering if these are two different NFS hosts being tried from the same ESXi server?

Kind regards.

DerekShaw
Enthusiast
ThompsG

Yes, these are two different NFS servers being mounted from the same host (at slightly different times). It was a troubleshooting step to narrow the scope. You can see a certain similarity in the way I set things up at different clients (in this case the .254.3 address is at my office), which makes it maybe not so obvious what I have done. The .3 is mostly random chance, but fixed IPs typically get assigned in the first 50 addresses, and NFS servers typically get set up pretty early in the process.

After getting some much-needed sleep (and a weekend, when time demands are not so intense), I am going to try patching the local NFS server (.42.3) up to current FreeBSD 11.2 levels (I think it is p9), and if that doesn't work, then to 12.0pX. I can do those without having to down the ESXi box, and they are relatively reliable operations. I'll post the results when I've done it.

The ghettoVCB run would normally happen on Sunday night, so I'm still within normal "failure parameters".

Thanks!


ThompsG
Virtuoso

Great work! Glad you got this solved :)

Clustev
Contributor

I have the same problem.

ESXi 6.5 hosts connect to the XigmaNAS NFS share without problems.

ESXi 6.7 hosts fail with "Unable to connect to NFS server".

2020-07-30T10:05:00.477Z cpu0:2101391 opID=40d32ba3)NFS: 160: Command: (mount) Server: (major.test.ru) IP: (172.22.222.115) Path: (/mn

2020-07-30T10:05:00.477Z cpu0:2101391 opID=40d32ba3)StorageApdHandler: 977: APD Handle a14cee7c-4ebf2a62 Created with lock[StorageApd-0x4310fa9096

2020-07-30T10:05:00.478Z cpu0:2101391 opID=40d32ba3)SunRPC: 1099: Destroying world 0x20759e

2020-07-30T10:05:11.435Z cpu4:2101391 opID=40d32ba3)SunRPC: 3303: Synchronous RPC abort for client 0x430556c67180 IP 172.22.222.115.3.185 proc 1 x

2020-07-30T10:05:22.433Z cpu4:2101391 opID=40d32ba3)SunRPC: 3303: Synchronous RPC abort for client 0x430556c67180 IP 172.22.222.115.3.185 proc 1 x

2020-07-30T10:05:32.433Z cpu8:2101391 opID=40d32ba3)SunRPC: 3303: Synchronous RPC abort for client 0x430556c67180 IP 172.22.222.115.3.185 proc 1 x

2020-07-30T10:05:32.433Z cpu8:2101391 opID=40d32ba3)SunRPC: 1099: Destroying world 0x20759f

2020-07-30T10:05:32.433Z cpu8:2101391 opID=40d32ba3)StorageApdHandler: 1063: Freeing APD handle 0x4310fa9096f0 [a14cee7c-4ebf2a62]

2020-07-30T10:05:32.433Z cpu8:2101391 opID=40d32ba3)StorageApdHandler: 1147: APD Handle freed!

2020-07-30T10:05:32.433Z cpu8:2101391 opID=40d32ba3)NFS: 171: NFS mount major.test.ru:/mnt/ZFSPOOL/vmware failed: Unable to connect to

NAS firmware: XigmaNAS, FreeBSD 12.1-RELEASE-p7 #0 r363366M: Mon Jul 20 16:59:25 CEST 2020

Node software version

ESXi Version: 6.7.0
Hypervisor: VMware ESXi
Build: 16075168

Ping and vmkping both work. A test machine with CentOS 7 connected to my NAS without problems. The 6.7 hosts connected to QNAP NFS storage without problems.
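Since ping only proves IP reachability, one further check is to ask the NAS's rpcbind which port mountd is registered on for TCP and for UDP (the same information `rpcinfo -p` reports). Below is a minimal sketch that builds a portmapper GETPORT call by hand, following the ONC RPC and portmapper version 2 wire formats. It assumes rpcbind answers on UDP port 111, uses a fixed transaction id, and all function names are hypothetical. It has not been run against XigmaNAS or ESXi.

```python
import socket
import struct

PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT = 100000, 2, 3
MOUNT_PROG, MOUNT_V3 = 100005, 3
IPPROTO_TCP, IPPROTO_UDP = 6, 17


def build_getport_call(xid: int, prog: int, vers: int, prot: int) -> bytes:
    """Build an RPC CALL to portmapper GETPORT with AUTH_NONE credentials."""
    # xid, CALL(0), rpcvers 2, prog, vers, proc, cred(flavor,len), verf(flavor,len)
    header = struct.pack(
        ">10I", xid, 0, 2, PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT, 0, 0, 0, 0
    )
    # GETPORT argument: the mapping being asked about (port field unused, 0).
    return header + struct.pack(">4I", prog, vers, prot, 0)


def parse_getport_reply(data: bytes, xid: int) -> int:
    """Return the port from an accepted GETPORT reply, or raise ValueError."""
    try:
        rxid, msg_type, reply_stat = struct.unpack_from(">3I", data, 0)
        if rxid != xid or msg_type != 1 or reply_stat != 0:
            raise ValueError("not an accepted reply to this call")
        _flavor, verf_len = struct.unpack_from(">2I", data, 12)
        offset = 20 + ((verf_len + 3) // 4) * 4  # skip padded verifier body
        (accept_stat,) = struct.unpack_from(">I", data, offset)
        if accept_stat != 0:
            raise ValueError(f"call not successful, accept_stat={accept_stat}")
        (port,) = struct.unpack_from(">I", data, offset + 4)
    except struct.error as exc:
        raise ValueError("reply too short") from exc
    return port


def query_port_udp(host: str, prog: int, vers: int, prot: int,
                   timeout: float = 3.0) -> int:
    """Ask rpcbind on host (UDP 111) where prog/vers/prot is registered."""
    xid = 0x12345678  # fixed id is enough for a one-shot query
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_getport_call(xid, prog, vers, prot), (host, 111))
        data, _addr = sock.recvfrom(1024)
    return parse_getport_reply(data, xid)
```

For example, `query_port_udp("172.22.222.115", MOUNT_PROG, MOUNT_V3, IPPROTO_TCP)` would return the TCP port mountd v3 is registered on; a return value of 0 means rpcbind has no registration for that program, version and protocol. Comparing the TCP and UDP answers may help narrow down where the ESXi 6.7 mount stalls, though this thread does not establish the cause.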

Any ideas?
