I am running ESX 4.1 with 3 hosts in a DRS cluster. I noticed that one of the hosts could no longer connect to my NFS shares. I removed the mounts on that host and tried to mount them again, but the mount fails. The other two hosts can still connect to the NFS shares. Here are my logs from vmkernel for the host that cannot connect:
Thanks.
Aug 20 01:26:38 BLADE vmkernel: 8:17:49:50.733 cpu7:4201)VMK_PCI: 746: device 000:009:00.0 capType 16 capIndex 76
Aug 20 01:26:38 BLADE vmkernel: 8:17:49:50.735 cpu5:4208)VMK_PCI: 746: device 000:009:00.1 capType 16 capIndex 76
Aug 20 01:26:38 BLADE vmkernel: 8:17:49:50.943 cpu6:4211)VMK_PCI: 746: device 000:009:00.0 capType 16 capIndex 76
Aug 20 01:26:38 BLADE vmkernel: 8:17:49:50.980 cpu4:4206)VMK_PCI: 746: device 000:009:00.1 capType 16 capIndex 76
Aug 20 04:26:26 BLADE vmkernel: 8:20:49:38.456 cpu2:4203)VMK_PCI: 746: device 000:009:00.0 capType 16 capIndex 76
Aug 20 04:26:26 BLADE vmkernel: 8:20:49:38.458 cpu6:4205)VMK_PCI: 746: device 000:009:00.1 capType 16 capIndex 76
Aug 20 04:26:26 BLADE vmkernel: 8:20:49:38.655 cpu4:4202)VMK_PCI: 746: device 000:009:00.0 capType 16 capIndex 76
Aug 20 04:26:26 BLADE vmkernel: 8:20:49:38.700 cpu7:4204)VMK_PCI: 746: device 000:009:00.1 capType 16 capIndex 76
Aug 20 06:32:15 BLADE vmkernel: 8:22:55:26.668 cpu4:4132)NetPort: 1157: disabled port 0x1000009
Aug 20 06:32:15 BLADE vmkernel: 8:22:55:26.668 cpu4:4132)Net: 1847: disconnected client from port 0x1000009
Aug 20 06:34:16 BLADE vmkernel: 8:22:57:28.134 cpu4:4112)Config: 297: "VMOverheadGrowthLimit" = 0, Old Value: -1, (Status: 0x0)
Aug 20 07:26:14 BLADE vmkernel: 8:23:49:26.053 cpu6:4201)VMK_PCI: 746: device 000:009:00.0 capType 16 capIndex 76
Aug 20 07:26:14 BLADE vmkernel: 8:23:49:26.055 cpu4:4205)VMK_PCI: 746: device 000:009:00.1 capType 16 capIndex 76
Aug 20 07:26:14 BLADE vmkernel: 8:23:49:26.263 cpu7:4206)VMK_PCI: 746: device 000:009:00.0 capType 16 capIndex 76
Aug 20 07:26:14 BLADE vmkernel: 8:23:49:26.326 cpu6:4201)VMK_PCI: 746: device 000:009:00.1 capType 16 capIndex 76
Aug 20 08:33:47 BLADE vmkernel: 9:00:56:58.588 cpu3:4111)WARNING: NFS: 841: Failed to get port, RPC error 13 (RPC was aborted due to timeout) for program (100005) version (3) protocol (tcp) on Server ()
Aug 20 08:33:47 BLADE vmkernel: 9:00:56:58.588 cpu3:4111)WARNING: NFS: 849: Unable to get port for program 100005 version 3 protocol tcp on
Aug 20 08:33:47 BLADE vmkernel: 9:00:56:58.588 cpu3:4111)WARNING: NFS: 1097: RPC unable to create socket to send umount for /VMWareStoreE/Blade on host ()
Aug 20 08:36:07 BLADE vmkernel: 9:00:59:18.605 cpu6:4110)WARNING: NFS: 841: Failed to get port, RPC error 13 (RPC was aborted due to timeout) for program (100005) version (3) protocol (tcp) on Server ()
Aug 20 08:36:07 BLADE vmkernel: 9:00:59:18.605 cpu6:4110)WARNING: NFS: 849: Unable to get port for program 100005 version 3 protocol tcp on
Aug 20 08:36:07 BLADE vmkernel: 9:00:59:18.605 cpu6:4110)WARNING: NFS: 1097: RPC unable to create socket to send umount for /VMWareStoreF/Blade/ on host ()
Aug 20 08:39:05 BLADE vmkernel: 9:01:02:16.905 cpu4:4111)NFS: 149: Command: (mount) Server: () IP: () Path: (/VMWareStoreF/Blade/) Label: () Options: (None)
Aug 20 08:39:37 BLADE vmkernel: 9:01:02:48.625 cpu4:4111)WARNING: NFS: 913: RPC error 13 (RPC was aborted due to timeout) trying to get port for Mount Program (100005) Version (3) Protocol (TCP) on Server ()
Aug 20 08:39:37 BLADE vmkernel: 9:01:02:48.625 cpu4:4111)NFS: 160: NFS mount (machine name):/VMWareStoreF/Blade/ failed: Unable to connect to NFS server
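The NFS warnings are easy to lose among the VMK_PCI lines, so here is a rough sketch of how they could be filtered out and grouped by RPC program. Program 100005 is the standard ONC RPC number for the mount daemon (mountd), so every timeout above is the host failing to reach the mount service at all. The function name and the "unspecified" bucket are my own choices for illustration, and the regular expressions are only fitted to the message shapes seen in this log.

```python
import re
from collections import Counter

# Well-known ONC RPC program numbers.
RPC_PROGRAMS = {100000: "portmapper", 100003: "nfs", 100005: "mountd"}

# Patterns fitted to the vmkernel lines in this post (an assumption, not a spec).
WARNING_RE = re.compile(r"WARNING: NFS: (\d+): (.*)")
PROGRAM_RE = re.compile(r"program \(?(\d+)\)?", re.IGNORECASE)


def summarize_nfs_warnings(lines):
    """Count NFS warnings per RPC program; warnings naming no program go under 'unspecified'."""
    counts = Counter()
    for line in lines:
        warning = WARNING_RE.search(line)
        if not warning:
            continue  # not an NFS warning (e.g. VMK_PCI or Config lines)
        program = PROGRAM_RE.search(warning.group(2))
        if program:
            number = int(program.group(1))
            counts[RPC_PROGRAMS.get(number, str(number))] += 1
        else:
            counts["unspecified"] += 1
    return counts


# Three lines copied from the log above.
sample = [
    'Aug 20 06:34:16 BLADE vmkernel: 8:22:57:28.134 cpu4:4112)Config: 297: "VMOverheadGrowthLimit" = 0, Old Value: -1, (Status: 0x0)',
    "Aug 20 08:33:47 BLADE vmkernel: 9:00:56:58.588 cpu3:4111)WARNING: NFS: 841: Failed to get port, RPC error 13 (RPC was aborted due to timeout) for program (100005) version (3) protocol (tcp) on Server ()",
    "Aug 20 08:33:47 BLADE vmkernel: 9:00:56:58.588 cpu3:4111)WARNING: NFS: 1097: RPC unable to create socket to send umount for /VMWareStoreE/Blade on host ()",
]
print(summarize_nfs_warnings(sample))
```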
This ended up being a problem with the VMkernel configuration on one of the hosts: it did not match the others. Once I modified the VMkernel port group on that host to match my other hosts, the problem was fixed.
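For anyone hitting the same thing, a quick way to spot this kind of drift is to write down the VMkernel port settings of a working host and the failing host and diff them. The sketch below is only an illustration: the field names and every value in it are hypothetical placeholders (not my real settings), and you would fill them in by hand from whatever your hosts report, for example from the output of `esxcfg-vmknic -l` and `esxcfg-vswitch -l`.

```python
def diff_vmk_config(reference, candidate):
    """Return {field: (reference_value, candidate_value)} for every field that differs."""
    fields = sorted(set(reference) | set(candidate))
    return {
        field: (reference.get(field), candidate.get(field))
        for field in fields
        if reference.get(field) != candidate.get(field)
    }


# Hypothetical example values, entered by hand for a working and a failing host.
working_host = {
    "portgroup": "VMkernel-NFS",
    "ip": "192.0.2.11",
    "netmask": "255.255.255.0",
    "vlan": 20,
    "mtu": 1500,
}
failing_host = {
    "portgroup": "VMkernel-NFS",
    "ip": "192.0.2.13",
    "netmask": "255.255.255.0",
    "vlan": 0,
    "mtu": 1500,
}

for field, (expected, actual) in diff_vmk_config(working_host, failing_host).items():
    print(f"{field}: working host has {expected!r}, failing host has {actual!r}")
```

The IP address will always differ between hosts, so ignore that line; anything else that shows up (VLAN ID, netmask, MTU, port group name) is worth checking against the working hosts.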