We have a really strange problem with ESXi 3.5 (latest patch, but the problem also occurs with older 3.5 versions).
All our stores are connected over NFS to an Open-E DSS storage.
Everything worked, but if I shut down the ESX host and the storage together for maintenance and then power both on again, most or all stores are inactive, even though the storage is up. The ESX host boots faster than the storage, so the storage is not yet up when the ESX host is ready.
Rebooting the ESX host and/or the storage afterwards does not help.
So there is no way to start my VMs. Pinging the storage from the ESX host works, and vmkping works as well.
I can also mount the NFS store from a separate Linux machine, so the configuration must be OK. I changed nothing; the only thing I did was the shutdown.
The only way I could get the stores running again is:
1. Delete all stores in ESXi
2. Reboot the storage
3. Reboot the ESXi host (but only after waiting until the storage is up)
4. Configure the stores again in ESXi
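The "wait until the storage is up" part of this workaround could be checked from a separate Linux machine before the ESXi host is rebooted. This is only a minimal sketch, not something from the setup above: it assumes the storage exposes the portmapper on TCP port 111 and NFS on TCP port 2049 (the standard ports, not confirmed for Open-E DSS), and the function names are made up for illustration. An open port only shows that the service accepts connections, not that the export can actually be mounted.

```python
import socket
import time


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_storage(host, ports=(111, 2049), timeout=600.0, interval=5.0):
    """Poll until every port in `ports` accepts TCP connections.

    Assumption: 111 (portmapper) and 2049 (NFS) are the ports that matter.
    Returns True once all ports are reachable, False if `timeout` seconds
    pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if all(port_open(host, p) for p in ports):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example (hypothetical address): wait_for_storage("192.0.2.10")
```

If `wait_for_storage` returns True, the ESXi host can be rebooted; if it returns False, the storage services did not come up within the timeout.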
In the ESX log I could see these error messages:
vmkernel: 0:00:00:17.697 cpu0:1259)NFS: 107: Command: (remount) Server: (220.127.116.11) IP: (18.104.22.168) Path: (/share/backupdsa1) Label: (iscsi1backupdsa1) Options: (None)
vmkernel: 0:00:00:48.447 cpu0:1259)WARNING: NFS: 898: RPC error 13 (RPC was aborted due to timeout) trying to get port for Mount Program (100005) Version (3) Protocol (TCP) on Server (22.214.171.124)
vmkernel: 0:00:00:48.447 cpu3:1184)WARNING: NFS: 960: Connect failed for client 0x9213a08 sock 134351240: I/O error
vmkernel: 0:00:00:48.447 cpu3:1184)WARNING: NFS: 898: RPC error 12 (RPC failed) trying to get port for Mount Program (100005) Version (3) Protocol (TCP) on Server (126.96.36.199)
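The warnings say that ESXi timed out while asking the storage's portmapper for the port of the Mount Program (100005), version 3, over TCP. The same question can be asked from a Linux machine, which may help tell whether the storage answers it at all. The sketch below is an illustration under assumptions, not the code ESXi runs: it sends a portmapper version 2 GETPORT call to TCP port 111 and assumes the reply arrives as a single RPC record fragment. The function names and the xid value are arbitrary. On most Linux systems `rpcinfo -p <server>` gives similar information.

```python
import socket
import struct

PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT = 100000, 2, 3
MOUNT_PROG, MOUNT_VERS = 100005, 3   # values taken from the log lines
IPPROTO_TCP = 6


def build_getport_call(xid, prog=MOUNT_PROG, vers=MOUNT_VERS, proto=IPPROTO_TCP):
    """Build a record-marked ONC RPC call: portmapper GETPORT(prog, vers, proto)."""
    body = struct.pack(
        ">14I",
        xid, 0, 2,                                # xid, CALL, RPC version 2
        PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT,   # target program and procedure
        0, 0, 0, 0,                               # AUTH_NONE credential and verifier
        prog, vers, proto, 0,                     # GETPORT arguments (port unused)
    )
    # Record marker: high bit set means "last fragment", low 31 bits are the length.
    return struct.pack(">I", 0x80000000 | len(body)) + body


def parse_getport_reply(body, xid):
    """Return the port from a GETPORT reply body (record marker already removed)."""
    rxid, mtype, reply_stat, _flavor, verf_len = struct.unpack_from(">5I", body, 0)
    if rxid != xid or mtype != 1 or reply_stat != 0:
        raise ValueError("unexpected or rejected RPC reply")
    offset = 20 + ((verf_len + 3) // 4) * 4       # skip the padded verifier body
    accept_stat, port = struct.unpack_from(">2I", body, offset)
    if accept_stat != 0:
        raise ValueError("RPC call not successful, accept_stat=%d" % accept_stat)
    return port                                   # 0 means "program not registered"


def recv_exact(sock, n):
    """Read exactly n bytes from the socket or raise ConnectionError."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf


def query_mountd_port(host, timeout=5.0):
    """Ask the portmapper on host:111 (TCP) where mountd v3/TCP is listening."""
    xid = 0x4E465331                              # arbitrary transaction id
    with socket.create_connection((host, 111), timeout=timeout) as s:
        s.sendall(build_getport_call(xid))
        (marker,) = struct.unpack(">I", recv_exact(s, 4))
        body = recv_exact(s, marker & 0x7FFFFFFF)  # assumes a single fragment
    return parse_getport_reply(body, xid)
```

Calling `query_mountd_port` with the storage's address returns the mountd port if the portmapper answers. A timeout or connection error would point at the same stage that fails in the log, while a returned port of 0 would mean the portmapper is up but mountd has not registered yet.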
I found a lot of reports about RPC problems with ESX Server, but none for ESXi.
To me it looks as if ESX keeps some kind of footprint of the store, and if mounting the store fails while the ESX host is booting (because the storage is not up yet), it never mounts this store again.
Can someone confirm this behaviour with NFS?
If I have a working store and reboot only the storage, the store comes back. Likewise, if I boot only the ESXi host while the storage is up, it works.
I tried a lot of things to fix it:
1. Added an entry for the storage in /etc/hosts
2. Removed all bonds on the storage and tried with only one NIC
3. Removed bonding on the ESXi host
4. Tried it with and without a VMkernel gateway
5. Removed the DNS entries on the storage
I have to fix this problem as soon as possible. I hope someone can help.