I have a fairly simple setup: 4 ESX servers connected to the same NFS datastore on a NetApp filer. I have a Debian template that I deploy virtual machines from across all hosts. All hosts get VMs deployed normally without any problems except for one... let's call it host A. On host A, when I try to deploy a VM from the template, it takes a long time and finally errors out with the following message
Note: these hosts are all 4.0 and are being managed from a vSphere server
Couple of observations:
The file in question, Linux90.vmdk, actually belongs to the template
When I monitor the progress of the VM deployment by watching my /vmfs/volumes/.../ directory, I can see that files are being created for the new VM; however, something goes wrong towards the end of the procedure that causes the deployment to fail
What files can I look at to find the possible cause of this failure? Has anyone seen this before? Please note that the template deploys without any problems on the other hosts, so I suspect the problem is local to host A
Thanks for the reply
but I am beginning to think that this is purely an ESX issue with nothing to do with template deployments, since I am also having problems creating VMs from scratch on this particular host
I have the same error while trying to create a new VM on an NFS datastore on NetApp.
The NFS datastore is created OK and you can browse it.
I tried but failed to create a new VM on this NFS datastore. The error is
"Error caused by file /vmfs/volumes/xxxxxxxx/New Virtual Machine"
How is your ESX performance in general? Have you tried deploying a VM to local storage (assuming one is available)? I think my issue is closer to yours, because I too tried to create a VM directly on my NFS datastore without using a template and got the same error. What's more, the performance on this ESX box is horrible even though the resources (4GB RAM, 2G CPU) aren't tied up at all. I suspect a bad NIC and am trying to troubleshoot the problem now...
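For what it's worth, here's the sort of sanity check I'd run from the ESX service console for a suspected bad NIC (a rough sketch; the IP is the filer address from the logs later in this thread, and eth0 is a placeholder for whichever interface you're suspicious of):

```shell
# Basic reachability to the NetApp filer from the service console.
ping -c 4 10.80.89.4

# Same check, but through the VMkernel interface that NFS traffic uses.
vmkping 10.80.89.4

# Look for a link, speed, or duplex mismatch on the suspect NIC
# (eth0 is a placeholder; list interfaces with 'ifconfig -a' first).
ethtool eth0
```

A NIC negotiated to half duplex against a full-duplex switch port would explain both the terrible performance and the RPC timeouts.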
I'm not familiar with reading ESX logs. The two messages below are the only things that seem relevant to what I was doing.
Jul 8 09:03:53 fox vmkernel: 10:23:19:51.718 cpu1:4105)NFS: 107: Command: (mount) Server: (fox) IP: (10.80.89.4) Path: (/vol/datavol) Label: (nas01nfs) Options: (None)
Jul 8 09:04:23 fox vmkernel: 10:23:20:21.966 cpu0:4105)WARNING: NFS: 898: RPC error 13 (RPC was aborted due to timeout) trying to get port for Mount Program (100005) Version (3) Protocol (TCP) on Server (10.80.89.4)
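For anyone else digging through this: on ESX 4.0 classic, vmkernel messages like the two above land in /var/log/vmkernel on the service console, so you can pull out the NFS-related lines directly:

```shell
# Show the most recent NFS-related vmkernel messages
# (mount attempts, RPC timeouts, etc.).
grep -i nfs /var/log/vmkernel | tail -20

# Or watch the log live while retrying the operation.
tail -f /var/log/vmkernel
```

That RPC error 13 (timeout getting a port for the Mount program) usually points at connectivity or the filer refusing the mount, rather than anything wrong with the datastore contents.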
It turned out to be an NFS volume permission issue.
Even though the NFS permissions looked right from the NetApp GUI (i.e., R/W access for all hosts, root access for the ESX 4.0 host and for a Solaris 10 (SPARC) box I use for quick NFS admin tasks), the NFS volume had somehow been created with 555 permissions. After I did a 'chmod 777' on it (from my Solaris box), new VMs could be created on it from the vSphere 4.0 client. I don't think I should have to do this, but...
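In case anyone hits the same thing, this is roughly what the check and fix looked like (the mount point is a placeholder; substitute wherever the volume is mounted on a box that has root access to the export):

```shell
# Placeholder mount point for the NFS volume on the admin box.
VOL=/mnt/nas01nfs

# 555 (dr-xr-xr-x) means no one can create files in the directory,
# no matter what the export rules on the filer say.
ls -ld "$VOL"

# Open it up so the ESX hosts can create VM directories.
chmod 777 "$VOL"
ls -ld "$VOL"
```

The key point is that the filesystem permissions on the volume root are checked in addition to the NFS export permissions, and the NetApp GUI was only showing the latter.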
In my case, I got this error message when I tried to copy a VM from an iSCSI target on a Sun Storage 7110 down to the local disk on my ESX4 server. I looked at a lot of complex factors, but it turned out I didn't have enough local disk space. Duh. Posting this in the hope it saves someone the time wasted looking for a more complicated explanation... simplest is best.
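A quick pre-flight check would have caught it. Something like this from the service console (both paths are placeholders for your source VM directory and destination datastore):

```shell
SRC=/vmfs/volumes/iscsi_lun/myvm   # placeholder: VM directory being copied
DST=/vmfs/volumes/datastore1       # placeholder: local destination datastore

# Size of the VM versus free space at the destination, both in KB.
need_kb=$(du -sk "$SRC" | awk '{print $1}')
free_kb=$(df -Pk "$DST" | awk 'NR==2 {print $4}')

if [ "$need_kb" -gt "$free_kb" ]; then
    echo "Not enough space: need ${need_kb} KB, only ${free_kb} KB free"
fi
```

Note that plain df can misreport VMFS volumes; on the ESX service console, vdf is the VMFS-aware equivalent, but the sketch above is fine for an NFS destination.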