Yes, I know there are numerous threads on the topic of moving VMs to NFS, but none of them have helped resolve this issue. So here is our setup and what we are trying to do. We have two identical ESX hosts in a cluster for VMotion/DRS, plus a brand new PowerEdge 2950 III, beefed up rather nicely, running Linux with NFS shares exported. All three servers are connected over gigabit Ethernet, and the hosts are set up to see the datastores on the NFS shares. Now we are trying to get the existing VMs from local storage over to the NFS storage, but no matter what we do, the moves always fail. I can create a new VM directly on the NFS storage just fine. I was even able to create a new VM on local storage with an 8 GB drive and move it over to NFS through sVmotion.
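For what it's worth, this is roughly how we verify the VMkernel networking and NFS mounts from the service console (the NFS server address below is just a placeholder, not our real one):

```shell
# Show the VMkernel NICs used for NFS traffic
esxcfg-vmknic -l

# List configured NFS datastores (name, server, share, mount state)
esxcfg-nas -l

# Check VMkernel-level reachability of the NFS server
# (10.2.10.5 is a placeholder for the real server address)
vmkping 10.2.10.5
```

Both hosts come back clean on all three checks.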
We have tried cold migration, both hot and cold cloning of the VM to NFS, sVmotion, and vmkfstools -i source -d thin dest (as well as -d 2gbsparse), and no matter what, they all fail at seemingly random percentages. I have had errors including "the virtual disk is corrupt or incompatible format" and "failed due to an I/O error". Does anyone have any other ideas on how to get this done, or why it keeps failing?
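In case the exact syntax matters, the vmkfstools attempts were along these lines (the datastore and VM names here are placeholders, not our real paths):

```shell
# Clone a local VMDK to the NFS datastore as a thin-provisioned disk
vmkfstools -i /vmfs/volumes/local-storage/myvm/myvm.vmdk \
    -d thin /vmfs/volumes/nfs-datastore/myvm/myvm.vmdk

# Same clone, but in 2gbsparse format
vmkfstools -i /vmfs/volumes/local-storage/myvm/myvm.vmdk \
    -d 2gbsparse /vmfs/volumes/nfs-datastore/myvm/myvm.vmdk
```

Both variants start copying and then die partway through, at a different percentage each run.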
Can you post your ESX networking configuration for the NFS VMkernel ports, and your /etc/exports (or equivalent) on the Linux box?
Thanks.
A screenshot of the NFS properties is attached. The NFS adapters on both hosts are configured identically (with different IPs, though). The /etc/exports contains:
/share/ESX 10.2.0.0/255.255.255.0(rw,sync,no_root_squash,no_subtree_check) 10.2.10.0/255.255.255.0(rw,sync,no_root_squash,no_subtree_check)
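After any change to /etc/exports we re-export and sanity-check from both ends, roughly like this (the server address is an example, not our actual IP):

```shell
# On the Linux NFS server: re-read /etc/exports and show what is exported
exportfs -ra
exportfs -v

# From an ESX service console: confirm the share is visible over the wire
# (10.2.10.5 stands in for the NFS server's VMkernel-subnet address)
showmount -e 10.2.10.5
```

The share shows up as expected from both hosts.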
I left this method out of my original post, but if we export an existing VM to OVF and then try to import it as a new machine directly into the NFS storage, it also fails, with "Unexpected Response: SESSION_COMPLETE".
No ideas on this one?
Can you try changing the export to serve only the VMkernel (NFS) subnet (10.2.10.0/255.255.255.0), taking out the 10.2.0.0/255.255.255.0 entry, and see if that works? Your file would look like this:
/share/ESX 10.2.10.0/255.255.255.0(rw,sync,no_root_squash,no_subtree_check)
I am also not familiar with the no_subtree_check parameter and would remove it temporarily for troubleshooting purposes.
Did that and tried sVmotion again. It fails with "A general system error occurred: failed to copy disk(s) (vim.fault.InvalidDiskFormat)".
Any other thoughts or suggestions on this? Thanks.
No other ideas, although sVmotion between FC and NFS is currently not supported. Can you try a cold migration with the VM powered down?
From Fibre Channel to NFS? We don't have any FC at all; it's all gigabit Ethernet. And cold migrations fail with the same unsupported-or-incorrect-disk-format errors. I'm not sure whether it matters, or whether I mentioned it in the OP, but the NFS shares are on an ext3 partition within Linux. I thought ESX was supposed to treat it as VMFS anyway, though.
You're right. Moving from local disk (SCSI) to NFS using sVmotion is what I was referring to. The ext3 piece should not matter as long as root has full access to the filesystem.
When creating the NFS storage on the ESX servers, did you do it exactly the same way each time? Something as simple as using a short host name vs. a FQDN can cause the storage to be perceived differently.
Check the actual disk string on each server - in /vmfs/volumes - the long hex numbers that are created for each volume. If the numbers don't match, you could have an issue.
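A quick way to compare is to run this on each host (your output will obviously differ, but the UUID for the same NFS datastore should match across hosts):

```shell
# List datastore symlinks; each friendly name points at a long hex UUID,
# and that UUID should be identical on both hosts for the same datastore
ls -l /vmfs/volumes
```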
I had a similar problem early on, when I first started using NFS data stores.
-Andrew Stueve
Yes, they were set up exactly the same on both hosts, and I verified that the string of numbers is the same. I had to SSH in to do that; should the hex string show up in VCS for NFS shares? It only shows the one for the local storage.