VMware Cloud Community
santosh42
Enthusiast

Storage vMotion not working with NAS datastore

Hi,

I have a powered on virtual machine which is deployed on NFS datastore.

Now I want to move that VM to the local disk of the ESX server.

I tried to migrate the VM from the NFS datastore to the local disk.

When I trigger the migration from the NAS datastore to the local disk, it starts, but when it reaches 18% it hangs there, and after a long time I get a generic error message: "General error occurred: Source detected that destination failed to resume."

I tried several times, but with no luck.

Also, during the Storage vMotion the NFS datastore shows as disabled and grayed out. Please see the attachment.

15 Replies
DSTAVERT
Immortal

You need to go through the logs on the ESXi host and on the NFS storage device to see what is causing the outage. What is the NFS device?

-- David -- VMware Communities Moderator
vernon_wells
Contributor

Have you checked your exports for the storage? If the destination host can't write to the NFS share, it can cause a hiccup like that.

dreamworkit
Contributor

It's entirely possible that you are flooding your pipeline with too much data for a move like that. If the NAS isn't fast enough, you could be looking at the host basically saying, "I can't read because the disk is too busy," hence the offline look. Does it look like that after it fails, or does it look like the embedded Dell storage located above it?

santosh42
Enthusiast

The NFS device is another virtual machine.

Since it is a test environment, I created a virtual machine, installed a FreeNAS image (FreeBSD-based), and am using that VM as the NAS device.

santosh42
Enthusiast

I agree that it might happen if the pipeline is flooded with too much data.

But the VM being moved is not more than 5 GB in size.

krishnaprasad
Hot Shot

Refer to VMware KB http://kb.vmware.com/kb/1010045

santosh42
Enthusiast
Enthusiast

Thanks, Krishnaprasad.

I referred to that KB before posting here.

I also increased the fsr.maxSwitchoverSeconds timeout from 100 to 200, but that didn't help.
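For reference, KB 1010045 describes setting that timeout as a line in the VM's .vmx file (while the VM is unregistered or powered off); the value 200 below simply mirrors what was tried here:

```
fsr.maxSwitchoverSeconds = "200"
```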

krishnaprasad
Hot Shot

Could you check whether the NFS share is mounted on both ESX servers with the same generated UUID? i.e. check /vmfs/volumes/<generated ID for the NFS share>

Is this the same across both ESX hosts where you perform Storage vMotion?

If it is not the same, remove the storage and add it again on both hosts.

If it is the same, can you check the vmkernel logs to see whether any specific error message is logged during the failure?
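A quick way to do that check from the service console; the datastore name "NAS" is the one from this thread, and the UUID shown in the comment is just an example:

```shell
# List NFS mounts with their server and export paths
esxcfg-nas -l
# The datastore name is a symlink to the generated UUID directory,
# e.g. /vmfs/volumes/NAS -> /vmfs/volumes/ee6e510d-4527391d-...
readlink /vmfs/volumes/NAS
# Watch for NFS errors while the migration runs (ESX classic log path)
tail -f /var/log/vmkernel
```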

santosh42
Enthusiast

krishnaprasad wrote:

Could you check whether the NFS share is mounted on both ESX servers with the same generated UUID? i.e. check /vmfs/volumes/<generated ID for the NFS share>

Is this the same across both ESX hosts where you perform Storage vMotion?

If it is not the same, remove the storage and add it again on both hosts.

If it is the same, can you check the vmkernel logs to see whether any specific error message is logged during the failure?

I am trying to migrate the VM between datastores within the same ESX host (not between hosts).

On this ESX host, one datastore is the local Dell disk (VMFS type) and the other is the NAS datastore (NFS type).

Currently the VM is on the NAS device, and I am trying to move it to the host's local disk.

I captured these entries from the vmkernel log:

Mar  8 12:53:07 blr-nlayers-138 vmkernel: 0:20:00:23.781 cpu7:4867)WARNING: NFS: 1830: Failed to get attributes (No connection)
Mar  8 12:53:07 blr-nlayers-138 vmkernel: 0:20:00:23.781 cpu7:4867)FSS: 735: Failed to get object b00f 36 ee6e510d 4527391d 4d74ef41 685658ef c 2 2caa37d 0 0 0 0 0 :No connection
Mar  8 12:53:07 blr-nlayers-138 vmkernel: 0:20:00:23.781 cpu7:4867)WARNING: NFS: 1830: Failed to get attributes (No connection)
Mar  8 12:53:07 blr-nlayers-138 vmkernel: 0:20:00:23.781 cpu7:4867)FSS: 735: Failed to get object b00f 36 ee6e510d 4527391d 4d74ef41 685658ef c 2 2caa37d 0 0 0 0 0 :No connection
Mar  8 12:53:07 blr-nlayers-138 vmkernel: 0:20:00:23.793 cpu7:4867)WARNING: Migrate: 4249: 1299568065122625 S: Migration considered a failure by the VMX.  It is most likely a timeout, but check the VMX log for the true error.
Mar  8 12:53:07 blr-nlayers-138 vmkernel: 0:20:00:23.798 cpu4:4873)Sched: vm 4874: 1246: name='vmm0:fly'
Mar  8 12:53:07 blr-nlayers-138 vmkernel: 0:20:00:23.798 cpu4:4873)CpuSched: vm 4874: 9952: zombified unscheduled world: runState=NEW
Mar  8 12:53:07 blr-nlayers-138 vmkernel: 0:20:00:23.798 cpu4:4873)World: vm 4874: 4018: deathPending set; world not running, scheduling reap
Mar  8 12:53:25 blr-nlayers-138 vmkernel: 0:20:00:41.491 cpu6:4381)NFSLock: 540: Start accessing fd 0x410002f5c9a8 again
Mar  8 12:53:25 blr-nlayers-138 vmkernel: 0:20:00:41.491 cpu6:4381)NFSLock: 540: Start accessing fd 0x410002f46088 again
Mar  8 12:53:26 blr-nlayers-138 vmkernel: 0:20:00:42.480 cpu6:4340)NFS: 287: Restored connection to the server 10.112.163.230 mount point /mnt/nfs, mounted as ee6e510d-4527391d-0000-000000000000 ("NAS")
Mar  8 12:53:58 blr-nlayers-138 vmkernel: 0:20:01:14.864 cpu5:4106)WARNING: NFS: 278: Lost connection to the server 10.112.163.230 mount point /mnt/nfs, mounted as ee6e510d-4527391d-0000-000000000000 ("NAS")
Mar  8 12:54:08 blr-nlayers-138 vmkernel: 0:20:01:24.869 cpu5:4106)NFSLock: 579: Stop accessing fd 0x410002f46088  4
Mar  8 12:54:08 blr-nlayers-138 vmkernel: 0:20:01:24.869 cpu5:4106)NFSLock: 579: Stop accessing fd 0x410002f5c9a8  4
Mar  8 12:54:43 blr-nlayers-138 vmkernel: 0:20:01:59.827 cpu7:4111)World: vm 4879: 1534: Starting world vmkping with flags 4
Mar  8 12:54:51 blr-nlayers-138 vmkernel: 0:20:02:07.613 cpu7:4110)World: vm 4880: 1534: Starting world vmkping with flags 4
Mar  8 12:55:26 blr-nlayers-138 vmkernel: 0:20:02:42.872 cpu5:4170)BC: 3837: Failed to flush 2 buffers of size 8192 each for object 'vmware.log' b00f 36 ee6e510d 4527391d 4d74ef41 685658ef c 56404 c626c0fe 0 100000000 8181f9d000000001 417f bde07a8000000000: No connection

Thanks.

krishnaprasad
Hot Shot

Are you able to browse the NAS datastore from the VI Client or from the CLI? Do you see any timeouts on the NFS share when you try to access it manually?
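From the service console, a manual access check along these lines would show whether the share stalls outside of Storage vMotion too (the datastore name and server IP are taken from earlier in the thread; the test file name is made up):

```shell
# Reachability of the NFS server over the VMkernel interface
vmkping 10.112.163.230
# Listing the datastore should return promptly, not hang
ls /vmfs/volumes/NAS
# Rough write check: 50 MB of zeros to a scratch file, then clean up
dd if=/dev/zero of=/vmfs/volumes/NAS/iotest bs=1M count=50
rm /vmfs/volumes/NAS/iotest
```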

bulletprooffool
Champion

If you are unable to vMotion it and you can afford the downtime, then follow this process.

  • Shut down your VM
  • Right-click the VM and select 'Remove from inventory'
  • Using SSH / WinSCP / FastSCP or similar, connect to your host and browse to /vmfs/volumes/<datastore>
  • Move (or copy, if you prefer) the full folder containing all the VM's files to the new location
  • Right-click the .vmx file in the new location and select 'Import'
  • Follow the wizard
  • If you are happy with the result, delete the old files from the old datastore (the datastore browser will do the job)

I know it is a workaround rather than a fix, but as the NFS server is a local VM, you may be having some sort of weird networking issue or a similar disk-load problem.
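The steps above can also be sketched entirely from the ESX service console; the VM and datastore names below are examples only, and on ESXi without a service console you would use the remote CLI equivalents instead:

```shell
# Example names -- substitute your own
VM=fly
SRC=/vmfs/volumes/NAS
DST=/vmfs/volumes/datastore1

# Same effect as 'Remove from inventory' in the VI Client
vmware-cmd -s unregister "$SRC/$VM/$VM.vmx"
# Move the whole VM folder to the destination datastore
mv "$SRC/$VM" "$DST/"
# Re-register the VM at its new path
vmware-cmd -s register "$DST/$VM/$VM.vmx"
```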

Good luck

One day I will virtualise myself . . .
krishnaprasad
Hot Shot

Also, looking at the logs you posted:

Mar  8 12:53:26 blr-nlayers-138 vmkernel: 0:20:00:42.480 cpu6:4340)NFS: 287: Restored connection to the server 10.112.163.230 mount point /mnt/nfs, mounted as ee6e510d-4527391d-0000-000000000000 ("NAS")

Mar  8 12:53:58 blr-nlayers-138 vmkernel: 0:20:01:14.864 cpu5:4106)WARNING: NFS: 278: Lost connection to the server 10.112.163.230 mount point /mnt/nfs, mounted as ee6e510d-4527391d-0000-000000000000 ("NAS")

it tried accessing the NFS share and reports 'lost connection'. Did you see any issues with your NFS share? It's worth trying to access this share from other Linux/Windows systems to see whether the issue is specific to that NAS share.
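For example, from any other Linux machine (the server IP and export path are the ones in the vmkernel log above; the mount point is arbitrary):

```shell
# Mount the share, list it, do a rough timed write, then unmount
mkdir -p /mnt/nastest
mount -t nfs 10.112.163.230:/mnt/nfs /mnt/nastest
ls /mnt/nastest
time dd if=/dev/zero of=/mnt/nastest/iotest bs=1M count=50
rm /mnt/nastest/iotest
umount /mnt/nastest
```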

santosh42
Enthusiast

Yes, you are right.

There seem to be networking issues while the ESX host tries to connect to the NAS datastore.

There is intermittent connectivity loss; I am still figuring out why.

However, the problem is getting stranger.

When I provisioned the same NFS datastore to another ESX host, I could migrate powered-on VMs between the NFS datastore and that host's local disk with no issues.

This is happening only on this one ESX server.

I suspect it is mainly because the NFS device is a virtual machine running on this same ESX host. There might be some kind of network loop when moving powered-on VMs between datastores on this particular host, since the NFS storage is served by a VM running on that same host's local storage. It looks like a paradox.

krishnaprasad
Hot Shot

Hmm. That could be a possible cause of the failure, since you might be competing for the disk of the VM where the NFS server itself is running.

You can verify that by creating another NFS share on a VM on another ESX host (or on any Linux host) and seeing whether Storage vMotion works between that NFS share and the local disk.

Alternatively, can you create another disk for the same VM and create the NFS share on that disk? During Storage vMotion of the VM, make sure you select only the first disk (the OS disk), where the NFS share is NOT configured. That operation might succeed.

santosh42
Enthusiast

That seems like a good test case.

I will try it and post an update here soon.
