VMware Cloud Community
santosh42
Enthusiast

Storage vMotion not working with NAS datastore

Hi,

I have a powered on virtual machine which is deployed on NFS datastore.

Now I want to move that VM to the local disk of the ESX server.

I tried to migrate the VM from the NFS datastore to the local disk.

When I trigger the migration from the NAS datastore to the local disk, it starts, but when it reaches 18% it hangs there, and after a long time I get a generic error message: "General error occurred: Source detected that destination failed to resume."

I tried several times, but with no luck.

Also, during the Storage vMotion the NFS datastore shows as disabled and grayed out. Please see the attachment.

15 Replies
DSTAVERT
Immortal

You need to go through the logs on the ESXi host and on the NFS storage device to see what is causing the outage. What is the NFS device?

-- David -- VMware Communities Moderator
vernon_wells
Contributor

Have you checked your exports for the storage? If the destination host can't write to the NFS share, it can cause a hiccup like that.

dreamworkit
Contributor

It's entirely possible that you are flooding your pipeline with too much data for a move like that. If the NAS isn't fast enough, you could be looking at the host basically saying, "I can't read because the disk is too busy," hence the offline look. Does it look like that after it fails, or does it look like the embedded Dell storage located above it?

santosh42
Enthusiast

The NFS device is another virtual machine.

Since it is a test environment, I created a virtual machine, installed a FreeNAS image (FreeBSD-based), and am using that VM as the NAS device.

santosh42
Enthusiast

I agree that it might happen if the pipeline is flooded with too much data.

But the VM being moved is not more than 5 GB in size.

krishnaprasad
Hot Shot

Refer to VMware KB http://kb.vmware.com/kb/1010045

santosh42
Enthusiast
Enthusiast

Thanks, Krishnaprasad.

I referred to that KB before posting here.

I also increased the fsr.maxSwitchoverSeconds timeout from 100 to 200, but that didn't help.
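For reference, KB 1010045 describes setting that timeout as a line in the VM's .vmx file (while the VM is unregistered or powered off); the value 200 below simply mirrors what was tried here:

```
fsr.maxSwitchoverSeconds = "200"
```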

krishnaprasad
Hot Shot

Could you check whether the NFS share is mounted on both ESX servers with the same generated UUID? i.e. check /vmfs/volumes/<generated ID for the NFS share>

Is this the same across both ESX hosts where you perform Storage vMotion?

If it is not the same, remove the storage and add it again on both hosts.

If it is the same, can you check the vmkernel logs to see whether any specific error message is logged during the failure?
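A quick way to do that check from the service console; the datastore name "NAS" is the one from this thread, and the UUID shown in the comment is just an example:

```shell
# List NFS mounts with their server and export paths
esxcfg-nas -l
# The datastore name is a symlink to the generated UUID directory,
# e.g. /vmfs/volumes/NAS -> /vmfs/volumes/ee6e510d-4527391d-...
readlink /vmfs/volumes/NAS
# Watch for NFS errors while the migration runs (ESX classic log path)
tail -f /var/log/vmkernel
```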

santosh42
Enthusiast

krishnaprasad wrote:

Could you check whether the NFS share is mounted on both ESX servers with the same generated UUID? i.e. check /vmfs/volumes/<generated ID for the NFS share>

Is this the same across both ESX hosts where you perform Storage vMotion?

If it is not the same, remove the storage and add it again on both hosts.

If it is the same, can you check the vmkernel logs to see whether any specific error message is logged during the failure?

I am trying to migrate the VM between datastores within the same ESX host (not between hosts).

On this ESX host, one datastore is the local Dell disk (VMFS type) and the other is the NAS datastore (NFS type).

Currently the VM is on the NAS device, and I am trying to move it to the host's local disk.

I captured these entries from the vmkernel log:

Mar  8 12:53:07 blr-nlayers-138 vmkernel: 0:20:00:23.781 cpu7:4867)WARNING: NFS: 1830: Failed to get attributes (No connection)
Mar  8 12:53:07 blr-nlayers-138 vmkernel: 0:20:00:23.781 cpu7:4867)FSS: 735: Failed to get object b00f 36 ee6e510d 4527391d 4d74ef41 685658ef c 2 2caa37d 0 0 0 0 0 :No connection
Mar  8 12:53:07 blr-nlayers-138 vmkernel: 0:20:00:23.781 cpu7:4867)WARNING: NFS: 1830: Failed to get attributes (No connection)
Mar  8 12:53:07 blr-nlayers-138 vmkernel: 0:20:00:23.781 cpu7:4867)FSS: 735: Failed to get object b00f 36 ee6e510d 4527391d 4d74ef41 685658ef c 2 2caa37d 0 0 0 0 0 :No connection
Mar  8 12:53:07 blr-nlayers-138 vmkernel: 0:20:00:23.793 cpu7:4867)WARNING: Migrate: 4249: 1299568065122625 S: Migration considered a failure by the VMX.  It is most likely a timeout, but check the VMX log for the true error.
Mar  8 12:53:07 blr-nlayers-138 vmkernel: 0:20:00:23.798 cpu4:4873)Sched: vm 4874: 1246: name='vmm0:fly'
Mar  8 12:53:07 blr-nlayers-138 vmkernel: 0:20:00:23.798 cpu4:4873)CpuSched: vm 4874: 9952: zombified unscheduled world: runState=NEW
Mar  8 12:53:07 blr-nlayers-138 vmkernel: 0:20:00:23.798 cpu4:4873)World: vm 4874: 4018: deathPending set; world not running, scheduling reap
Mar  8 12:53:25 blr-nlayers-138 vmkernel: 0:20:00:41.491 cpu6:4381)NFSLock: 540: Start accessing fd 0x410002f5c9a8 again
Mar  8 12:53:25 blr-nlayers-138 vmkernel: 0:20:00:41.491 cpu6:4381)NFSLock: 540: Start accessing fd 0x410002f46088 again
Mar  8 12:53:26 blr-nlayers-138 vmkernel: 0:20:00:42.480 cpu6:4340)NFS: 287: Restored connection to the server 10.112.163.230 mount point /mnt/nfs, mounted as ee6e510d-4527391d-0000-000000000000 ("NAS")
Mar  8 12:53:58 blr-nlayers-138 vmkernel: 0:20:01:14.864 cpu5:4106)WARNING: NFS: 278: Lost connection to the server 10.112.163.230 mount point /mnt/nfs, mounted as ee6e510d-4527391d-0000-000000000000 ("NAS")
Mar  8 12:54:08 blr-nlayers-138 vmkernel: 0:20:01:24.869 cpu5:4106)NFSLock: 579: Stop accessing fd 0x410002f46088  4
Mar  8 12:54:08 blr-nlayers-138 vmkernel: 0:20:01:24.869 cpu5:4106)NFSLock: 579: Stop accessing fd 0x410002f5c9a8  4
Mar  8 12:54:43 blr-nlayers-138 vmkernel: 0:20:01:59.827 cpu7:4111)World: vm 4879: 1534: Starting world vmkping with flags 4
Mar  8 12:54:51 blr-nlayers-138 vmkernel: 0:20:02:07.613 cpu7:4110)World: vm 4880: 1534: Starting world vmkping with flags 4
Mar  8 12:55:26 blr-nlayers-138 vmkernel: 0:20:02:42.872 cpu5:4170)BC: 3837: Failed to flush 2 buffers of size 8192 each for object 'vmware.log' b00f 36 ee6e510d 4527391d 4d74ef41 685658ef c 56404 c626c0fe 0 100000000 8181f9d000000001 417f bde07a8000000000: No connection

Thanks.

krishnaprasad
Hot Shot

Are you able to browse the NAS datastore from the VI Client or from the CLI? Do you see any timeouts on the NFS share when you try to access it manually?
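From the service console, a manual access check along these lines would show whether the share stalls outside of Storage vMotion too (the datastore name and server IP are taken from earlier in the thread; the test file name is made up):

```shell
# Reachability of the NFS server over the VMkernel interface
vmkping 10.112.163.230
# Listing the datastore should return promptly, not hang
ls /vmfs/volumes/NAS
# Rough write check: 50 MB of zeros to a scratch file, then clean up
dd if=/dev/zero of=/vmfs/volumes/NAS/iotest bs=1M count=50
rm /vmfs/volumes/NAS/iotest
```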

bulletprooffool
Champion

If you are unable to vMotion it and you can afford the downtime, then follow this process.

  • Shut down your VM
  • Right-click the VM and select 'Remove from inventory'
  • Using SSH / WinSCP / FastSCP or similar, connect to your host and browse to /vmfs/volumes/<datastore>
  • Move (or copy, if you prefer) the full folder containing all the VM's files to the new location
  • Right-click the .vmx file in the new location and select 'Import'
  • Follow the wizard
  • If you are happy with the result, delete the old files from the old datastore (the datastore browser will do the job)

I know it is a workaround rather than a fix, but as the NFS server is a local VM, you may be having some sort of weird networking issue or a similar disk-load problem.
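The steps above can also be sketched entirely from the ESX service console; the VM and datastore names below are examples only, and on ESXi without a service console you would use the remote CLI equivalents instead:

```shell
# Example names -- substitute your own
VM=fly
SRC=/vmfs/volumes/NAS
DST=/vmfs/volumes/datastore1

# Same effect as 'Remove from inventory' in the VI Client
vmware-cmd -s unregister "$SRC/$VM/$VM.vmx"
# Move the whole VM folder to the destination datastore
mv "$SRC/$VM" "$DST/"
# Re-register the VM at its new path
vmware-cmd -s register "$DST/$VM/$VM.vmx"
```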

Good luck

One day I will virtualise myself . . .
krishnaprasad
Hot Shot

Also, looking at the logs you posted:

Mar  8 12:53:26 blr-nlayers-138 vmkernel: 0:20:00:42.480 cpu6:4340)NFS: 287: Restored connection to the server 10.112.163.230 mount point /mnt/nfs, mounted as ee6e510d-4527391d-0000-000000000000 ("NAS")

Mar  8 12:53:58 blr-nlayers-138 vmkernel: 0:20:01:14.864 cpu5:4106)WARNING: NFS: 278: Lost connection to the server 10.112.163.230 mount point /mnt/nfs, mounted as ee6e510d-4527391d-0000-000000000000 ("NAS")

it tried accessing the NFS share and reports 'lost connection'. Did you see any issues with your NFS share? It's worth trying to access this share from other Linux/Windows systems to see whether the issue is specific to that NAS share.
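For example, from any other Linux machine (the server IP and export path are the ones in the vmkernel log above; the mount point is arbitrary):

```shell
# Mount the share, list it, do a rough timed write, then unmount
mkdir -p /mnt/nastest
mount -t nfs 10.112.163.230:/mnt/nfs /mnt/nastest
ls /mnt/nastest
time dd if=/dev/zero of=/mnt/nastest/iotest bs=1M count=50
rm /mnt/nastest/iotest
umount /mnt/nastest
```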

santosh42
Enthusiast

Yes, you are right.

There seem to be networking issues while the ESX host tries to connect to the NAS datastore.

There is intermittent connectivity loss; I am still figuring out why.

However, the problem is getting stranger.

When I provisioned the same NFS datastore to another ESX host, I could migrate powered-on VMs between the NFS datastore and that host's local disk with no issues.

This is happening only on this one ESX server.

I suspect it is mainly because the NFS device is a virtual machine running on this same ESX host. There might be some kind of network loop when moving powered-on VMs between datastores on this particular host, since the NFS storage is served by a VM running on that same host's local storage. It looks like a paradox.

krishnaprasad
Hot Shot

Hmm. That could be a possible cause of the failure, since you might be competing for the disk of the VM where the NFS server itself is running.

You can verify that by creating another NFS share on a VM on another ESX host (or on any Linux host) and seeing whether Storage vMotion works between that NFS share and the local disk.

Alternatively, can you create another disk for the same VM and create the NFS share on that disk? During Storage vMotion of the VM, make sure you select only the first disk (the OS disk), where the NFS share is NOT configured. That operation might succeed.

santosh42
Enthusiast

That seems like a good test case.

I will try it and post an update here soon.
