VMware Cloud Community
dontinou
Contributor

Storage vMotion Failed - Help!!!

As the title indicates, I've tried unsuccessfully to Storage VMotion a VM (Right click > Migrate Storage) from local SAS to a remote NFS share. I was able to do it fine for 12 other VMs, but the last one (wouldn't you know it) is 600GB and failed with "Operation timed out" in VC. I know VC normally says this if things take too long; however, all disk/copy activity has been stopped for hours, and when I look in both my datastores I have files in BOTH places. My VM is still up, and I am afraid to turn it off in case it doesn't come back. Here are the details:

In my old file store, local SAS (2 files):

vault.vmdk - 630GB

DMotion-scsi0:00_vault.vmdk - 70MB

In my new file store, NFS (14 files):

vault.vmx, vault.vmsd, vault.nvram, vmware.log, vmware10-15.log, vault.vmxf, vault-Snapshot1.vmsn, vault-a99f9a04.vswp (1GB), DMotion-scsi0:00_vault-000001.vmdk (60MB)

In VC, my VM is showing as running from my NFS file store (Hard Disk 1 points to vault/DMotion-scsi0:00_vault-000001.vmdk), even though it looks like the bulk of my data is still in the local SAS filestore.

So, how do I put humpty dumpty back together again, all on the new NFS filestore without losing my VM?

The reason I am trying to migrate everything off is that I need to rebuild the array to include more disks.

In case it helps, VC reported the following in the Tasks & Events pane:

7/9/2009 6:30:38 AM, Failed to migrate vault from 192.168.2.254 to 192.168.2.254 in Noob

7/8/2009 9:29:08 PM, Migrating vault off host 192.168.2.254 in Noob

7/8/2009 9:28:36 PM, Migrating vault from 192.168.2.254 to 192.168.2.254 in Noob

7/8/2009 9:28:35 PM, Task: Relocate Virtual Machine Storage


Accepted Solutions
ThompsG
Virtuoso

Evening,

We had exactly the same problem when the mgmt-vmware service was restarted on the ESX server hosting the VM being SVMotioned.

The VMware Knowledge Base has an article explaining some steps to recover from this situation (KB1009113); however, it falls a little short in that it requires you to power off the VM to complete the recovery.

The following are the steps I followed to recover without an outage:

We are making the assumption that you were SVMotioning a VM from LUNA to LUNB and now have the VM running from deltas on LUNA with the VM's configuration files on LUNB.

1. Log in to the Service Console of the ESX server that has the VM.

2. To confirm that the virtual machine does not have any snapshots, run the command:

vmware-cmd <vm-cfg-path> hassnapshot

The output hassnapshot()= confirms that there are no snapshots

3. Create a snapshot with:

vmware-cmd <vm-cfg-path> createsnapshot <name> <description> <quiesce> <memory>

By doing this you get files like this on LUNB:

DMotion-scsiX:XX_server-000001.vmdk

DMotion-scsiX:XX_server-000001-delta.vmdk

4. Remove the snapshots with:

vmware-cmd <vm-cfg-path> removesnapshots

This command commits the DMotion snapshot and the snapshot you created in step 3.

5. VirtualCenter will still think the VM is in the SVMotion state, so to fix this run:

vmware-cmd <vm-cfg-path> setconfig scsiX:X.DMotionParent ""

Do this for every VMDK attached to this VM.

Now perform a new SVMotion to move back the VM configuration files to LUNA.

Clean up LUNB and remove any files/folders created by the failed SVMotion. You can now do the SVMotion again.
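As a rough sketch, the command sequence in steps 2 to 5 could be collected into a small dry-run script. This is not from the KB article: the function name `recovery_plan`, the snapshot name, the example path and the disk IDs are all made up for illustration, and the `0 0` values for quiesce and memory are an assumption. It only prints the commands so they can be reviewed before anything is run on the Service Console.

```shell
#!/bin/sh
# Dry-run sketch: prints the vmware-cmd sequence from the steps above
# instead of executing it. Review the output, then run the lines by hand.
# recovery_plan, the snapshot name and the example arguments are hypothetical.

recovery_plan() {
    vmx="$1"    # path to the VM's .vmx file (<vm-cfg-path>)
    shift       # remaining arguments: disk IDs such as scsi0:0

    printf '%s\n' "vmware-cmd \"$vmx\" hassnapshot"
    # quiesce=0 and memory=0 are assumed values, not taken from the post
    printf '%s\n' "vmware-cmd \"$vmx\" createsnapshot svmotion-fix \"recover failed SVMotion\" 0 0"
    printf '%s\n' "vmware-cmd \"$vmx\" removesnapshots"
    # step 5 has to be repeated for every VMDK attached to the VM
    for disk in "$@"; do
        printf '%s\n' "vmware-cmd \"$vmx\" setconfig $disk.DMotionParent \"\""
    done
}

# Example with a placeholder datastore path and a single disk
recovery_plan /vmfs/volumes/datastore1/vault/vault.vmx scsi0:0
```

Passing more disk IDs (for example `scsi0:0 scsi0:1`) prints one setconfig line per disk.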

Of course while the above steps worked for me, please take this as all care but no responsibility. The above steps were borrowed from http://communities.vmware.com/message/999890

Kind regards,

Glen

Message was edited by: ThompsG

7 Replies
marcelo_soares
Champion

Open the vault.vmx file and see where your disk is pointing. This will give us the first clue as to where your VM is really running.

Marcelo

Marcelo Soares
dontinou
Contributor

Excerpt from vault.vmx on NAS:

scsi0:0.fileName = "DMotion-scsi0:00_vault-000001.vmdk"

sched.swap.derivedName = "/vmfs/volumes/8e243447-a4544cc6/vault/vault-a99f9a04.vswp"

Contents of DMotion-scsi0:00_vault-000001.vmdk file on NAS:

##Disk DescriptorFile

version=1

CID=744081a2

parentCID=0a2641b2

createType="vmfsSparse"

parentFileNameHint="/vmfs/volumes/49881cc4-0a4c675b-bf7e-0022197b9d1d/vault//DMotion-scsi0:00_vault.vmdk"

##Extent description

RW 1258291200 VMFSSPARSE "DMotion-scsi0:00_vault-000001-delta.vmdk"

##The Disk Data Base

#DDB

Looks like it references back to the VMFS SAS datastore (it has to anyway: I can still access my 600GB of data, and there's only 1GB used in the new filestore).

What to do, there are vmdk's all over the place! Also, no snapshots or anything on the VM before I started the migration. It WAS running live though.
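For anyone checking the same thing, the parent link can be pulled out of a descriptor without opening it in an editor. A minimal sketch, assuming the descriptor is a plain text file laid out like the one pasted above; `parent_hint` is just an illustrative helper name, not a VMware tool:

```shell
#!/bin/sh
# Print the parent disk that a snapshot/delta descriptor points at.
# parent_hint is a hypothetical helper; it only reads the descriptor text.
parent_hint() {
    # keep only the quoted value of the parentFileNameHint line
    sed -n 's/^parentFileNameHint="\(.*\)"$/\1/p' "$1"
}

# Usage, run from the VM's folder on the datastore:
# parent_hint "DMotion-scsi0:00_vault-000001.vmdk"
```

If the path printed lives on the old datastore, the VM is still reading its base data from there, which matches what the descriptor above shows.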

dontinou
Contributor

Thank you, a lifesaver most definitely.

In the end, I had to use the vmkfstools -i option to copy the vmdk over to the new location (as detailed in the KB). Even after creating and removing the snapshots, the files were still in both locations (no more DMotion stuff, though).

Whew!!
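For reference, the copy mentioned above has the general shape `vmkfstools -i <source> <destination>`. The sketch below only prints that command for review; `clone_cmd` is a made-up helper and both datastore paths are placeholders, not the ones from this thread. It assumes the VM is powered off for the copy, as the KB procedure requires, and that the disk entry in the .vmx is pointed at the new file afterwards.

```shell
#!/bin/sh
# Dry-run sketch: print the vmkfstools clone command for review.
# clone_cmd and both paths below are hypothetical placeholders.
clone_cmd() {
    src="$1"    # descriptor of the source disk on the old datastore
    dst="$2"    # descriptor to create on the new datastore
    printf 'vmkfstools -i "%s" "%s"\n' "$src" "$dst"
}

clone_cmd /vmfs/volumes/old-sas/vault/vault.vmdk /vmfs/volumes/new-nfs/vault/vault.vmdk
```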

ThompsG
Virtuoso

Excellent! Glad to hear it worked, or rather allowed you to recover from this failure.

Just a question, if you don't mind: were you unable to SVMotion the configuration files back after performing the vmware-cmd <vm-cfg-path> removesnapshots, leaving a VMDK on the LUN you were trying to migrate to? In my case, after using the removesnapshots command I was left with a VMDK on the new LUN, which I deleted, and then I just SVMotioned again.

Don't be afraid of SVMotion though. In the last week I have moved over 82 virtual machines (~8TB of VMDK's) from one storage array to another and I only had the one failure which was caused by a user error rather than failure of the process. Scary at the time, but over that now. We have another 250+ virtual machines to move over the next two weeks (or so) - about 20TB+

Have a great day,

Glen

dontinou
Contributor

It was weird, actually. After I did the removesnapshots I ended up with the vmx, vswp and other config files on the new LUN, but the 600GB flat vmdk file was still on the old LUN, so VC thought that the VM was already on the new LUN. I suppose I could have tried SVMotioning the VM back to the old LUN and then SVMotioning again to the new one, but I figured I could just copy the VMDK over and change the config files manually.

I was just happy as heck that the removesnapshots consolidated all the DMotion files.

Yes, I am much less afraid of SVMotion now. It was scary, but at least I have a better grasp of what is happening in the background.

Thanks again!

sasser1970
Contributor

Thanks! This worked great for me, but an SVMotion back to the old storage was not available. I had to delete the old guest and create a new guest from the VMX file. After deleting the guest, I had an orphaned entry for the old guest on the ESX host. Restarting the ESX host and the vCenter service did not fix it, but disconnecting the affected ESX host, removing it, and adding it back to the cluster did. The orphaned entry is gone, and all configuration such as LAN and storage is still available. Thanks to all VMware users!

The VMware KB entry for a solution with orphaned guests:

KB1003742
