Argyle
Enthusiast

Storage vmotion fails, stuck with delta files

I'm migrating VMs from an HP EVA 5000 SAN to an HP EVA 8000 SAN with Storage VMotion, using the Remote CLI tool. It works for the most part, but sometimes it fails midway and I'm stuck with:

  • vmdk files created on new LUNs (sometimes just one disk, sometimes all disks)

  • swap and .vmx files moved to new LUNs

  • delta files left on old LUNs

  • VM running but using delta files on old LUNs and swap on new LUNs

I could use some help completing or restarting the process. I was thinking about creating a new snapshot and committing it to get rid of the Storage VMotion delta files, but the option "Take snapshot" is grayed out in Virtual Center for the VMs that fail this way.

This time the error showed up after about 10 minutes. The system disk is 15 GB and the data disk is 20 GB, so no large files are involved. I had the same problem with a VM with a 100 GB data disk; it failed after 20-30 minutes. The server running the Remote CLI is on the service console (SC) network.

Writing down some system data and log info below:

Command and error message:

-


C:\Program Files\VMware\VMware VI Remote CLI\bin>svmotion.pl

--server=vc.mydomain.com --username=some_name --password=some_pass --datacenter="My Data Center"

--vm="[Sys-Disk-306] SERVER01/SERVER01.vmx: Sys-Disk-327"

--disks="[Sys-Disk-306] SERVER01/SERVER01.vmdk:Sys-Disk-327, SERVER01/SERVER01.vmdk:Data-Disk-328"

--verbose

Attempting to connect to service url.

Connected to server.

Resolving the input arguments.

Performing Storage VMotion.

Received an error from the server: An error occurred while communicating with the remote host.

C:\Program Files\VMware\VMware VI Remote CLI\bin>

-


Interesting info in the VM's vmware.log:

Jul 17 15:04:40.904: vcpu-0| HBACommon: First write on scsi0:0.fileName='/vmfs/volumes/47500341-572986f8-bf6e-001cc478a0d4/SERVER01//DMotion-scsi0:00_SERVER01.vmdk'

Jul 17 15:04:40.918: vcpu-0| DISKLIB-CHAIN : UpdateContentID: old = 0xc4692175, new = 0x3b52f9e2

Jul 17 15:12:40.901: vmx| vmdbPipe_Streams Couldn't read: OVL_STATUS_EOF

Jul 17 15:12:40.955: vmx| SOCKET 1 client closed connection

-


Currently the old LUNs look like this:

ll /vmfs/volumes/Sys-Disk-306/SERVER01/

-rw------- 1 root root 67141632 Jul 17 15:57 DMotion-scsi0:00_SERVER01-delta.vmdk

-rw------- 1 root root 326 Jul 17 15:04 DMotion-scsi0:00_SERVER01.vmdk

-rw------- 1 root root 16106127360 Jul 17 15:04 SERVER01-flat.vmdk

-rw------- 1 root root 342 Jun 14 13:15 SERVER01.vmdk

ll /vmfs/volumes/Data-Disk-308/SERVER01/

-rw------- 1 root root 16820224 Jul 17 15:28 DMotion-scsi0:01_SERVER01-delta.vmdk

-rw------- 1 root root 326 Jul 17 15:28 DMotion-scsi0:01_SERVER01.vmdk

-rw------- 1 root root 21474836480 Jul 17 14:28 SERVER01-flat.vmdk

-rw------- 1 root root 342 Jun 14 13:21 SERVER01.vmdk

Currently the new LUNs look like this:

ll /vmfs/volumes/Sys-Disk-327/SERVER01/

-rw------- 1 root root 805306368 Jul 17 15:04 SERVER01-a0151293.vswp

-rw------- 1 root root 16106127360 Jul 17 15:12 SERVER01-flat.vmdk

-rw------- 1 root root 8684 Jul 17 15:04 SERVER01.nvram

-rw------- 1 root root 403 Jul 17 15:08 SERVER01.vmdk

-rw------- 1 root root 0 Jul 17 15:03 SERVER01.vmsd

-rwxr-xr-x 1 root root 2789 Jul 17 15:17 SERVER01.vmx

-rw------- 1 root root 265 Jul 17 15:17 SERVER01.vmxf

-rw-r--r-- 1 root root 22802 Jul 17 15:03 vmware-31.log

-rw-r--r-- 1 root root 28840 Jul 17 15:03 vmware-32.log

-rw-r--r-- 1 root root 31444 Jul 17 15:03 vmware-33.log

-rw-r--r-- 1 root root 112306 Jul 17 15:03 vmware-34.log

-rw-r--r-- 1 root root 30702 Jul 17 15:03 vmware-35.log

-rw-r--r-- 1 root root 1479819 Jul 17 15:03 vmware-36.log

-rw-r--r-- 1 root root 48791 Jul 17 15:28 vmware.log

ll /vmfs/volumes/Data-Disk-328/SERVER01/

-rw------- 1 root root 0 Jul 17 15:08 SERVER01.vmdk
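A quick way to spot leftover Storage VMotion files in listings like the ones above is to filter on the `DMotion-` prefix. The following is only a minimal sketch: the helper name `find_dmotion_files` is hypothetical, and it assumes the plain `ls -l` column layout shown in this post. It only reads text and changes nothing on the datastore.

```python
def find_dmotion_files(listing):
    """Return (name, size_in_bytes) for every DMotion-* file in `ls -l` text.

    Assumes the column layout shown in this post:
    perms, links, owner, group, size, month, day, time, name.
    """
    found = []
    for line in listing.splitlines():
        fields = line.strip().split(None, 8)  # at most 9 columns, name last
        if len(fields) < 9:
            continue  # blank line or something that is not a file entry
        size, name = fields[4], fields[8]
        if name.startswith("DMotion-") and size.isdigit():
            found.append((name, int(size)))
    return found


# Sample input copied from the Sys-Disk-306 listing in this post.
sample = """\
-rw------- 1 root root 67141632 Jul 17 15:57 DMotion-scsi0:00_SERVER01-delta.vmdk
-rw------- 1 root root 326 Jul 17 15:04 DMotion-scsi0:00_SERVER01.vmdk
-rw------- 1 root root 16106127360 Jul 17 15:04 SERVER01-flat.vmdk
-rw------- 1 root root 342 Jun 14 13:15 SERVER01.vmdk
"""

for name, size in find_dmotion_files(sample):
    print("%s  %.1f MiB" % (name, size / (1024.0 * 1024.0)))
```

Running the same filter against each old and new LUN folder gives a short list of what the failed migration left behind, which is handy to keep next to the support case.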

-


The .vmx file shows that the VM is using the vmdk and delta files on the old LUNs and swap on the new LUN.

Has anyone experienced the same thing, and is there a safe way to complete or roll back the process?


Accepted Solutions
BigHug
Enthusiast

I suggest you call Support. It happened to me once. Support will route the case to the storage group, and they are pretty good. Basically, they will work out the right disk chain and use vmkfstools to put the delta back into the new vmdk. It's not difficult, but I would not do it myself.

13 Replies
dmaster
VMware Employee

Hi Argyle,

I found the following link for you:

http://forums.virtualizationadmin.com/SVMotion_Plugin/m_21/tm.htm

See the requirements for using SVMotion.

JWVMCS
Contributor

Hi Argyle,

I've had the same issue on one of my failures. The config/swap files move, but SVMotion fails to move the vmdks. As I said, I only have one failure that has the DMotion-scsi* files on the source datastore (the other failures don't have the extra vmdks). To recover, I intend to power off the VMs, cold migrate the disk files, and reconfigure the VMX. I can't do this until I get some downtime, so if anyone knows of a way to recover in a live state (I think not), please let us know.

JW

Argyle
Enthusiast

Thanks for the link. We fulfill all the requirements, though. The main problem now is how to complete or roll back these storage vmotions that got stuck halfway with delta files.

- Virtual machines with snapshots cannot be migrated using Storage VMotion.

There are no snapshots on the machines.

- Virtual machine disks must be in persistent mode or be raw device maps.

Persistent mode here.

- The host on which the virtual machine is running must have sufficient resources to support two instances of the virtual machine running concurrently for a brief time.

There are enough resources.

- The host on which the virtual machine is running must have a VMotion license, and be correctly configured for VMotion.

It has a license and is configured for VMotion.

- The host on which the virtual machine is running must have access to both the source and target datastores.

It has access to all datastores.

- VMware Infrastructure 3 supports a maximum of four simultaneous VMotion or Storage VMotion accesses to a single datastore.

We only do one storage vmotion at a time.

dmaster
VMware Employee

Maybe cold migration is an option for you? Migrate the machines back to their original datastore and commit or remove the snapshot file.

P.S. If you think answers on the forum are useful or correct, please award them with points.

Argyle
Enthusiast

dmaster: I was looking into that, but the VM is somehow in a midway state. A lot of options are greyed out, so I can't migrate it back. Also, ESX thinks that no snapshots exist; the .vmsd file is blank. It looks like DMotion delta files are a bit different for some reason.

BigHug: Yes, a case is opened. I was hoping someone on the forum had run into the same thing and had a nice solution :)

A side note is that the main cause seems to be resource starvation of the service console. The problem occurs on one specific ESX host that had the console CPU pegged due to a badly behaving VM. This impacts the Storage VMotion process, so it terminates.

Argyle
Enthusiast

After a lot of testing I found a solution that:

- commits the delta data to the original vmdk files

- resets DMotion state

- keeps the VM online

=======================

DISCLAIMER:

I take no responsibility for the result in your specific environment. The following worked for me.

=======================

Description:

-


Create a snapshot of the VM and then remove the snapshots. This will commit the DMotion deltas too. Note that you can't remove/commit the DMotion delta files directly since they don't count as a normal snapshot: running vmware-cmd with the hassnapshot parameter doesn't return a value of "1".

After that, edit the .vmx settings to remove the entries for the DMotionParent parameters, or the migration will restart at the next reboot. You will have no options like "Edit settings" etc. in Virtual Center unless you do this.

Leaving data in the DMotionParent parameters will result in only one option, called "Complete migration", being available in the Virtual Center GUI at the next reboot.

You do not want to use this option, though, since it completes the migration on the source LUNs, ignoring your previous destination LUNs. You need double the space if you still want to do this. In my case I had a 100 GB VM disk on a 150 GB LUN, so it would fail.

Once you remove the value in the DMotionParent parameters via vmware-cmd in ESX, Virtual Center will display all normal options again. Note that editing the .vmx file directly will not trigger a reload of the .vmx config.
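To see which disks still carry DMotion state, the .vmx text can be scanned for non-empty DMotionParent entries. This is only a sketch under assumptions: it treats the file as plain `key = "value"` lines, the helper name `pending_dmotion_parents` is hypothetical, and the sample values are made up for illustration (a real DMotionParent value will look different). It reads the text only; clearing the entries should still be done with vmware-cmd as described above.

```python
def pending_dmotion_parents(vmx_text):
    """Map each key ending in .DMotionParent to its non-empty value."""
    pending = {}
    for line in vmx_text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line
        key = key.strip()
        value = value.strip().strip('"')
        if key.lower().endswith(".dmotionparent") and value:
            pending[key] = value
    return pending


# Hypothetical .vmx fragment; the DMotionParent value is a placeholder.
example_vmx = '''\
scsi0:0.present = "true"
scsi0:0.DMotionParent = "example-parent.vmdk"
scsi0:1.present = "true"
scsi0:1.DMotionParent = ""
'''

for key in sorted(pending_dmotion_parents(example_vmx)):
    print("still set: " + key)
```

An empty result suggests the DMotionParent entries are already cleared in the file, although Virtual Center may still need the config reload that vmware-cmd triggers.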

Step by step:

-


- You have a VM with two disks on LUN1 and LUN2 that you want to migrate to LUN3 and LUN4

- Storage vmotion fails midway for reason X

- We have a running VM with .vmx and swap on LUN3 and vmdk and dmotion delta files on LUN1 and LUN2. VM is running on the delta files.

- Log in to the ESX host that runs the VM to create a snapshot of the VM (it's not available via the Virtual Center GUI in this state). Make sure there is room on LUN3, which holds the .vmx files.

- Find the UUID path to your VM with:

vmware-cmd -l

- Create a snapshot (of all disks) with:

vmware-cmd /vmfs/volumes/487...d4/MYSERVER/MYSERVER.vmx createsnapshot snapname snapdescription 1 1

You get files like this on LUN3:

DMotion-scsi0:00_MYSERVER-000001.vmdk

DMotion-scsi0:00_MYSERVER-000001-delta.vmdk

DMotion-scsi0:01_MYSERVER-000001.vmdk

DMotion-scsi0:01_MYSERVER-000001-delta.vmdk

- Remove (Commit) the snapshots:

vmware-cmd /vmfs/volumes/487...d4/MYSERVER/MYSERVER.vmx removesnapshots

The above commits all delta files, including the DMotion files.

- We have a server running on the original disks with all data intact.

Virtual Center still thinks it's in DMotion state, so you can't edit settings, perform VMotion, or do anything else via Virtual Center.

To fix this, we need to clear the DMotionParent parameters in the .vmx file with the following commands from ESX:

vmware-cmd /vmfs/volumes/487...d4/MYSERVER/MYSERVER.vmx setconfig scsi0:0.DMotionParent ""

vmware-cmd /vmfs/volumes/487...d4/MYSERVER/MYSERVER.vmx setconfig scsi0:1.DMotionParent ""

If we do not do this, the only option after the VM shuts down will be "Complete Migration" in Virtual Center. If you select this option, it will try to rerun Storage VMotion (offline), but it will use the source disks' LUNs as the destination. Not good if we don't have space on those LUNs.

- We still have vmx and swap on LUN3 and vmdk files on LUN1 and LUN2.

- Perform a new storage migration to move the .vmx files back.

Example of moving just the .vmx file:

-


C:\Program Files\VMware\VMware VI Remote CLI\bin>svmotion.pl

--server=vc.mydomain.com --username=some_name --password=some_pass

--datacenter="My Data Center"

--vm="[Sys-Disk-002] MYSERVER/MYSERVER.vmx:Sys-Disk-001"

--disks="[Sys-Disk-001] MYSERVER/MYSERVER.vmdk:Sys-Disk-001, MYSERVER/MYSERVER.vmdk:Data-Disk-001"

--verbose

Attempting to connect to service url.

Connected to server.

Resolving the input arguments.

Performing Storage VMotion.

Storage VMotion completed successfully.

Disconnecting.

-


- Clean up the previous destination LUN3 and LUN4 by removing any vmdk files and folders that were created.

- Done. We are back where we started with no downtime.

- We can now try storage vmotion again.
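The vmware-cmd part of the steps above can be laid out as an ordered, dry-run command list before anything is typed on the host. This is a sketch, not a tested tool: `recovery_commands` is a hypothetical helper, the .vmx path is a placeholder, and the arguments are copied from the commands in this post. Nothing is executed; the commands are only printed.

```python
def recovery_commands(vmx_path, disk_nodes, snap_name="snapname",
                      snap_description="snapdescription"):
    """Return the vmware-cmd invocations from the steps above, in order.

    Each command is an argument list; nothing is executed here.
    """
    base = ["vmware-cmd", vmx_path]
    commands = [
        # Same arguments as in the post: name, description, then "1 1".
        base + ["createsnapshot", snap_name, snap_description, "1", "1"],
        # Committing the snapshot also commits the DMotion deltas.
        base + ["removesnapshots"],
    ]
    for node in disk_nodes:
        # Clear the DMotionParent entry for each disk, e.g. scsi0:0.
        commands.append(base + ["setconfig", node + ".DMotionParent", ""])
    return commands


# Placeholder path: replace it with the one reported by `vmware-cmd -l`.
plan = recovery_commands("/vmfs/volumes/EXAMPLE/MYSERVER/MYSERVER.vmx",
                         ["scsi0:0", "scsi0:1"])
for argv in plan:
    # Show the empty string as "" so the line can be pasted into a shell.
    print(" ".join(arg if arg else '""' for arg in argv))
```

Printing the plan first makes it easier to check the path and disk nodes against the .vmx file before running each command by hand. Paths containing spaces would need quoting, which this sketch does not handle.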

JWVMCS
Contributor

Great detail, Magnus! Most useful!

hennish
Hot Shot

Worked for me too. Thanks a lot! I was kinda nervous about that hung svmotion. :)

ebowser
Contributor

Hey Argyle - great post and excellent info, thanks much.

I just had this happen to a VM this morning. I got the delta files merged back in, and now it's running as split - vmx & vmswap on one datastore, vmdk on the other. I have modified the DMotionParent line in the vmx, and reloaded the config but all options in VI Client are still greyed out. I even tried restarting management services on the ESX host but still no luck.

Any other ideas?

Thanks again!

Argyle
Enthusiast

Did you modify the DMotionParent info via the command-line tool vmware-cmd? If you do it manually with an editor, it won't trigger a reload of the config in Virtual Center.

Example:

vmware-cmd /vmfs/volumes/487...d4/MYSERVER/MYSERVER.vmx setconfig scsi0:0.DMotionParent ""

vmware-cmd /vmfs/volumes/487...d4/MYSERVER/MYSERVER.vmx setconfig scsi0:1.DMotionParent ""

ebowser
Contributor

Hiya Argyle.

We're running ESXi, not ESX, so I don't have the vmware-cmd command. I did run "vim-cmd vmsvc/reload <vmid>" after making the edits, to no avail. I then actually moved my "/etc/vmware/hostd/vmInventory.xml", restarted management, replaced the file and restarted management, still to no avail.

Thanks,

Eric

diederikm
Contributor

Thanks very much for your detailed explanation. It helped us a lot!
