Using Lab Manager 3.0.1.
I've posed a couple of the questions below to different support engineers but have gotten conflicting info from them, so I thought I'd try here to see if I can get answers from real-world users or other support engineers.
We've got a new SAN and want to move chained VMs from the old SAN to the new SAN.
Some of these chains have many VMs and many of the chains total more than 100GB in size.
Question 1: Is it advisable to use SSMove for large VMs/chains?
The old SAN datastores are FC and the new SAN datastore will be connected to the ESX hosts with iSCSI.
Question 2: Will the data be moved over the FC and iSCSI connections or through the Service Consoles between two ESX hosts?
We thought about doing a SAN copy to "back up" the VM chains before an SSMove. In the event of data corruption due to a failed move, we would configure LM to point to the "backed-up" VMs on the datastore holding the backup copies.
Question 3: Is it possible to repoint LM to copies of VM chains on a different datastore?
Question 4: With LM 3.x SSMove, what happens to partially moved data/VM chains if the process fails midway due to causes such as network/host loss or other timeout issues?
Answers and other recommendations would be greatly appreciated.
Question 1: Is it advisable to use SSMove for large VMs/chains?
SSMove is the only supported way to move VM chains between datastores.
Question 2: Will the data be moved over the FC and iSCSI connections or through the Service Consoles between two ESX hosts?
It runs the mv command from the service console, so the move should happen at the OS level. If both LUNs are presented to the same host, the data shouldn't go over the network or through a service console.
Question 3: Is it possible to repoint LM to copies of VM chains on a different datastore?
One chain must reside on a single datastore. This is a requirement of linked clones.
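The single-datastore rule can be checked mechanically before and after a move. A minimal Python sketch, assuming you have already collected each chain's disk paths in VMware's `[datastore] folder/file.vmdk` notation; the helper names and datastore names such as `oldSAN01` are hypothetical, and this is not part of Lab Manager:

```python
import re

# Datastore paths look like "[datastoreName] folder/file.vmdk".
DATASTORE_PATH = re.compile(r"^\[(?P<ds>[^\]]+)\]\s*(?P<rel>.+)$")


def datastore_of(path: str) -> str:
    """Return the datastore name from a "[datastore] relative/path" string."""
    match = DATASTORE_PATH.match(path)
    if match is None:
        raise ValueError(f"not a datastore path: {path!r}")
    return match.group("ds")


def chain_datastores(chain: list[str]) -> set[str]:
    """Collect every datastore referenced by the disks of one chain."""
    return {datastore_of(p) for p in chain}


def chain_on_single_datastore(chain: list[str]) -> bool:
    """True only if every disk in the (non-empty) chain sits on the same datastore."""
    return len(chain_datastores(chain)) == 1


if __name__ == "__main__":
    # Hypothetical chain: base disk plus one linked clone.
    chain = ["[oldSAN01] base/base.vmdk", "[oldSAN01] clone1/clone1.vmdk"]
    print(chain_on_single_datastore(chain))
```

A chain that reports more than one datastore would violate the linked-clone requirement described in the answer.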
Question 4: With LM 3.x SSMove, what happens to partially moved data/VM chains if the process fails midway due to causes such as network/host loss or other timeout issues?
You'll have a half-moved chain. Basically, file a support request and we can help with that.
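Before opening that support request, it may help to know which chain files are still on the source datastore and which already reached the destination. A hypothetical Python sketch, assuming both datastores are browsable as directories and you already have the chain's file names; the function name is made up, and it only inspects files, it changes nothing:

```python
import tempfile
from pathlib import Path


def chain_file_locations(files: list[str], src: Path, dst: Path) -> dict[str, str]:
    """Report where each chain file is: 'source', 'destination', 'both' or 'missing'."""
    report = {}
    for name in files:
        in_src = (src / name).exists()
        in_dst = (dst / name).exists()
        if in_src and in_dst:
            # Copied but source not yet removed, or the copy was interrupted:
            # the destination copy may be incomplete.
            report[name] = "both"
        elif in_src:
            report[name] = "source"
        elif in_dst:
            report[name] = "destination"
        else:
            report[name] = "missing"
    return report


if __name__ == "__main__":
    # Demo: two temporary directories stand in for the old and new datastores.
    with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
        (Path(old) / "base.vmdk").write_text("base")
        (Path(new) / "clone1.vmdk").write_text("clone1")
        print(chain_file_locations(["base.vmdk", "clone1.vmdk", "clone2.vmdk"], Path(old), Path(new)))
```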
Regards,
Jonathan
B.Sc., RHCT, VMware vExpert 2009
NOTE: If your problem or questions has been resolved, please mark this thread as answered and award points accordingly.
Thanks Jonathan for the quick response. I have a couple of follow up questions that I'm hoping you could answer.
When the SSMove is started, which host in a cluster will handle the mv command? Is a host randomly picked from the cluster?
If we end up with a half-moved chain, we could possibly end up with an unusable VM link, or even an unusable chain, if, for example, the move of a VMDK file was interrupted partway through. In that case, it may not be recoverable, even with assistance from Support. Does that sound like a likely scenario if there's a failure? Have you heard of something like that happening before?
If we manually copied the VM files to another datastore as a backup first before the SSMove, could we revert to that datastore backup if there is data loss from an SSMove failure? I'm assuming it would take some modifying of the LM database and VMDK headers of the VMs to match the UUID of the new datastore. Would Support be able to assist with those changes and if so, would it be simple or could it be very complex and possibly get ugly?
You shouldn't ever end up with a half-moved VMDK file. mv is a cp followed by an rm of the source if the cp is successful, and then this loop repeats for the next file.
The host is determined at random, so you'll need to search to see which one is doing it (or just monitor for higher disk I/O).
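The copy-then-remove loop can be sketched in Python. This is a hypothetical illustration of the pattern, not SSMove's code: each file is copied in full, the source is deleted only after the copy is in place, and the loop then moves on to the next file, so an interruption leaves at most one file with an incomplete destination copy while its source is still intact. The byte-for-byte check before deletion is an added safety step, not something plain mv does (mv relies on the copy succeeding):

```python
import filecmp
import os
import shutil
import tempfile
from pathlib import Path


def move_file_copy_then_remove(src: Path, dst_dir: Path) -> Path:
    """Move one file by copying it, checking the copy, then deleting the source."""
    dst = dst_dir / src.name
    shutil.copy2(src, dst)  # 1. full copy; the source is still untouched
    # 2. Added safety step (not part of mv): byte-for-byte comparison.
    if not filecmp.cmp(src, dst, shallow=False):
        dst.unlink()
        raise OSError(f"copy of {src} did not match; source left in place")
    os.remove(src)  # 3. only now remove the source
    return dst


def move_chain(files: list[Path], dst_dir: Path) -> list[Path]:
    """Repeat the copy-then-remove step file by file; stops at the first failure."""
    return [move_file_copy_then_remove(f, dst_dir) for f in files]


if __name__ == "__main__":
    # Demo with temporary directories standing in for two datastores.
    with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
        disks = []
        for name in ("base.vmdk", "clone1.vmdk"):
            p = Path(old) / name
            p.write_bytes(b"hypothetical disk data for " + name.encode())
            disks.append(p)
        moved = move_chain(disks, Path(new))
        print([m.name for m in moved], [p.exists() for p in disks])
```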
I haven't personally hit a scenario where a failure resulted in corrupted VMs. I'm not saying it can't happen, just that I haven't seen it.
If we manually copied the VM files to another datastore as a backup first before the SSMove, could we revert to that datastore backup if there is data loss from an SSMove failure? I'm assuming it would take some modifying of the LM database and VMDK headers of the VMs to match the UUID of the new datastore. Would Support be able to assist with those changes and if so, would it be simple or could it be very complex and possibly get ugly?
Recovering from a snapshot LUN is very similar to using SSMove. If you oversimplify it, the data changed location (new UUID), so the DB and VMDK headers need updating. It's just that we have tools to help recover from this, which are intended to be used when SSMove fails.
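Why a location change implies header edits can be pictured for the VMDK half. In a linked-clone delta disk, the text descriptor records its parent in a `parentFileNameHint` line, which on VMFS can be an absolute `/vmfs/volumes/<datastore UUID>/...` path. A minimal Python sketch, assuming exactly that layout (the UUIDs, paths and function name are made up), that swaps the old datastore UUID for the new one on that line only. It does not touch the Lab Manager database and is not the support tooling mentioned in the reply:

```python
def repoint_parent_hint(descriptor: str, old_uuid: str, new_uuid: str) -> str:
    """Swap the datastore UUID inside parentFileNameHint; leave every other line alone."""
    old_prefix = f"/vmfs/volumes/{old_uuid}/"
    new_prefix = f"/vmfs/volumes/{new_uuid}/"
    lines = []
    for line in descriptor.splitlines(keepends=True):
        if line.startswith("parentFileNameHint="):
            line = line.replace(old_prefix, new_prefix)
        lines.append(line)
    return "".join(lines)


# Hypothetical descriptor for a linked-clone delta disk; the UUIDs are made up.
SAMPLE = (
    "# Disk DescriptorFile\n"
    "version=1\n"
    'createType="vmfsSparse"\n'
    'parentFileNameHint="/vmfs/volumes/11111111-aaaaaaaa-0000-000000000001/base/base.vmdk"\n'
)

if __name__ == "__main__":
    print(repoint_parent_hint(SAMPLE, "11111111-aaaaaaaa-0000-000000000001",
                              "22222222-bbbbbbbb-0000-000000000002"))
```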
Regards,
Jonathan
Thanks Jonathan. I have one more follow up, if you wouldn't mind answering.
For our datastore "backup", we would not be using a SAN snapshot utility. We would simply scp the files or use a SAN copy utility to copy VM data to the backup datastore. Does that change anything from your last comment?
Here's the scenario I'm thinking of: if there is VM corruption, we would open a ticket with Support, who would use their tools to update the DB and VMDK headers to get us recovered onto the "backup datastore", which has the good data and re-established chains. At that point, we could try another SSMove or, if necessary, decide on a full export of only the critical VMs. Does this sound like a scenario we could plan for?
>For our datastore "backup", we would not be using a SAN snapshot utility. We would simply scp the files or use a SAN copy utility to copy VM data to the backup datastore. Does that change anything from your last comment?
Not really. The data moved between UUIDs, so header updates and an LM DB update would be in order. We just have a pretty slick way of doing it when you contact Support, instead of updating all of that by hand.
Does this sound like a scenario we could plan for?
Yeah, as long as the backup is the exact data that was on the LUN before the failure. If, for whatever reason, the backup differs by even a few bits in a VMDK file, this could affect any VM related to that VMDK (its child nodes).
Aside from that, it sounds reasonable.
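The "exact data" caveat can be checked before relying on a backup copy. A minimal Python sketch, assuming the original and backup folders are both reachable from a machine with Python 3 (the function names and demo layout are hypothetical), that compares SHA-256 hashes of every file and lists anything missing or different:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB VMDKs never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(original_dir: Path, backup_dir: Path) -> list[str]:
    """Relative paths that are missing from the backup or differ from the original."""
    problems = []
    for src in sorted(p for p in original_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(original_dir)
        copy = backup_dir / rel
        if not copy.is_file() or sha256_of(src) != sha256_of(copy):
            problems.append(rel.as_posix())
    return problems


if __name__ == "__main__":
    # Demo: temporary directories stand in for the LUN and the backup datastore.
    with tempfile.TemporaryDirectory() as lun, tempfile.TemporaryDirectory() as bak:
        for d in (Path(lun), Path(bak)):
            (d / "base").mkdir()
            (d / "base" / "base.vmdk").write_bytes(b"same bytes")
        (Path(lun) / "clone1.vmdk").write_bytes(b"original")
        (Path(bak) / "clone1.vmdk").write_bytes(b"origina!")  # one byte off
        print(mismatched_files(Path(lun), Path(bak)))
```

An empty list means every file on the original side has a byte-identical copy in the backup; files that exist only in the backup are not reported.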
Regards,
Jonathan