VMware Cloud Community
buckmaster
Enthusiast

SVMotion timeout?

Performed 5 SVMotions with no problems. However, the last SVMotion was of a VM with 3 disks totaling 250GB. It ran all night as expected, and in the morning I noticed it was at 88% and climbing when the VC client produced an error: "A general system error occurred: failed to reparent/commit disk(s) (vim.fault.Timedout)". I immediately looked at the source and destination folders and noticed DMotion delta disks. Within 5-10 minutes those same DMotion delta disks were gone. The VM was powered on and had its disks re-registered to the new location. The source folder still had the 3 original disks instead of them getting deleted as they normally would.

It seems like everything is OK, but I don't have a confident feeling about it. The VM is running and the event logs in the VM look clean. I built a bogus VM and attached the original disks (since they were still available), disconnected the NIC, powered on, and the disks look identical. Was this simply a bogus message from VC? What happened to the DMotion disks? Were they committed to the new disks in their new location or simply removed? /var/log/vmware/hostd.log looks clean with no glaring messages. I'm not sure what to expect when SVMotion encounters an error - what rollback features are built in?
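For reference, here is roughly what I ran while digging (the datastore and VM names below are made up - substitute your own):

    # On the source ESX service console: see what hostd recorded
    # around the reparent/commit failure.
    grep -i "reparent\|timedout" /var/log/vmware/hostd.log

    # Check whether any DMotion delta files are still present in the
    # source or destination folders.
    ls -lh /vmfs/volumes/source_ds/myvm/
    ls -lh /vmfs/volumes/dest_ds/myvm/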

Thanks for your input.

Tom Miller
7 Replies
mike_laspina
Champion

Hello,

I have done about 15 SVMotions in the lab. The first thing I noticed is that the % complete always stops at 88% and then hands control of the VM over. The RCLI client controls the process and waits for a completion state before deleting the original configuration instance. It looks like the RCLI did not get this last update due to a timeout and left the original intact.

The last part of the process is the VMotion memory transfer, and your network response behavior could be the root cause.

Is VMotion on a separate network?

Was this a disk-only SVMotion within the same server, or a separate ESX-to-ESX server SVMotion?
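You can also sanity-check the VMotion link from the service console with vmkping, which exercises the VMkernel interface rather than the console NIC (the address below is just an example):

    # From the source host, ping the destination host's VMotion
    # (VMkernel) interface over the VMkernel stack.
    vmkping 10.0.0.2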

http://blog.laspina.ca/ vExpert 2009
buckmaster
Enthusiast

Since there are two ESX servers, VMotion is NIC to NIC at 1000Mb, no switch, on a private network. The source VMFS is on local storage and the destination VMFS is on a fibre shared datastore, with the VM staying registered on the original ESX server. We are using SVMotion to move all local VMFS datastores to shared datastores so the client can take advantage of all the automation built around shared datastores. When I encountered the timeout there were DMotion disks, but within 5-10 minutes they were gone. Hard to tell exactly what happened with them. Did they get committed successfully or removed? It seems like they were committed, but I can't prove that if challenged.
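One way to gather some evidence of a clean commit might be to inspect the destination descriptor files - after a successful commit a descriptor should reference only its flat extent, with no snapshot parent left behind (datastore and VM names below are made up):

    # A committed disk descriptor should have no snapshot parent;
    # repeat for each disk's descriptor (myvm_1.vmdk, myvm_2.vmdk).
    grep -i "parentFileNameHint" /vmfs/volumes/dest_ds/myvm/myvm.vmdk

    # The extent lines should point at the -flat files, not at any
    # -delta or DMotion files.
    grep "RW " /vmfs/volumes/dest_ds/myvm/myvm.vmdk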

Thanks for the response.

Tom Miller
Bri1
Contributor

I also received "A general system error occurred: failed to reparent/commit disk(s) (vim.fault.Timedout)" when migrating a 400GB VM from one SAN to another. It appeared to occur at approximately 90% (according to the CLI progress bar) and after about 5 hours. After this, all of the VM's files, including the VMDKs, were on the new SAN. Additionally, the VM's settings all pointed to the new folder on the new SAN. There were still VMDKs on the old SAN (no config or log files, though), but with modification dates from the time the SVMotion began.

I agonized over the possible state of my VM, but everything appeared to be working properly. The disks checked clean from the guest, there were no OS or application errors, and no one experienced any loss of connectivity during the entire process. Furthermore, all changes made to the disks during the SVMotion were there.

I rebooted the box and ran my tests again. (Again, absolutely no errors whatsoever.) I then VMotioned the VM to a host that didn't have access to the old SAN and ran my tests again. (Same thing, no errors.) I'm going to keep the old VMDKs around for a few days, but as far as I can tell, the SVMotion was a success.
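For anyone wanting to run the same check, confirming where the VM's disks point is quick (the names here are invented):

    # Each scsiN:M.fileName entry in the .vmx should reference the
    # new SAN's datastore.
    grep -i "fileName" /vmfs/volumes/new_san_ds/myvm/myvm.vmx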

buckmaster
Enthusiast

Thanks for the reply. We went about troubleshooting the same way, and it seems like everything was OK. Like you, I held onto my orphaned VMDK files for a few days. Sure would be nice to know why this occurs.

Tom Miller
cxo
Contributor

Just to add another similar report at a later date: the same thing happened with one of my VMs (65GB total - ESX 3.5u1 & VC 2.5u1).

As with others, everything appears fine; I will ultimately remove the "old" VMDKs when comfortable.

Are there any answers yet as to why this would be?

rreynol
Enthusiast

We have had this issue as well, usually when trying to migrate a VM that has over 5 disks attached. The steps below (with a command sketch after them) describe a strategy to use with the Remote CLI in non-interactive mode. According to what I have researched, you cannot do this with the existing plugins or with the Remote CLI in interactive mode. The good news is that the leftover VMDK files and DMotion files are usually not needed: the timeout occurs after the successful migration but before the cleanup. If you do get a timeout, it is good to wait a bit before doing any manual clean-up, since you might find that the DMotion files get cleaned up even after the error. I have a case open with VMware on this and will post back if they have any insights.

I have tried the steps below, and it is still rough going with timeout errors and swap errors, but the VM did not crash.

From the release notes for VC 2.5 Update 2 and Update 3:

http://www.vmware.com/support/vi3/doc/vi3_vc25u3_rel_notes.html

Storage VMotion of a Large Number of Virtual Disks Fails

Migrating a large number of virtual disks at the same time with Storage VMotion might fail with the following error message:

Received an error from the server: A general system error occurred: failed to reparent/commit disk(s) (vim.fault.Timedout)

Workaround: To migrate a virtual machine with a large number of virtual disks, migrate the disks in batches as follows:

  1. Migrate the virtual machine configuration file and a subset of the virtual disks (no more than five at a time) from the source location to the destination.

  2. Migrate the virtual machine configuration file back to the source location.

  3. Repeat steps 1 and 2 until the virtual machine configuration file and all the virtual machine disks have been migrated to the destination.
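As a rough sketch of what those steps look like with svmotion (svmotion.pl) from the Remote CLI in non-interactive mode, using a 3-disk VM as an example - the server, datacenter, datastore, and VM names below are all placeholders, and the --disks option pins the listed disks to a datastore so they stay put while everything else moves with the .vmx:

    # Batch 1: move the .vmx and the first two disks to the new
    # datastore, pinning the third disk in place on the old one.
    svmotion --url=https://vc.example.com/sdk --username=admin --password=secret \
      --datacenter=DC1 \
      --vm="[old_ds] myvm/myvm.vmx:new_ds" \
      --disks="[old_ds] myvm/myvm_2.vmdk:old_ds"

    # Step 2: move the .vmx back to the source, pinning the two disks
    # that already moved so they stay on the destination.
    svmotion --url=https://vc.example.com/sdk --username=admin --password=secret \
      --datacenter=DC1 \
      --vm="[new_ds] myvm/myvm.vmx:old_ds" \
      --disks="[new_ds] myvm/myvm.vmdk:new_ds,[new_ds] myvm/myvm_1.vmdk:new_ds"

    # Final pass: move the .vmx and the remaining disk to the
    # destination; disks already on new_ds stay where they are.
    svmotion --url=https://vc.example.com/sdk --username=admin --password=secret \
      --datacenter=DC1 \
      --vm="[old_ds] myvm/myvm.vmx:new_ds"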

Dycell
Contributor

I would just like to confirm that I had this "error" too, after moving a 100GB VM.

I am currently upgrading to a new NetApp SAN. I am using ESX 3.5 hosts.

No noticeable problems; just the original VMDK files are still present on the old SAN.

According to the VM the VMDK files it uses are on the new SAN.
