Solved: Re: vmotion performance and what to expect

td3201 · ‎08-20-2008

In the scenario that the power is pulled on one of my VMware ESX hosts, how does vmotion move the machines to a free host? Does it move them all at once? Is putting a host into maintenance mode exactly the same as a host going down, at least from a vmotion perspective?

Troy_Clavell · ‎08-20-2008

If a host goes down, HA will kick in, placing the VMs on another host within the cluster. DRS will load-balance the cluster from there.

By default there is a 15-second isolation response before HA kicks in and considers a host isolated.

Putting a host into maintenance mode is not the same, because the host is still online, just not available in the DRS pool. If you have DRS set to fully automated and put a host into maintenance mode, all VMs will migrate off to the remaining hosts in the cluster.

Hope this helps.

td3201 · ‎08-20-2008

That helps a lot.

-Does it attempt to vmotion all of the servers at once?

-How long does it typically take to vmotion in this scenario?

-I discussed a power loss. What about other host issues? Loss of backend storage, for example.

Troy_Clavell · ‎08-20-2008

-Does it attempt to vmotion all of the servers at once?

Yes, DRS will migrate all the VMs, but not all at once. It will queue them and migrate a couple at a time until all have been migrated.
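To picture the "queue them and migrate a couple at a time" behaviour, here is a minimal sketch in Python. It is only an illustration, not how DRS is implemented: the function name `migration_batches` and the default limit of 2 concurrent migrations are assumptions made for the example, and the real limit is not stated in this thread.

```python
from collections import deque


def migration_batches(vm_names, max_concurrent=2):
    """Yield successive batches of VMs, at most max_concurrent per batch.

    max_concurrent=2 is an assumed stand-in for "a couple at a time";
    it is not a documented DRS setting.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    queue = deque(vm_names)
    while queue:
        # Take up to max_concurrent VMs off the front of the queue.
        batch_size = min(max_concurrent, len(queue))
        yield [queue.popleft() for _ in range(batch_size)]


if __name__ == "__main__":
    for batch in migration_batches(["vm1", "vm2", "vm3", "vm4", "vm5"]):
        print("migrating together:", batch)
```

With five VMs and a limit of two, the host empties in three rounds, which is why the total evacuation time grows with the number of VMs on the host.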

-How long does it typically take to vmotion in this scenario?

It all depends on how many VMs you have on the host you are testing, as well as network speed.

-I discussed a power loss. What about other host issues? Loss of backend storage, for example.

If a host loses back-end storage in a SAN environment, you will have more problems than just one host. You will lose a lot of VMs. That is a completely separate issue.

HA only monitors host uptime/downtime. If there are other Host issues (e.g. hardware problems), you should use monitoring tools. We use SIM.

Hope this helps.

Here are a few links:

http://www.vmware.com/files/pdf/VMwareHA_twp.pdf

http://www.vmware.com/files/pdf/drs_performance_best_practices_wp.pdf

http://www.vmware.com/pdf/vmware_drs_wp.pdf

wpatton · ‎08-20-2008

To be clear, HA does not use VMotion. HA is used in a scenario where the ESX Host running Guests has gone "offline" (e.g. Power Loss, Hardware Failure, Disaster, etc.). Once VC detects the loss of this ESX Host, it will choose another Host in the Cluster with enough available resources, map the Guest OS to that new ESX Host, and automatically Power On that system. This is a disruptive process that will result in an outage.

VMotion is used by Maintenance Mode, DRS, or manual User intervention. This is a non-disruptive process, if done in High Priority, and will result in no outage. Each scenario reacts differently. With Maintenance Mode, all Guests will immediately be VMotion'd off the Host that is being placed in Maintenance Mode. With DRS, VC will evaluate ESX Host resources and distribute Guest systems accordingly across the Cluster. With User intervention, well, you get to choose what you fix or screw up in this scenario. :D

VMotion performance is actually dependent more on network and Host resources than anything, as it is not moving the actual VMDK files but just the Memory State of the Guest from one Host to another. This is something that will need to be tested per Host and per Guest. High-transaction Guests on Hosts with High Allocation and Guest Load will obviously take longer to VMotion than the reverse, but either situation will be without an outage to Clients or the Guest OS.
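As a rough way to reason about why memory size, network bandwidth, and Guest activity drive migration time, here is a back-of-the-envelope sketch in Python. It assumes a generic iterative pre-copy model (copy all memory, then repeatedly re-copy whatever pages were dirtied during the previous pass). That model, the function name, and every parameter are assumptions for illustration, not a description of VMotion internals, and the sketch ignores protocol overhead and CPU limits. Real numbers still need to be measured per Host and per Guest.

```python
def estimate_migration_seconds(memory_gb, link_gbps, dirty_gb_per_s=0.0,
                               stop_threshold_gb=0.1, max_rounds=30):
    """Estimate the time to copy a Guest's memory state over the network.

    memory_gb         : Guest memory to transfer, in gigabytes
    link_gbps         : usable network bandwidth, in gigabits per second
    dirty_gb_per_s    : assumed rate at which the Guest rewrites memory
    stop_threshold_gb : assumed remainder small enough for the final copy
    """
    bandwidth_gb_per_s = link_gbps / 8.0  # gigabits -> gigabytes
    if bandwidth_gb_per_s <= 0:
        raise ValueError("link_gbps must be positive")
    if dirty_gb_per_s >= bandwidth_gb_per_s:
        raise ValueError("memory is dirtied faster than it can be copied")

    remaining_gb = memory_gb
    total_seconds = 0.0
    for _ in range(max_rounds):
        round_seconds = remaining_gb / bandwidth_gb_per_s
        total_seconds += round_seconds
        # Pages dirtied while this pass was running must be sent again.
        remaining_gb = dirty_gb_per_s * round_seconds
        if remaining_gb <= stop_threshold_gb:
            break
    # Final copy of whatever is left.
    total_seconds += remaining_gb / bandwidth_gb_per_s
    return total_seconds


if __name__ == "__main__":
    print(estimate_migration_seconds(8, 1))        # idle Guest
    print(estimate_migration_seconds(8, 1, 0.05))  # busy Guest
```

Under these assumptions an idle 8 GB Guest on a 1 Gbit/s link needs about 64 seconds, and a busy Guest takes noticeably longer because dirtied pages are sent more than once.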

Loss of Backend Storage is a totally different situation. If the storage that contains the .vmx/.vmdk files goes offline, obviously nothing can run those systems, just like if you pull all the drives out of a physical server.

*Disclaimer: VMware Employee* If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".

td3201 · ‎08-20-2008

Great comments. These have helped a lot. I have a few more questions.

You mentioned that network and memory dictate how fast and how many VMs can be migrated at once?

Is this configurable?

Is there any formula to tell how many can be migrated at once by VC in an HA scenario?

Thanks!

wpatton · ‎08-20-2008

Network is configurable through the amount of bandwidth you give each Host: the more bandwidth, the better.

Memory is configurable through the amount of non-allocated memory the system has. If each Host has enough memory to easily accommodate any incoming Guest from another Host, you will get yourself into much less trouble. The same can be said for CPU resources as well, really.

I don't really have any formula or measuring stick to provide for you, but I would suggest testing it thoroughly before deploying to production. I have had emergency situations where I moved 20-30 Guests among multiple Hosts and found that if a Host can't take any more transactions, it will either slow the VI client while it waits for the ESX Host to respond, or it will simply accept and queue the migration.

An HA scenario is different in that the ESX Host is lost and so is the Memory State of that Guest, so nothing is being VMotion'd; it is simply pointing another Host to the .vmx, loading that into its Inventory, and Powering On. At that point it is a very light operation; the pain it will feel is all the Guest OS boot-up transactions. This can definitely cause some slowdowns as resource usage increases on the new Host.
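To make the "pick another Host with enough available resources and Power On" idea concrete, here is a small Python sketch. It is not how HA actually chooses a Host: the "most free memory first" rule, the function name `place_restarts`, and the Host and Guest names are all assumptions for illustration, and only memory is considered.

```python
def place_restarts(failed_vms_mb, host_free_mb):
    """Assign each Guest from a failed Host to a surviving Host.

    failed_vms_mb : dict of Guest name -> memory needed (MB)
    host_free_mb  : dict of surviving Host name -> unallocated memory (MB)
    Returns (placement, unplaced): a dict Guest -> Host, and a list of
    Guests that did not fit anywhere.
    """
    free = dict(host_free_mb)  # work on a copy, leave the input untouched
    placement = {}
    unplaced = []
    # Place the largest Guests first; they are the hardest to fit.
    for vm, need in sorted(failed_vms_mb.items(),
                           key=lambda item: item[1], reverse=True):
        # Assumed rule: pick the Host with the most free memory right now.
        target = max(free, key=free.get, default=None)
        if target is not None and free[target] >= need:
            placement[vm] = target
            free[target] -= need
        else:
            unplaced.append(vm)
    return placement, unplaced


if __name__ == "__main__":
    guests = {"db": 4096, "web": 2048, "mail": 3072}
    hosts = {"esx2": 6144, "esx3": 4096}
    print(place_restarts(guests, hosts))
```

Any Guest that ends up in the `unplaced` list is one that would stay powered off until resources free up, which is the situation spare memory headroom on each Host is meant to avoid.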

td3201 · ‎08-20-2008

I am not so concerned about the timing on vmotion functions. I am for HA though. So, after 15 seconds of a host going down, HA kicks in and remaps the VMs to other available nodes and powers them all on at once? I understand there is some lag because all these VMs will start booting and slow the host down. I am just curious as to how long it takes between failure and when I can expect the VMs to start booting.

Kahonu · ‎08-20-2008

Aloha - the attached spreadsheet can help you determine how many simultaneous host outages your cluster can withstand and keep on ticking.
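For readers without the attachment, here is a rough Python sketch of one way to ask the same question. It is not the spreadsheet's method, which is not shown in this thread. It assumes memory is the only constraint, that the largest Hosts fail first (worst case), and that Guests can be packed perfectly onto the survivors. The function name and parameters are hypothetical.

```python
def host_failures_tolerated(host_capacity_mb, total_vm_demand_mb):
    """Count how many Hosts can be lost while the rest still fit all Guests.

    host_capacity_mb   : list of usable memory per Host (MB)
    total_vm_demand_mb : combined memory needed by every Guest (MB)
    """
    remaining = sorted(host_capacity_mb)  # ascending, largest Host last
    failures = 0
    while remaining:
        survivors = remaining[:-1]  # worst case: the largest Host fails
        if survivors and sum(survivors) >= total_vm_demand_mb:
            failures += 1
            remaining = survivors
        else:
            break
    return failures


if __name__ == "__main__":
    # Four 32 GB Hosts running Guests that need 60000 MB in total.
    print(host_failures_tolerated([32768] * 4, 60000))
```

Because the sketch ignores CPU, per-Guest fit, and any reserved overhead, treat its answer as an upper bound and confirm it by testing.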

Bill