Assume that the network fails for a certain host and the host isolation response is set to shut down. After 12 seconds the host will run its isolation test and then start shutting down the VMs running on it.
The other hosts will detect the missing host after 15 seconds and try to start the VMs. However, since the VMs most likely are not yet shut down, the file locks are still in place. Depending on the workload inside the guest, a graceful shutdown could take anything from 20 seconds to several minutes. (I know there is a power off that triggers after 5 minutes.)
But my question is: how long and how often will the other hosts try to restart the VMs, whose vmdk files become available one after another?
Duncan Epping describes the restart behavior at http://www.yellow-bricks.com/2010/06/30/how-does-das-maxvmrestartcount-work/
André
Great. So the last retry time is after 8 minutes, if I understand it correctly? That would mean all VMs should by then either have shut down or been powered off (after 5 minutes).
So the last retry time is after 8 minutes ...
No, the last retry will be after 30 minutes.
2 + 4 + 8 + 8 + 8 minutes
André
Thanks, I misunderstood the "T". But that would mean that all VMs should actually be restarted no sooner than 6 minutes, at retry 2 then? (Assuming a power off will be executed for VMs not shut down after five minutes.)
Exactly.
André
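To make the arithmetic in this exchange concrete, here is a small sketch of the retry schedule. It only assumes the delays quoted above (2 + 4 + 8 + 8 + 8 minutes, counted from the initial restart attempt); the function names are made up for illustration, and it ignores the few seconds of difference between the isolation test and the failure detection.

```python
from itertools import accumulate

# Delays between restart attempts in minutes, as quoted in the thread.
# Assumption: counted from the initial (failed) restart attempt.
RETRY_DELAYS_MIN = [2, 4, 8, 8, 8]


def retry_times(delays=RETRY_DELAYS_MIN):
    """Minutes after the initial attempt at which each retry happens."""
    return list(accumulate(delays))


def first_possible_retry(lock_released_min, delays=RETRY_DELAYS_MIN):
    """Return (retry_number, minute) of the first retry at or after the
    moment the file lock is released, or None if all retries are used up."""
    for number, minute in enumerate(retry_times(delays), start=1):
        if minute >= lock_released_min:
            return number, minute
    return None


print(retry_times())            # [2, 6, 14, 22, 30]
print(first_possible_retry(5))  # (2, 6)
```

With a forced power off after 5 minutes, the first retry that can find the lock released is retry 2 at 6 minutes, which matches the answer above. A VM whose lock is still held after the 30-minute retry would not be restarted by this schedule.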
It could take up to 30 minutes, or maybe even never start up. In your case it would more than likely be after 6 minutes. There's not much you can do to speed things up.
Also documented in our book by the way. Keep checking my website for some news around a new version of the book. Expect news this week!
Duncan (VCDX)
Available now on Amazon: vSphere 4.1 HA and DRS technical deepdive
Duncan wrote:
it could take up to 30 minutes, or maybe even never start up.
In which cases could it actually take 30 minutes, using all settings at default?
I think defaults are there for a reason. For me, I would want to change the host isolation response time. Just having the isolation response set to leave powered on seems like a prudent solution.
See more information in Duncan's newest blog entry:
http://www.yellow-bricks.com/2011/04/04/das-failuredetection-time-and-the-isolation-response/