VMware Cloud Community
rickardnobel
Champion

HA retry time when host isolation?

Assume that the network breaks for a certain host and the host isolation response is shut down. After 12 seconds it will do its isolation test and then start to shut down the VMs running on the host.

The other hosts will detect the missing host after 15 seconds and try to restart its VMs. However, since the VMs have most likely not yet shut down, the file locks are still in place. Depending on the workload inside the guest, a graceful shutdown could take anything from 20 seconds to several minutes. (I know there is a power-off that will trigger after 5 minutes.)

But my question is: how long and how often will the other hosts keep trying to restart the VMs, whose VMDK files become available one after another?

My VMware blog: www.rickardnobel.se

a_p_
Leadership

Duncan Epping describes the restart behavior at http://www.yellow-bricks.com/2010/06/30/how-does-das-maxvmrestartcount-work/

André

rickardnobel
Champion

Great. So the last retry is after 8 minutes, if I understand it correctly? By then all VMs should have either shut down or been powered off (after 5 minutes).

My VMware blog: www.rickardnobel.se
a_p_
Leadership

So the last retry time is after 8 minutes ...

No, the last retry will be after 30 minutes.

2 + 4 + 8 + 8 + 8 minutes
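
For anyone who wants to see the cumulative times, here is a minimal Python sketch of that arithmetic. It assumes the waits between attempts are exactly the 2, 4, 8, 8 and 8 minutes listed above; the names RETRY_DELAYS_MIN and restart_offsets are made up for illustration and are not HA settings.

```python
from itertools import accumulate

# Assumption taken from this thread: waits between restart attempts,
# in minutes (2 + 4 + 8 + 8 + 8).
RETRY_DELAYS_MIN = [2, 4, 8, 8, 8]


def restart_offsets(delays=RETRY_DELAYS_MIN):
    """Return the minutes after the first attempt (T) at which each attempt runs."""
    # The first attempt is at T+0; every later attempt adds its wait
    # to the running total.
    return [0] + list(accumulate(delays))


if __name__ == "__main__":
    for attempt, offset in enumerate(restart_offsets()):
        label = "Restart" if attempt == 0 else f"Restart retry {attempt}"
        print(f"{offset:>2} min after T: {label}")
```

Under these assumptions the attempts fall at 0, 2, 6, 14, 22 and 30 minutes after the first one.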

André

rickardnobel
Champion

Thanks, I misunderstood the "T". But that would mean that all VMs should actually be restarted no sooner than 6 minutes, at retry 2, then? (Assuming a power-off will be executed for VMs that have not shut down after five minutes.)

  • T+0 – Restart
  • T+2 – Restart retry 1
  • T+4 – Restart retry 2
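
To illustrate the reasoning: the "T+" values are waits between attempts, so retry 2 lands 2 + 4 = 6 minutes after the first attempt. A rough Python sketch of that check follows. It assumes the cumulative times implied by the 2 + 4 + 8 + 8 + 8 sequence from this thread, and it assumes an attempt succeeds as soon as the file locks are gone; the helper name first_attempt_after is hypothetical.

```python
from typing import Optional, Tuple

# Cumulative attempt times in minutes after the first attempt (T),
# derived from the 2 + 4 + 8 + 8 + 8 sequence given in this thread.
ATTEMPT_OFFSETS_MIN = [0, 2, 6, 14, 22, 30]


def first_attempt_after(lock_release_min: float) -> Optional[Tuple[int, int]]:
    """Return (attempt number, minutes after T) of the first attempt that
    runs at or after the file locks are released, or None if the locks
    are released only after the last attempt."""
    for attempt, offset in enumerate(ATTEMPT_OFFSETS_MIN):
        if offset >= lock_release_min:
            return attempt, offset
    return None


if __name__ == "__main__":
    # Guest powered off at roughly the five-minute mark.
    print(first_attempt_after(5))
```

With the locks released at about five minutes, the sketch picks retry 2 at 6 minutes. A release after the 30-minute mark returns None, meaning no remaining attempt would pick the VM up in this simplified model.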

My VMware blog: www.rickardnobel.se
a_p_
Leadership

Exactly.

André

depping
Leadership

It could take up to 30 minutes, or the VM might never start up at all. In your case it would more than likely be after 6 minutes. There's not much you can do to speed this up.

This is also documented in our book, by the way. Keep checking my website for news about a new version of the book. Expect news this week!

Duncan (VCDX)

Available now on Amazon: vSphere 4.1 HA and DRS technical deepdive

rickardnobel
Champion

Duncan wrote:

it could take up to 30 minutes, or maybe even never start up.

In which cases could it actually take 30 minutes, using all settings at default?

My VMware blog: www.rickardnobel.se
Troy_Clavell
Immortal

I think defaults are there for a reason. For me, I would want to change the host isolation response time. Just having the isolation response set to "leave powered on" seems like a prudent solution.

See more information in Duncan's newest blog entry:

http://www.yellow-bricks.com/2011/04/04/das-failuredetection-time-and-the-isolation-response/

depping
Leadership

using the default setting indeed

Duncan

HA/DRS technical deepdive - the ebook!