VMware Cloud Community
StephenMoll
Expert

Inter Group Delay in HA Orchestrated Restart

I am looking into using HA Orchestrated Restart in a situation where the startup order of some VMs is important. Although I understand how to set the feature up, I cannot find a satisfactory setting for one of the requirements I have to meet:

The specification is that hosted systems (a collection of VMs) will specify a priority level for every VM in the system, starting at 1.

If they are all set to "1", then I can start all the VMs simultaneously straight away.

If one or more VMs have a different priority number, then each will also have a delay time. This is an estimate of the time that VM will take to start up, and VMs in the next priority level will wait this long before they are started. This is very reminiscent of the "VM Startup/Shutdown" settings for a standalone host. With HA Orchestrated Restart, however, there isn't an obvious way of specifying a delay to be added to each step in the sequence. I presume the default is that HA Orchestrated Restart waits for VMware Tools to be ready on every VM in the current VM group before moving on to the next.
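To make the requirement concrete, here is a minimal Python sketch of the priority/delay scheme described above. It is only a model of the specification, not anything HA itself does. The VM names are hypothetical, and it assumes that when several VMs share a priority level, the next level waits for the longest delay in that level (the specification as stated does not say how to combine them).

```python
from collections import defaultdict

def start_schedule(vms):
    """vms: iterable of (name, priority, delay_seconds).

    Returns {name: start offset in seconds from the beginning of recovery}.
    Assumption: a priority level's wait is the longest delay of any VM in it.
    """
    tiers = defaultdict(list)
    for name, priority, delay in vms:
        tiers[priority].append((name, delay))

    schedule = {}
    offset = 0
    for priority in sorted(tiers):
        # Every VM in this level starts at the same offset.
        for name, _ in tiers[priority]:
            schedule[name] = offset
        # The next level waits for the slowest VM in this level.
        offset += max(delay for _, delay in tiers[priority])
    return schedule

# Hypothetical hosted system: two priority-1 VMs, then one VM per later level.
example = [("db", 1, 300), ("cache", 1, 60), ("app", 2, 120), ("web", 3, 0)]
print(start_schedule(example))
```

If every VM is given priority 1, all offsets come out as 0, which matches the "start everything simultaneously" case.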

I thought of perhaps using VM Options > Boot Options > Boot Delay, but this only permits values up to 10 seconds (10,000 ms), whereas I know some of the hosted systems are specifying delays of up to 5 minutes.

Any ideas?

6 Replies
daphnissov
Immortal

Hi, Stephen. I wrote a blog about this functionality back in November. I think the additional delay feature would probably work in this situation, and that can be customized to be a timer after the VM is powered on or waiting for tools. Have a read and test this out to see if it'll work for you.

StephenMoll
Expert

I am beginning to think we can't use it. Not only is there a potential issue with not being able to define a specific delay period between steps, but there is also some concern about what happens when you have VMs with start-up dependencies spread across two or more hosts. When one of those hosts dies, how is the start-up of the VMs on the failed host processed? If a VM on an unaffected host has a start-up dependency on a VM that was 'killed' by the host failing, is the dependent VM restarted? I think it won't be, so in situations where the start-up order is quite strict (and unfortunately we have some) this would be a non-starter for us. We would have to develop some management code to root out surviving VMs from the start-up dependency list and shut them down or kill them before implementing the full start-up sequence. If we have to do that, we may as well handle all VM recovery situations ourselves, leaving HA responsible only for ensuring that VMs on isolated hosts are shut down or killed.
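As a rough illustration of the management code described above, the following Python sketch works out which surviving VMs would have to be stopped before the full start-up sequence is replayed. It is pure planning logic and makes no vSphere calls. The VM and host names are hypothetical, and stopping survivors in reverse start-up order is an assumption on my part, not a stated requirement.

```python
def full_restart_plan(start_order, placement, failed_host):
    """Plan a full ordered restart of one hosted system after a host failure.

    start_order: VM names in the required start-up order.
    placement:   {vm_name: host_name} as it was just before the failure.
    Returns a dict with the survivors to stop and the sequence to start.
    """
    # VMs on the failed host are already down; only survivors need stopping.
    survivors = [vm for vm in start_order if placement[vm] != failed_host]
    return {
        # Assumption: stop in the reverse of the start-up order.
        "stop": list(reversed(survivors)),
        # Then replay the whole sequence from the beginning.
        "start": list(start_order),
    }

# Hypothetical system of four VMs spread over two hosts.
order = ["vm-a", "vm-b", "vm-c", "vm-d"]
hosts = {"vm-a": "host-1", "vm-b": "host-2", "vm-c": "host-1", "vm-d": "host-2"}
print(full_restart_plan(order, hosts, "host-1"))
```

In this example the plan stops vm-d and then vm-b, and afterwards starts all four VMs in their original order.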

StephenMoll
Expert

Thanks Chip, that is helpful.

StephenMoll
Expert

Suppose I have three VMs that must be started in a certain order for the application to work, say:

VM-1 -> VM-2 -> VM-3

If, for example, VM-1 and VM-3 are on HOST-1 and VM-2 is on HOST-2, and HOST-1 fails, I presume that the VMs lost in the failure (VM-1 and VM-3) will be restarted in the order VM-1 then VM-3. Since VM-2 was not on the failed host, it will not be affected at all, and HA cannot be used to restart VM-2 to maintain the correct start order, yes?

daphnissov
Immortal

Yep, that's right. And this is going to be the ultimate problem with your highly-dependent VMs: you cannot control which host fails and which VMs are affected (unless you know this via DRS rules). Only VMs which were running on the failed host at the time of the failure are restarted. I suppose you *could* do some scripting trickery where you watch for this in maybe something like Log Insight and then kick off an action that reboots that second VM.
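The selection step of that scripting idea could look something like the Python sketch below: given the dependency chain and the VM placement, it lists the surviving VMs that sit downstream of a VM lost with the failed host. This is only the decision logic. Detecting the failure and actually rebooting the VMs is left out, and the `depends_on` mapping is a hypothetical data structure, not something HA exposes.

```python
def vms_needing_scripted_restart(depends_on, placement, failed_host):
    """Return surviving VMs that depend, directly or indirectly, on a lost VM.

    depends_on: {vm: [VMs that must be started before it]}
    placement:  {vm: host} as it was just before the failure.
    """
    # HA restarts only the VMs that were running on the failed host.
    lost = {vm for vm, host in placement.items() if host == failed_host}

    # Propagate "affected" down the dependency chain until nothing changes.
    affected = set(lost)
    changed = True
    while changed:
        changed = False
        for vm, deps in depends_on.items():
            if vm not in affected and any(d in affected for d in deps):
                affected.add(vm)
                changed = True

    # Survivors that are affected are the ones HA will not touch.
    return sorted(affected - lost)

depends_on = {"VM-1": [], "VM-2": ["VM-1"], "VM-3": ["VM-2"]}
placement = {"VM-1": "HOST-1", "VM-2": "HOST-2", "VM-3": "HOST-1"}
print(vms_needing_scripted_restart(depends_on, placement, "HOST-1"))
```

For the VM-1 -> VM-2 -> VM-3 example with HOST-1 failing, this returns only VM-2, which is the VM that HA leaves running and a script would have to restart.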

StephenMoll
Expert

We are already developing a management system application which automates a lot of what we need. It seems to be becoming a dumping ground for functionality that covers areas where vSphere doesn't quite do what we need to meet our requirements. This is one example, alongside functionality to rein in some of the freedoms enjoyed by DRS and HA.

I'm guessing that my query would have the same answer if VM-1, VM-2 and VM-3 were in a vApp?
