VMware Cloud Community
JamesAspall
Enthusiast

Autostart of VMs after total cluster outage

Hi,

 

We have a three node cluster managed by a single vCenter appliance running within that cluster.

I am trying to account for a situation where we have a total power loss causing all three nodes to go off at the same time. In this situation, the hosts themselves will restore their last known power state without issue, but I need to find a way to autostart VMs.

 

VMware support advised that this can only be done on a host-by-host basis, and that it requires vSphere HA and DRS to be disabled on the cluster. I cannot believe this is the case for enterprise products like ESXi and vCenter, or that we are forced to choose between manual VM management with autostart and automatic management with no autostart!

The only thing I can currently think to do is create affinity rules to keep VMs on specific hosts, and then set that host to auto start those VMs, but then if the VM migrates to another host for maintenance and then back again, we have to reconfigure the auto start rules.

Can anyone suggest a better way to achieve an auto recovery setup in this scenario?

 

We are moving away from Hyper-V in favour of ESXi, but even Hyper-V using MFCS can autostart all VMs regardless of their host! I would have thought vCenter could have central autostart management, and just configure autostart policies for me as it vMotions VMs between hosts!

 

Many thanks

James


Accepted Solutions
depping
Leadership

Actually, when the hosts come back, HA should restart the VMs in that case. The VMs in that scenario were last registered as "powered on", and as a result, when a host returns for duty, it should be able to power VMs on based on that info.

25 Replies
sjesse
Leadership

Your idea for autostarting them seems OK, but why are you so concerned about an auto-restart? If you plan your environment correctly, nothing short of a complete power outage at the location should bring down a 3-node cluster. If that does happen and you have external storage, auto-restarting the VMs isn't going to help on its own, since you need to make sure the storage and any other required systems are turned on as well. As long as you have one host up, VMware HA should restart anything from a failed host, so one idea is to pin that to a host and set it to autostart. Most VMware environments are designed to never go down, and most DR plans are set up to recover the VMs at a secondary location, preferably until the issues are fixed.

JamesAspall
Enthusiast

Hi,

 

We are not a large company, and historically have had limited investment in IT infrastructure. I have only been here since April, and we have already had 3 or 4 major power outages that have lasted longer than our UPS runtimes cover. These have occurred for a variety of different reasons, and often at times I am unavailable. This is why I am so concerned by total outages and auto restarts!

The SAN and storage switch will power themselves on automatically similarly to the hosts, and will be quicker to restore than the hosts themselves. Therefore the storage infrastructure should be available before ESXi needs to access it.

 

We don't currently have any additional locations or off site recovery alternatives, again due to our current size and budget. We want to investigate cloud infrastructure at some point in the future, but we aren't there yet.

 

So if vCenter is set to be on host 1 for example as an affinity rule and is set to auto start, will this then auto start any other VMs that were powered on at the time of the outage?

If that's the case, it isn't guaranteed to do the job 100% of the time, but should be good for 99% of situations.

 

Thanks

James

sjesse
Leadership

I think your best bet is probably PowerCLI. You could even be clever and have an automation VM start after the vCenter appliance does, with both set to autostart on the same host. That way, on the automation VM you can run a PowerShell script on startup that gets all the VMs from vCenter and starts them, or you can be more granular and script them to start in the order you need. I do something similar with my lab so I can shut it down and start it as needed.
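
To illustrate just the ordering part of that idea, here is a minimal sketch in Python rather than PowerCLI. It assumes you already have a list of VM names and a priority for each one. The `power_on` callback and the VM names in the example are hypothetical placeholders, not part of any VMware API; in a real script that hook would wrap whatever power-on call you use.

```python
from typing import Callable, Dict, Iterable, List


def order_for_startup(
    vm_names: Iterable[str],
    priority: Dict[str, int],
    default_priority: int = 99,
) -> List[str]:
    """Return VM names sorted by priority (lowest first), then by name.

    VMs without an explicit priority fall back to default_priority,
    so they start last as long as explicit priorities stay below it.
    """
    return sorted(
        vm_names,
        key=lambda name: (priority.get(name, default_priority), name),
    )


def start_in_order(
    vm_names: Iterable[str],
    priority: Dict[str, int],
    power_on: Callable[[str], None],
) -> List[str]:
    """Call power_on for each VM in startup order and return that order."""
    ordered = order_for_startup(vm_names, priority)
    for name in ordered:
        power_on(name)  # hypothetical hook; replace with the real power-on call
    return ordered
```

With made-up names, `order_for_startup(["file01", "vdi01", "dc01"], {"dc01": 1, "file01": 2})` returns `["dc01", "file01", "vdi01"]`, so the domain controller comes up first and anything unlisted comes up last.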

JamesAspall
Enthusiast

So does that mean that vCenter and HA alone will not auto start VMs that were running at the time of the outage?

sjesse
Leadership

Not in a complete outage, though; you would need one host running for HA to kick in.

JamesAspall
Enthusiast

Frustrating.

I'll have to find a way to sort this via PowerCLI as you suggested then.

depping
Leadership

So there are two things you need to remember:

1) vSphere HA will restart VMs when a host, or multiple hosts, have failed

2) vSphere HA will restart VMs only when they were not powered off cleanly

What does this mean? Well, if you power off the VMs and the hosts because your UPS is running out of juice, this is considered a CLEAN power-off, which means that you as the admin probably had a reason for powering them off. As a result, HA will not power them on.
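
Those two rules can be written down as a tiny decision function. This is only a toy model of the rules as stated in this post, not how HA is actually implemented; the `VmState` type and its field names are made up for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmState:
    """Hypothetical snapshot of a VM at the moment its host went away."""
    name: str
    host_failed: bool          # rule 1: the host running the VM failed
    cleanly_powered_off: bool  # rule 2: an admin shut the VM down on purpose


def ha_should_restart(vm: VmState) -> bool:
    """Apply both rules: restart only after a host failure,
    and only if the VM was not cleanly powered off."""
    return vm.host_failed and not vm.cleanly_powered_off
```

So a VM that was running when the power was cut gives `ha_should_restart(VmState("app01", True, False)) == True`, while one that was shut down cleanly before the UPS ran out gives `False`.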

JamesAspall
Enthusiast

Hi,

 

Yes I understand that.

 

I'm specifically trying to account for the scenario where all hosts fail because of an outage that our UPSs don't have sufficient runtime to cover (currently about 10 to 20 minutes).

If this happens in the middle of the night and all three hosts fail because no one is available to log on and safely shutdown the cluster, HA is irrelevant if it won't power on all VMs in this scenario.

 

Thanks

depping
Leadership

Actually, when the hosts come back, HA should restart the VMs in that case. The VMs in that scenario were last registered as "powered on", and as a result, when a host returns for duty, it should be able to power VMs on based on that info.

JamesAspall
Enthusiast

Ah good, this is what I was expecting to be honest. I did find it hard to believe ESXi/vCenter couldn't really do anything to accommodate this scenario.

I have reopened my support ticket to clarify the nature of the cluster shutdown/failure, and will get them to confirm the behaviour also.

 

Thanks.

depping
Leadership

Sure, feel free. Just a note: I work for VMware and wrote a book on HA and DRS, which you can download for free here: https://www.rubrik.com/resources/white-papers/19/clustering-deep-dive-ebook

JamesAspall
Enthusiast

Morning,

Support say this is not the case?

"Unfortunately Autostart will not work if the host was part of a HA cluster. If you enabled auto start on the host after it has powered on you will still need to restart the host again for the vCenter VM to power on. HA will not kick in as the other VM's are already powered off. The only way for the vCenter VM to be rebooted automatically along with the rest of the VM's would be to use VCHA which uses Active, Passive and Witness nodes. However you would need to have more then the 3 hosts within the cluster to ensure that VCHA would provide protection against 3 hosts going down. 

More information can be found in the following articles:

* FAQ: vCenter High Availability (2148003)
https://kb.vmware.com/s/article/2148003

* vCenter High Availability
https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.avail.doc/GUID-4A626993-A829-495C-9...
"

So it is not possible for auto recovery after a total cluster outage?

 

Cheers

James

sjesse
Leadership

So that specific support person could be wrong. I've checked with other people, and they agreed that the VMs should restart, because HA works at the host level, not in vCenter, so any VM registered as powered on before the outage should get turned back on. The only way you can be really sure is to test this; if you have a lab, that would be the best place. I may test this myself when I have some time, but I'm not ready to risk breaking my lab, as I'd have to cut power to everything all at once.

JamesAspall
Enthusiast

Yes this was my presumption of how it should work also, as it would seem daft for it to not work this way.

 

I will feed this back to them and have them consult other technicians to get some other opinions.

Appreciate your input on this, and absolutely don't risk your own lab on my account.


Thanks again, and watch this space for further feedback 🙂

 

Cheers

James

DJKrafty
Enthusiast

What would you use for the "automation VM"? A Windows VM with PowerCLI and scripts set up to run at predetermined times (at boot?)?

Speaking to the granular point, ideally you'd write a script that powers on the specific tiers of servers as needed, right? I.e. Tier 1 = DCs, email, etc.; Tier 2 = file shares; Tier 3 = VDI; and so on.

But in this case, you'd absolutely have to make sure storage is online and healthy after a dirty power off. Otherwise you're potentially compounding your issues, right?
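
One way to guard against that is to have the startup script poll a storage health check and only carry on once it passes. Below is a minimal Python sketch of that gating step. The `is_healthy` callback is a hypothetical placeholder for whatever check your storage actually supports, and the attempt count and delay are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_until_healthy(
    is_healthy: Callable[[], bool],
    attempts: int = 30,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_healthy up to `attempts` times, pausing between tries.

    Returns True as soon as the check passes, or False if it never
    passes. The sleep function is injectable so the loop can be
    exercised without real delays.
    """
    for attempt in range(attempts):
        if is_healthy():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # no pause after the final attempt
    return False
```

The script would then start the tiers only when `wait_until_healthy(...)` returns `True`, and stop and alert someone otherwise instead of powering VMs on against storage that isn't ready.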

JamesAspall
Enthusiast

Hi,

 

I've just had a call with the support person on the ticket I raised about this, and they still say that this will not function how we want in our specific scenario.

They've said that vCenter is what controls HA and it is not done at a host level, so if the entire cluster goes down, vCenter itself will not be running upon power recovery to restart VMs via HA. They also said that when HA is enabled, it disables autostart on the hosts, so we can't even use that as a workaround.

Are you sure you'd expect the VMs to auto recover after a total outage of the cluster? If not, might you have any other thoughts on how we can account for this? The only thing I can think of is some kind of cron job set to start the vCenter VM upon host startup if it's not already running?

Cheers

James

DJKrafty
Enthusiast

What kind of hardware are you using? Do you by chance have power recovery options or a management API that accepts calls? You could have a startup procedure that is scripted out to start all the hosts, then boot the vCenter appliance, that would then boot all the VMs if you have HA configured properly.

JamesAspall
Enthusiast

We only have:

2 APC UPSs (which we were planning to combine with PowerChute, but probably not now)

An HPE MSA SAN

3 HPE DL380 servers (G9 and G10)

 

That's pretty much the extent of the hardware available to us, so I would imagine I would have to set something up within ESXi on one specific host to get vCenter up and running at a minimum.

Any clue on how I'd go about creating a script that runs at start of ESXi on a host?

Thanks

James

depping
Leadership


@JamesAspall wrote:

Hi,

 

I've just had a call with the support person on the ticket I raised about this, and they still say that this will not function how we want in our specific scenario.

They've said that vCenter is what controls HA and it is not done at a host level, so if the entire cluster goes down, vCenter itself will not be running upon power recovery to restart VMs via HA. They also said that when HA is enabled, it disables autostart on the hosts, so we can't even use that as a workaround.

Are you sure you'd expect the VMs to auto recover after a total outage of the cluster? If not, might you have any other thoughts on how we can account for this? The only thing I can think of is some kind of cron job set to start the vCenter VM upon host startup if it's not already running?

Cheers

James


That is absolute nonsense. I wrote a whole book on this topic. The book was reviewed and approved by VMware engineering. (https://www.rubrik.com/resources/white-papers/19/clustering-deep-dive-ebook)

HA has NEVER been dependent on vCenter to be available, not even in the very first release of HA. What is the SR? I am more than happy to email the person who provided this info to you.
