VMware Cloud Community
hstorn
Contributor

Cluster HA and Virtual Machine Startup/Shutdown

Hello,

The documentation says:

"NOTE The Virtual Machine Startup and Shutdown (automatic startup) feature is disabled for all virtual machines residing on hosts that are in (or moved into) a VMware HA cluster. VMware recommends that you do not manually re-enable this setting for any of the virtual machines. Doing so could interfere with the actions of cluster features such as VMware HA or Fault Tolerance."

When all ESX servers are in an HA cluster, how can I set a startup order for the virtual machines? Is it impossible?

Thanks,

Regards.

29 Replies
ileidi
Contributor

Hi mini,

I'm interested in your post. Have you tested your script?

I have 6 virtual machines that I cannot restart on another ESX cluster node (the cluster has 2 nodes). So in VMware HA, for these virtual machines, I disabled VM Restart Priority and set Host Isolation Response to Power Off. When I simulate a power failure, these six virtual machines are not restarted and I have to start them manually from VirtualCenter.

Is there no way to do this automatically? Can I enable automatic startup/shutdown for these 6 virtual machines only?

Andrea

FredericNass
Contributor

Hi Andrea,

Sorry, I've been short of time these days. What you need to know is that if you turn off your VMs yourself (with a clean shutdown from a script), you need to power them back on when the host returns. Why? Because HA has no reason to think it should turn them back on at host restart.

You need to record in a file which VMs you turn off before you turn off the host. Then, when the host restarts, you use this file to restart all the VMs you turned off.
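A minimal sketch of this record-then-restart idea, written in Python rather than as an ESX shell script. The `shut_down` and `power_on` callables and the state file path are hypothetical placeholders for whatever commands your environment uses; nothing here calls a real VMware API.

```python
import json
from pathlib import Path


def shut_down_and_record(running_vms, shut_down, state_file):
    """Record the VMs we are about to stop, then stop them."""
    names = list(running_vms)
    # Write the list first so it exists even if power is lost mid-shutdown.
    Path(state_file).write_text(json.dumps(names))
    for name in names:
        shut_down(name)  # hypothetical callable supplied by the caller
    return names


def restart_recorded(power_on, state_file):
    """Power on every VM listed in the state file, then remove the file."""
    path = Path(state_file)
    if not path.exists():
        return []  # nothing was shut down by the script
    names = json.loads(path.read_text())
    for name in names:
        power_on(name)  # hypothetical callable supplied by the caller
    path.unlink()  # avoid restarting the same VMs again on the next boot
    return names
```

The state file has to live somewhere that survives a host reboot, otherwise there is nothing to read when the host comes back.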

Mini.

(In reply to ileidi's message of 30 June 2011, 15:51.)

FredericNass
Contributor

Hi again,

Answering your questions...

Yes, the script does work, except that you can expect some extra delay before the VMs come back up (I don't know why yet). Also, the last time it happened, 2 VMs stayed down when they should have been restarted. (I guess the UPS did not have enough battery left to allow a clean shutdown of these VMs.)

There are 2 possibilities when a power event occurs:

- You shut down the hosts with a shutdown command, ignoring the state of any VMs. VMware will take care of shutting down every VM, even if it does so "the bad way" (forced shutdown). Then, when the host restarts (and vCenter too, I presume), HA should take care of restoring the previous state of each VM. That is, it should start all VMs that were up when you shut down the host. This is the "VMware way" to handle power failures, from what I know.

- You shut down all the VMs yourself (recording which ones you'll have to restart later), then shut down the host, then the UPS. When power returns, you restart each VM that you previously shut down.

Whichever way you choose, you could write a script (launched at host startup from /etc/rc.local) that starts those 6 VMs (regardless of whether or not they are registered on that host). This way you'd be sure those 6 VMs are always started when your hosts return.

But you'll have to remember that.

Hope that helps.

Mini.

jr53
Contributor

What is the final conclusion on the capability to sequence the start/restart of VMs in an HA environment – this can only be done by writing scripts?  I have two VMs that need to boot in a specific order, with a delay between the start of the first and the second.

I tried the following experiment with two VMs both running on the same ESX host in an HA cluster:

Set the VM restart priority of the first VM to high and the second VM to low.

Shut down the ESX host with the two VMs.

The VMs restart on another ESX host but both VMs restart at about the same time (the same second in the vCenter events page).

The first VM is an MSSQL server and if the database is not available by the time the second VM boots, services on the second VM do not start up correctly.
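For what it's worth, the dependency described here (start the second VM only once the database on the first one answers) could be scripted along these lines. This is only a sketch in Python: `power_on` is a hypothetical callable standing in for whatever actually starts a VM, and an open TCP port is used as a rough readiness signal, which does not guarantee that the database has finished recovery.

```python
import socket
import time


def wait_for_port(host, port, timeout_s, interval_s=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            pass  # not reachable yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def start_in_order(power_on, first_vm, second_vm, db_host, db_port, timeout_s=600):
    """Start first_vm, wait for its database port, then start second_vm."""
    power_on(first_vm)
    if not wait_for_port(db_host, db_port, timeout_s):
        return False  # database never came up; second_vm is left off
    power_on(second_vm)
    return True
```

With SQL Server on its default port this would be called as something like `start_in_order(power_on, "sql-vm", "app-vm", "sql-vm.example.local", 1433)`, where the VM and host names are made up for illustration.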

ChristophHerdeg
Contributor

Sorry, Gleed, I also read your blog article, but I have to disagree with your opinion in the strongest terms. Imagine the following situation: we experience a complete power blackout. All the APCs signal their connected hosts to shut down gracefully. Problem A) it's only possible to shut down the VMs gracefully by script. When the blackout is over, the hosts start again. But because they were taken down gracefully, HA will do anything but restart the VMs.

Feature request (all Editions):

- Automatic VM Start/Stop has to become integrated with HA/DRS

- It must be possible to set an order in which VMs have to be restarted by HA/DRS if there was a) a host failure or b) a graceful shutdown

- Every VM shall be restarted on the host it last ran on, if that host still exists

- It must be possible to automatically stop VMs in case of a host's graceful shutdown

[...]

depping
Leadership

vSphere doesn't know why the VMs were shut down to begin with, as a 3rd party tool initiated the shutdown. There could be a good reason why you shut down everything. I think it would make sense for APC to include this in their tool: once the power problem is solved, allow you to "switch" everything back on again.

I can submit the feature request for you, but I'm guessing it will be difficult to integrate.

ChristophHerdeg
Contributor

...and vSphere doesn't need to know, imho. If an authenticated tool initiates a shutdown, a good reason should be assumed by default. That's what the authentication chain is good for.

Personally I'd like the APCs to be as dumb as possible (provide power for 15 minutes when a blackout happens and signal the connected machines accordingly) and VMware to handle all the parts requiring some more intelligence, e.g. automatic and graceful VM starts / stops in HA/DRS situations. Really: vSphere is expensive enough that, as a customer, I should be allowed to expect a clean solution for this obvious issue.

Yes, it would be great if you submitted a feature request, thank you! Every corporation I know that is using vSphere is looking for this, because they WANT to do it the VMware way instead of writing potentially harmful scripts.

FredericNass
Contributor

Hi,

I couldn't agree more with what Christoph said: vSphere should handle this. Doing it yourself requires sharp engineering, advanced and specific scripting skills, and a great deal of testing. vSphere should provide an answer to this major problem.

From our experience, having to script the graceful shutdown of all VMs and hosts requires permanently disabling HA on every cluster, because we don't want HA to interfere with the restarting of the VMs (which might not have managed to shut down gracefully), registering the same VMs twice or more on different hosts...

This is how we do it here. It works quite well but it's a DIY solution and doesn't sound professional:

ups_event.sh: This script is called by Eaton IPP (with teamed DELL UPS) installed on vMA 5.0 when graceful shutdown needs to be done.

shutdown_esx.sh: This script is sent from vMA 5.0 to each host. It's then launched independently on each host. It gracefully shuts down each running VM and sets each of these VMs to start automatically on the next host start.

host_return.sh: This script is launched every 3 minutes from /etc/cron.d on the vMA 5.0 VM. It checks whether a log file (/scratch/log/emergency_shutdown.log) exists on each host. If such a file exists, the script sends the host log report to sysadmin.local@yourdomain.com to notify you that this host was previously shut down by the shutdown_esx.sh script.

sendEmail: This Perl script is used to send emails from vMA 5.0
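To give an idea of the host_return.sh check, here is a rough Python sketch, not the actual script. It assumes the host's log has already been fetched to a local path (how it is fetched is left out), and it only builds the notification; `build_shutdown_report` and its parameters are names made up for this illustration.

```python
from email.message import EmailMessage
from pathlib import Path


def build_shutdown_report(host, log_path, recipient):
    """Return an email for `host` if its emergency-shutdown log exists, else None."""
    path = Path(log_path)
    if not path.is_file():
        return None  # host was not shut down by the emergency script
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"{host}: emergency shutdown log found"
    # The log itself is the body of the report.
    msg.set_content(path.read_text(errors="replace"))
    return msg
```

Sending the message, and removing or archiving the log afterwards so the same report is not mailed every 3 minutes, would be separate steps.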

Some of these scripts, or parts of them, were downloaded from the Internet and adapted to work in our vSphere 5 environment.

Please VMware, give us a durable solution to this so we don't have to reinvent the wheel each time a new vSphere version comes out.

Mini.

BTW : I posted here with the wrong account. I'm the "Minimouse57" posting up there.

ChristophHerdeg
Contributor

ManInTheMiddle,

Thanks a lot for sharing your scripts - I'll definitely give them a close look and maybe I can use them at some of our customers' sites.

Regards,

Chris

FredericNass
Contributor

You're very welcome. I hope that will help.

BTW, timing in case of a power failure is as follows:

After power outage:


- 45 minutes after the power outage (running on batteries), Eaton IPP, running inside the vMA 5.0 VM, triggers the graceful shutdown script ups_event.sh.
- The ups_event.sh script sends the shutdown_esx.sh script to each host via scp and executes it.
- 10 minutes after that, each host is down. (We're running a bit more than 10 VMs per host.)
- 5 minutes later, the UPS cuts the battery power and shuts down, waiting for the power to return.
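Adding up the steps above gives the battery runtime this plan relies on. A small Python sketch of that arithmetic, using only the offsets from the list (the function names are made up for the example):

```python
from datetime import datetime, timedelta

# Offsets taken from the timeline above, in minutes.
TRIGGER_AFTER_MIN = 45     # Eaton IPP triggers ups_event.sh after the outage starts
HOSTS_DOWN_AFTER_MIN = 10  # hosts are down this long after the trigger
UPS_OFF_AFTER_MIN = 5      # UPS cuts power this long after the hosts are down


def outage_schedule(outage_start):
    """Return the wall-clock time of each step for a given outage start."""
    trigger = outage_start + timedelta(minutes=TRIGGER_AFTER_MIN)
    hosts_down = trigger + timedelta(minutes=HOSTS_DOWN_AFTER_MIN)
    ups_off = hosts_down + timedelta(minutes=UPS_OFF_AFTER_MIN)
    return {"trigger": trigger, "hosts_down": hosts_down, "ups_off": ups_off}


def required_battery_minutes():
    """Minimum battery runtime the UPS must sustain for this plan."""
    return TRIGGER_AFTER_MIN + HOSTS_DOWN_AFTER_MIN + UPS_OFF_AFTER_MIN
```

So the UPS has to hold the full load for 45 + 10 + 5 = 60 minutes. If the batteries cannot do that, the trigger delay is the number to reduce.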

After power return:

- When power returns to the hosts, each host waits 3 minutes 30 seconds to let our Cisco switches start up and come back to life, then starts.
- 7 minutes after power returns, each node is back online, starting the VMs that were running at the time of the power outage.

It works fine, and our vSphere infrastructure is then able to face a power outage on its own without requiring any kind of admin intervention. That's serenity. Unfortunately, we can still run into this particular scenario: the shutdown order is given to the hosts, but power returns before the UPS has had time to shut down, so the UPS never cuts power to the hosts and the hosts never restart.

I think VMware should address this particular scenario too. It's probably a much more anticipated feature than FT, which undoubtedly needed a lot more work.

Mini.

PS: We previously handled this particular scenario by using the apcupsd daemon (shutting down the UPS via serial cable), installed via rpm on our ESX 4.0 hosts. But that no longer works with ESXi 5.0 hosts...
