Hello, I was wondering if anyone has had to shut down an entire vSphere deployment in an automated way? I have one vCenter instance managing a 2-node ESXi cluster with DRS and HA enabled, so it is designed to run 24/7, survive a host failure, and balance resources. My issue is that I only have about 40 minutes of battery power in the event of a prolonged power outage. I would like to script the automatic shutdown of all VMs and both hosts. I've seen plenty of posts about scripting maintenance mode and evacuating VMs, but this is a little different: I need to shut down all running VMs gracefully, then shut down each host. Preferably this would not use PowerCLI, since vCenter won't be available either, and PowerCLI also requires a running system to host the session. In this case, there won't be one.
Our UPS/storage deployment actually has a mechanism for reaching out to devices via SSH and running commands in the event of a low battery situation. I was thinking there might be a special combination of commands that I could run directly from a host SSH session?
For example here is the command I have used from SSH to put a host into maintenance mode:
esxcli system maintenanceMode set -e true -t 0
Here is a shut down command:
esxcli system shutdown poweroff -d 10 -r "Shell initiated system shutdown"
I know you can use the "vim-cmd vmsvc/power.on <vmid>" command to power on VMs. But you need to know the ID.
Seems like I have a few of the pieces, but I am not quite there yet.
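On the ID problem: the host shell can list registered VMs with "vim-cmd vmsvc/getallvms", and the first column of that output is the Vmid. Below is a minimal sketch of pulling the IDs out, assuming the usual table layout (a header row, then one row per VM starting with a numeric ID). The function name list_vm_ids and the VIM_CMD override are my own, added only so the parsing can be tried away from a host. A multi-line annotation that happens to start with a number would be misread as an ID.

```shell
#!/bin/sh
# VIM_CMD is a hypothetical override so the function can be exercised
# off-host; on ESXi it defaults to the real vim-cmd binary.
VIM_CMD="${VIM_CMD:-vim-cmd}"

# Print one Vmid per line for every VM registered on this host.
list_vm_ids() {
    # Skip the header and any spilled annotation text: keep only rows
    # whose first column is purely numeric.
    "$VIM_CMD" vmsvc/getallvms | awk '$1 ~ /^[0-9]+$/ { print $1 }'
}
```

Running list_vm_ids in an SSH session on the host should print the IDs that the vmsvc/power.* commands expect.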
Here's what you need to do to gracefully shut down all VMs from ESXi SSH:
1. Make sure all your VMs have VMware Tools installed
2. Go to host configuration, select "Virtual Machine Startup and Shutdown"
3. Tick the option "Allow virtual machines to start and stop automatically with the system"
4. Select "Guest Shutdown" under "Shutdown Action"
5. Save the configuration and exit
6. From now on, running the command below will shut down the guest OS on every VM running on that particular host and then shut down the host itself
/sbin/shutdown.sh && /sbin/poweroff
I hope the above helps.
On a side note, I do agree with the previous poster - can you explain why you can't use PowerCLI in this scenario?
Hello, yes vCenter in my case is running as a VM within the same cluster I would like to shut down. So I guess it would be available for a "while", but this system will get shut down at some point, even if it is the last. There would need to be another VM running in order to actually run the PowerCLI shell too. So to me, it seemed like I would need to use a different mechanism.
pwilk, thanks for your reply. I am familiar with the "Virtual Machine Startup and Shutdown" interface. It seems to work really well with a single ESXi host, but doesn't the behavior change some when the hosts are in a cluster with HA and DRS enabled?
I was under the impression that this feature was actually disabled when HA is turned on? HA at that point handles the process of vMotion tasks to an alternate host and powers the VM on for you.
In the case of DRS, I am not sure that the power on and shut down settings would follow the VM when DRS moves VMs back and forth while trying to keep resource utilization even across hosts?
Finally, when running this command against a DRS/HA enabled cluster host, what would be the expected behavior? Would it not evacuate all VMs first?
/sbin/shutdown.sh && /sbin/poweroff
If that's the case, I would try to go through PowerCLI. My order of operations would be something like:
ESXi hosts really don't need to be shut down gracefully in such an emergency situation because ESXi runs entirely from memory and, at that point with no VMs, even if using local datastores you wouldn't have anything writing to those datastores. I have used this process in the past to automate exactly what you're doing and had good success with it, even in my home lab.
Thanks daphnissov, this all sounds easy enough to script, but it gets a little tricky. Currently the UPS system is tied into our storage array, and it is the storage array that reaches out to each ESXi host in the cluster via SSH and runs "commands". I can't see how, from that "angle", I am going to kick off a PowerCLI script. Generally that's done with a scheduled task within a Windows VM.
I chose to connect the UPS to our storage array because we wanted to make sure the array shut down gracefully too. I guess we could approach it from the other direction and connect the UPS to one of the hosts (USB or serial, not network enabled), then initiate the shutdown of the array via SSH. However, I have never had much luck getting a serial/USB UPS to work directly with an ESXi host. Thoughts?
I know that you can power VMs on and off from the ESXi CLI or an SSH session. Can you initiate a graceful guest shutdown directly from the console too? That environment isn't nearly as easy for me to script in, so the process of getting a list of VM IDs and running commands against them would be tough.
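For what it's worth, the host shell does expose a guest shutdown: "vim-cmd vmsvc/power.shutdown <vmid>" asks VMware Tools inside the guest to shut down cleanly, and "vim-cmd vmsvc/power.getstate <vmid>" reports whether a VM is powered on. A rough sketch of looping over every VM on a host follows. shutdown_all_guests and VIM_CMD are names I made up for illustration, the ID parsing assumes the normal getallvms table layout, and the request only works on VMs where Tools is running.

```shell
#!/bin/sh
# VIM_CMD is a hypothetical override for trying this off-host;
# on ESXi it defaults to the real vim-cmd binary.
VIM_CMD="${VIM_CMD:-vim-cmd}"

# Request a guest OS shutdown for every powered-on VM on this host.
shutdown_all_guests() {
    # Rows of getallvms that start with a numeric column are VM entries.
    for id in $("$VIM_CMD" vmsvc/getallvms | awk '$1 ~ /^[0-9]+$/ { print $1 }'); do
        if "$VIM_CMD" vmsvc/power.getstate "$id" | grep -q "Powered on"; then
            # power.shutdown goes through VMware Tools; it fails if Tools
            # is not running in the guest, so report that and carry on.
            "$VIM_CMD" vmsvc/power.shutdown "$id" ||
                echo "guest shutdown request failed for VM $id" >&2
        fi
    done
}

# On the host: shutdown_all_guests
```

Note that this only sends the requests. It does not wait for the guests to finish shutting down, so a real script would still need to poll before powering off the host.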
Yeah, I probably would approach that from the opposite direction. I'd connect the UPS via USB to a Windows "utility" or watchdog VM running on that cluster and use that to orchestrate all your shutdown activities. It also depends what hosts are consuming storage from that array. If it's only the VMs on that cluster, as long as the VMs are brought down, there should not be any data corruption concerns because there's no I/O issued to the backend. Even still, though, you'd probably want to bring the array down gracefully but as the last step. I don't know what type of array this is, but if it accepts SSH commands in privileged mode, this wouldn't be hard to issue from that Windows VM, and all of it could be done via PowerShell as well.
A bit of a thread resurrection, but...
I needed to do something similar recently. We were in a position where just leaving the hosts to die when the batteries went was not an option, because we were to shed supply load in a controlled manner. So we have a Windows based system management VM which does as much as it can, which is to shut down all VMs and hosts it can right up to VCSA, until it and its host (actually there are two of them, in a Windows Failover Clustered pair) are the last men standing.
At this point, a script on the System Management VM SCPs a shell script to its own host, and then SSHs into the host and sets this script running as a background task and then proceeds to shut itself down.
The ESXi shell script monitors the number of running VMs in a loop which counts down for 5 minutes. If the number of VMs drops to 0 within the 5 minutes the loop exits immediately, otherwise it runs for the full 5 minutes. At the end of the loop, hopefully the Management VM has shut itself down. In any event, the script instructs the host to power off, which I believe will attempt to shut down VMs but will ultimately kill them and then power down the host.
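The loop described above might look roughly like the following. This is a sketch and not the actual script from that setup: wait_for_vms, running_vm_count, and the ESXCLI, TIMEOUT, and INTERVAL variables are hypothetical names, and it assumes that every running VM shows up with one "World ID:" line in the output of "esxcli vm process list".

```shell
#!/bin/sh
# Hypothetical knobs: ESXCLI is overridable for off-host testing,
# TIMEOUT is the 5-minute countdown, INTERVAL the polling period.
ESXCLI="${ESXCLI:-esxcli}"
TIMEOUT="${TIMEOUT:-300}"
INTERVAL="${INTERVAL:-10}"

# Count running VMs: one "World ID:" line per VM process (assumed layout).
running_vm_count() {
    "$ESXCLI" vm process list | grep -c "World ID"
}

# Return 0 as soon as no VMs are running, 1 if TIMEOUT seconds elapse first.
wait_for_vms() {
    remaining=$TIMEOUT
    while [ "$remaining" -gt 0 ]; do
        [ "$(running_vm_count)" -eq 0 ] && return 0
        sleep "$INTERVAL"
        remaining=$((remaining - INTERVAL))
    done
    return 1
}

# On the host, power off whether or not the wait succeeded:
# wait_for_vms; /sbin/poweroff
```

With the last line uncommented, the host powers off either when the VM count reaches 0 or when the countdown expires, whichever comes first.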
Again all this was necessary because the hosts are clustered.