Hello,
I have a vSphere 5.5 HA cluster with three hosts and lots of VMs (Windows, Linux). The cluster is powered by an APC UPS with a Network Management Card, so the card can be used with APC PowerChute Network Shutdown (PCNS) to properly shut down the hosts and VMs in case of certain UPS events (power cut, low battery...). Reading the documentation, I've seen that there are two ways to achieve this:
1. Download and install vMA, install PCNS 3.1 on it, and configure automatic VM shutdown together with host shutdown when an undesirable event occurs on the UPS. I'm hesitant about this solution because I have read somewhere that automatic VM shutdown is not recommended and/or is disabled in an HA cluster! Furthermore, in this case the VMs are shut down automatically by VMware Tools, but in the past I have faced lots of BSODs and kernel panics when these VMs were reset by VMware Tools while the VM Monitoring feature was enabled in the cluster.
2. Download and install vMA, install PCNS 3.1 on it, and configure only the host shutdown. Then install and configure PCNS in each VM, so that in case of an undesirable UPS event the VMs are shut down by the PCNS agent inside them and the hosts are shut down by PCNS on the vMA (of course with a delay to allow the VMs to shut down properly first).
Has anyone implemented this solution?
What would you suggest?
Hi assanemd, have you resolved this problem?
I'm in the same situation...
3 ESXi 5.5 hosts and 1 APC UPS, vCenter Server 5.5 and PCNS 3.1 inside the cluster. 6 VMs up and running in the cluster.
Any idea?
Thanks a lot
Bye
You can download and install the vMA and then install PCNS 3.1 on it or use the PCNS 3.1 Virtual Appliance.
Assuming you have a single UPS powering the 3 ESXi hosts, choose the "Managed by vCenter" option in the Setup Wizard and the "Single UPS Configuration" option.
On the Virtual Machine settings page in the Setup Wizard you can enable the VM shutdown/startup options and configure a delay. Since your hosts are part of an HA cluster, you do not use the Automatic VM shutdown/startup setting in the vSphere Client to shut down the VMs with the host. PCNS will shut down the VMs on each host prior to shutting down the hosts themselves. VMware Tools must be installed on each VM so that PCNS can issue a graceful guest OS shutdown command; otherwise the VMs are powered off.
Hello dgrehan,
Thank you for your answer.
I don't want to use VMware Tools to shut down the VMs, because these are P2V'd VMs and in the past I faced lots of BSODs and kernel panics when these VMs were reset by VMware HA using VMware Tools.
Hi,
If you do not wish to use VMware Tools to perform the OS shutdown command on the VMs, then your only options are to install PCNS directly on each VM to be shut down, or to use a shutdown script triggered by PCNS that performs a remote OS shutdown command on each of the VMs - that would require storing OS login credentials for the VMs to do the shut down.
"...that would require storing OS login credentials for the VMs to do the shut down..."
Not necessarily. For this I created one more common account on all "slave" VMs, added "sudo /sbin/shutdown -h now" at the end of its .bash_profile file, and of course allowed it to run shutdown (via visudo). The "master" VM then only connects to all slave VMs (using a small bash script, an SSH client and key files), and right after login the shutdown starts automatically. No root credentials and no VMware Tools are required on the slave VMs...
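To make the idea above concrete, here is a rough sketch of what the "master" side could look like. It is written in Python rather than bash purely for illustration, and it is not the poster's actual script. The account name, key file path and VM names are hypothetical placeholders. It assumes the OpenSSH client is installed and that each slave VM runs the shutdown from the account's .bash_profile as described, so the master only has to log in. The `-tt` option forces a terminal, which is an assumption made here in case sudo on the slave insists on one.

```python
import subprocess

# Hypothetical values: replace with your own account, key file and VM names.
SHUTDOWN_USER = "pcnsdown"
KEY_FILE = "/home/master/.ssh/shutdown_key"
SLAVE_VMS = ["vm-linux-01", "vm-linux-02"]


def build_ssh_command(host, user=SHUTDOWN_USER, key_file=KEY_FILE):
    """Return the ssh argument list used to log in to one slave VM."""
    return [
        "ssh",
        "-i", key_file,             # key-based login, no stored password
        "-o", "BatchMode=yes",      # never stop to ask for a password
        "-o", "ConnectTimeout=10",  # do not hang on a VM that is already down
        "-tt",                      # force a terminal for the login shell
        f"{user}@{host}",
    ]


def shutdown_all(hosts, runner=subprocess.run):
    """Log in to each slave VM in turn; return {host: exit code or None}."""
    results = {}
    for host in hosts:
        try:
            completed = runner(build_ssh_command(host), timeout=60)
            # The connection usually drops while the VM goes down, so a
            # non-zero exit code is not necessarily a failure.
            results[host] = completed.returncode
        except subprocess.TimeoutExpired:
            results[host] = None
    return results
```

Calling `shutdown_all(SLAVE_VMS)` from the script that PCNS triggers would walk through the list one VM at a time. Nothing here checks that a VM really powered off, so the host shutdown delay still has to be long enough.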
I was thinking more of the Windows VMs (I think there is a mix of Linux and Windows VMs in the configuration) and using the "net rpc" command to do the shutdown from the vMA.
But now that you mention it, would it be possible to install Cygwin and an SSH daemon on the Windows VMs and connect using an SSH client and key files as you suggest?
I have installed the PCNS software in all VMs. It's working very well; all the VMs were shut down.
I also use the vMA appliance to shut down the hosts (I don't use the VM shutdown feature), but I see the following behaviour.
I have 3 ESXi hosts. The vMA appliance is installed on the third one (ESXi 3), and HA is enabled in the cluster. During testing I noticed that all the VMs were shut down, as were ESXi 1 and ESXi 3 (where the appliance is hosted), but the second host (ESXi 2) was not shut down. What could explain this?
Another question: is it possible to tell the ESXi hosts to shut down only after all VMs have been shut down?
Hi,
You have 3 ESXi hosts in an HA cluster. PCNS is installed on vMA running on one of the ESXi hosts (ESXi 3) - is vCenter Server running on ESXi 2? Are you using an Active Directory user account or a local account in PCNS?
Does the vCenter Server account configured in PCNS exist as a local user on each of the 3 ESXi hosts and have administrator permissions? Please refer to FAQ FA228172.
I have a physical vCenter. I use a local account in PCNS.
The vCenter account configured in PCNS is the default vCenter SSO local administrator, administrator@vsphere.local. It should have permission to shut down ESXi hosts, because it successfully shut down the other hosts, and all the hosts have the same configuration and belong to the same HA cluster.
The default administrator@vsphere.local account would not exist as a local user on the ESXi host and does not have admin permissions on the ESXi host i.e. that account cannot be used to connect directly to the ESXi host if vCenter Server is not accessible.
If for some reason PCNS cannot connect to the physical vCenter Server during the shutdown, it will attempt to perform the host shutdown by connecting directly to the ESXi host using the vCenter account configured in PCNS. This will fail because administrator@vsphere.local cannot log in to the ESXi host directly.
Could you provide a copy of the PCNS event log and /opt/APC/PowerChute/group1/error.log?
I will collect these logs tomorrow. But is it possible that the HA configuration affects the host shutdown? We noticed that all the VMs that were on ESXi 1 were migrated to ESXi 2; maybe this is why ESXi 2 did not shut down?
Is DRS enabled and set to fully automated? If so then yes this could be the cause - PCNS issues a maintenance mode command at the start of the shutdown sequence and DRS will start moving VMs to other hosts if set to fully automated. Because the VMs are still running this prevents the Host from entering maintenance mode - the default timeout for maintenance mode is 0 i.e. it will wait indefinitely so PCNS does not shut down the host.
To avoid this you can add a key "Maintenance_Mode_Duration = 120" to the [HostSettings] section in /opt/APC/PowerChute/group1/pcnsconfig.ini (stop the service to edit the file and then restart it). This forces the maintenance mode command to time out after 120 seconds, or whatever value you need to set.
Or you could just change the automation level for DRS to Partially automated - this will prevent DRS from automatically moving the VMs to other hosts in the cluster.
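If you prefer to script the pcnsconfig.ini change rather than edit the file by hand, something like the following could work. It is only a sketch: it assumes the file is plain INI text with "[Section]" headers and "key = value" lines, and the helper name `set_ini_key` is made up for this example. It works on the text line by line so that every other line in the file is left untouched. The PCNS service still has to be stopped before the file is rewritten and restarted afterwards.

```python
def set_ini_key(text, section, key, value):
    """Return INI text with `key = value` set inside `[section]`.

    An existing key in that section is overwritten; otherwise the key is
    added at the end of the section. A missing section is appended.
    """
    out = []
    in_section = False
    done = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving the target section without having seen the key: add it.
            if in_section and not done:
                out.append(f"{key} = {value}")
                done = True
            in_section = stripped[1:-1] == section
        elif in_section and not done and stripped.split("=")[0].strip() == key:
            line = f"{key} = {value}"
            done = True
        out.append(line)
    if not done:
        if not in_section:
            out.append(f"[{section}]")
        out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"


# Minimal demonstration on a stand-in for the real file contents.
print(set_ini_key("[HostSettings]\n", "HostSettings",
                  "Maintenance_Mode_Duration", "120"))
```

To apply it for real you would read /opt/APC/PowerChute/group1/pcnsconfig.ini, pass its contents through the function and write the result back, ideally after making a backup copy.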
Hi,
There is nothing in the logs to indicate that Host 2 shutdown failed:
So you have a 15 minute delay for On Battery shutdown action - you mentioned that PCNS is also installed on each of the VMs - at what point do the VMs start shutting down? They should all be powered off before the ESXi hosts are commanded to shut down?
Could you replace the /opt/APC/PowerChute/group1/log4j.xml file with the one I've attached here and run the shut down? This will create debug output in error.log and a file called VMwareDebug.log.
Will try to re-create the issue in my setup. Just to confirm - Host 2 remains powered on and there are VMs powered on on Host2? In the tasks view for Host 2 is the Maintenance mode task still in progress?
05/21/2014 | 11:05:55 | UPS has switched to battery power. | .3.5.1.5.4.1 |
05/21/2014 | 11:20:55 | UPS critical event: <b>On Battery</b> occurred on Hosts: <b>host1, host2, host3</b>. | .3.4.9.9 |
05/21/2014 | 11:20:55 | Enter maintenance mode: <b>host1</b>. | .3.4.9.9 |
05/21/2014 | 11:20:55 | Enter maintenance mode: <b>host2</b>. | .3.4.9.9 |
05/21/2014 | 11:20:56 | Enter maintenance mode: <b>host3</b>. | .3.4.9.9 |
05/21/2014 | 11:20:56 | Exit maintenance mode: <b>host3</b>. | .3.4.9.9 |
05/21/2014 | 11:20:56 | Shutting down Host <b>host1</b>. | .3.4.9.9 |
05/21/2014 | 11:23:10 | Shutting down Host <b>host2</b>. | .3.4.9.9 |
05/21/2014 | 11:23:58 | Shutting down Host <b>host3</b>. | .3.4.9.9 |
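For anyone who wants to check the timings in an event log like the one above, here is a small sketch that parses the pipe-separated lines and works out how long after the on-battery event each host was told to shut down. The field layout (date | time | message | code) and the month/day/year date format are inferred from these excerpts only, so treat them as assumptions. The sample lines are copied from the log above.

```python
import re
from datetime import datetime

LOG = """\
05/21/2014 | 11:05:55 | UPS has switched to battery power. | .3.5.1.5.4.1 |
05/21/2014 | 11:20:55 | UPS critical event: <b>On Battery</b> occurred on Hosts: <b>host1, host2, host3</b>. | .3.4.9.9 |
05/21/2014 | 11:20:56 | Shutting down Host <b>host1</b>. | .3.4.9.9 |
05/21/2014 | 11:23:10 | Shutting down Host <b>host2</b>. | .3.4.9.9 |
05/21/2014 | 11:23:58 | Shutting down Host <b>host3</b>. | .3.4.9.9 |
"""


def parse_events(text):
    """Return a list of (timestamp, message) tuples from event log text."""
    events = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3:
            continue  # not an event line
        when = datetime.strptime(f"{fields[0]} {fields[1]}",
                                 "%m/%d/%Y %H:%M:%S")
        message = re.sub(r"</?b>", "", fields[2])  # drop the bold markup
        events.append((when, message))
    return events


def seconds_between(events, first_text, second_text):
    """Seconds from the first event containing first_text to the first
    event containing second_text."""
    first = next(when for when, msg in events if first_text in msg)
    second = next(when for when, msg in events if second_text in msg)
    return int((second - first).total_seconds())


events = parse_events(LOG)
print("On-battery delay (s):",
      seconds_between(events, "switched to battery", "critical event"))
for host in ("host1", "host2", "host3"):
    print(host, "shutdown command after critical event (s):",
          seconds_between(events, "critical event",
                          f"Shutting down Host {host}"))
```

The first figure corresponds to the configured On Battery shutdown delay, and the per-host figures show when PCNS issued each host shutdown command. A command appearing in the log only means it was issued, not that the host actually powered off.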
Hello dgrehan,
Thanks for your support on this point. Exactly: I checked, and there were 2 VMs on this host that didn't have the PowerChute agent installed; that's why the host wasn't shut down.
No problem, glad to help.