Do you do any formal testing of your datacentre VMware HA clusters to ensure they actually work in the case of a host failure? What kind of testing do you do, and how often? Pulling the power to see if another host restarts the VMs probably isn't all that practical in most organisations, so how can you gain assurance that your cluster is fit for purpose? What kind of disaster rehearsal exercises do you do?
I normally test this once, when the new HA/DRS clusters are provisioned, and it seems to test itself every now and again when you get a PSOD or some other type of hardware failure.
If you have an iLO/iDRAC remote management interface, you could reset the power for a controlled test. A cleaner option might be to change the host isolation response to shut down or restart the VMs, and then drop the management interface of the host you are testing.
My assumption is that if the vSphere HA state is connected (seen on the summary tab of the host) or there are no configuration issues (seen on the summary tab of the cluster), then I trust that it is working.
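If you wanted to check that on a schedule rather than by eye, a minimal sketch might look like the one below. It is only an illustration: the function name, host names and sample issue text are made up, and the inputs are plain Python values rather than live vSphere objects. As far as I know, pyVmomi exposes the HA state as `HostSystem.runtime.dasHostState.state` and the cluster warnings as `ClusterComputeResource.configIssue`, but treat that mapping, and the two state strings, as assumptions to verify against your vSphere version. The sketch also requires both conditions to hold, which is a little stricter than what I described above.

```python
# Hypothetical helper: works on plain values, not on a live vCenter connection.
# Assumed "healthy" HA agent states; verify the exact strings for your version.
HEALTHY_HA_STATES = {"master", "connectedToMaster"}


def ha_cluster_looks_healthy(host_ha_states, cluster_config_issues):
    """Return (ok, problems) for a cluster.

    host_ha_states: dict mapping host name -> HA agent state string.
    cluster_config_issues: list of configuration issue messages for the cluster.
    """
    problems = []
    # Flag every host whose HA agent is not in one of the healthy states.
    for host, state in sorted(host_ha_states.items()):
        if state not in HEALTHY_HA_STATES:
            problems.append(f"{host}: HA state is {state!r}")
    # Any outstanding cluster configuration issue also counts as a problem.
    for issue in cluster_config_issues:
        problems.append(f"cluster: {issue}")
    return (not problems, problems)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    states = {"esx01": "master", "esx02": "networkIsolated"}
    issues = ["Insufficient resources to satisfy HA failover level"]
    ok, problems = ha_cluster_looks_healthy(states, issues)
    print("healthy" if ok else "needs attention")
    for line in problems:
        print(" -", line)
```

A check like this only tells you the cluster reports itself as healthy; it does not replace an actual failover test.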
Cheers,
Jon
I guess it's a similar concept, really, to testing backups of, say, MSSQL databases: you want some degree of assurance that they are fit for purpose if called upon. I appreciate your feedback and response, though.
