Assuming you are using DRS and putting the host you are taking down into maintenance mode, the VM should be vMotioned to another host. If you are simulating a complete failure of that host, and the VM in question is using that host for compute, then that VM will go offline. It seems you are mixing up the compute node and the storage policy. In a single-host failure scenario, that VM's storage is still 100% available, but the VM would need to be restarted on another host by you, by scripts, or, ideally, by HA. In certain situations without HA, you might need to remove the VM from inventory and re-register it (via the datastore browser) on a live compute node. Cheers
Thanks Jonretting for clarifying.
Yes, I was completely powering off the VM's compute host, which took the VM offline.
But that HA scenario is common to any cluster with shared storage (not only vSAN), i.e. the VM will get restarted on an available host in the cluster.
So what does this capability add on top of that?
What does the host failure mentioned in the storage policy mean? Is it a disk failure or just a network partition?
Will a manual shutdown come under this?
Thanks in advance
Failures To Tolerate (FTT) is how many hosts can fail while the data remains available. A host can fail in any number of ways: a crash, a purple screen of death, an SSD failure (assuming only one disk group in the host), or a network failure like you mentioned. If you have three nodes and one fails, you will be without redundancy until the node is brought back online. If you have four or more nodes, a rebuild will be started. I believe there is a timeout before the rebuild starts, to account for maintenance windows and reboots.
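To make the three-node versus four-node difference concrete, here is a rough sketch of the arithmetic. It assumes the default mirroring layout, where tolerating n failures needs at least 2n + 1 hosts; the function names `min_hosts` and `can_rebuild` are made up for illustration and are not vSAN commands.

```shell
#!/bin/sh
# Assumption: mirrored objects, so FTT = n needs at least 2n + 1 hosts.
min_hosts() {
    echo $(( 2 * $1 + 1 ))
}

# Can the cluster get back to full redundancy after some hosts fail?
# Args: total hosts, FTT, number of failed hosts.
can_rebuild() {
    total=$1; ftt=$2; failed=$3
    remaining=$(( total - failed ))
    if [ "$remaining" -ge "$(min_hosts "$ftt")" ]; then
        echo yes
    else
        echo no
    fi
}

can_rebuild 3 1 1   # three nodes, one down: prints "no"
can_rebuild 4 1 1   # four nodes, one down: prints "yes"
```

With three nodes and FTT=1 the two surviving hosts are below the minimum of three, so the data stays available but cannot be re-protected until the failed node returns. With four nodes there is still a spare host to rebuild onto.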
HA will power a machine up on another host in the event of a failure, if that machine was running. If the machine was powered off at the time of the failure it will show as disconnected until the host is back online. I hope this helps. Thank you, Zach.
The default amount of time before a rebuild takes place is still 60 minutes. On my lab setup I would occasionally forget to bring a host out of maintenance mode, or leave it off too long while working on hardware.
The setting to change is "VSAN.ClomRepairDelay"
And to avoid restarting the host after modification you can manually restart the "clomd" daemon with:
$ /etc/init.d/clomd restart
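For anyone who wants to do both steps from the ESXi shell, a minimal sketch follows. It assumes the advanced option is reachable through `esxcfg-advcfg` under the path `/VSAN/ClomRepairDelay` and that the value is in minutes. The wrapper name `set_clom_repair_delay` and the `DRY_RUN` switch are my own additions for illustration, so check the path on your build before relying on it.

```shell
#!/bin/sh
# Sketch: set the vSAN repair delay (minutes) on one host, then restart clomd.
# DRY_RUN=1 only prints the commands (hypothetical safety switch, not a VMware flag).
set_clom_repair_delay() {
    minutes=$1
    case "$minutes" in
        ''|*[!0-9]*)
            echo "usage: set_clom_repair_delay <minutes>" >&2
            return 1
            ;;
    esac
    run=""
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"
    # Assumed option path; confirm it with: esxcfg-advcfg -g /VSAN/ClomRepairDelay
    $run esxcfg-advcfg -s "$minutes" /VSAN/ClomRepairDelay &&
    $run /etc/init.d/clomd restart
}

# Preview what would be run for a 90 minute delay.
DRY_RUN=1 set_clom_repair_delay 90
```

As far as I know the value should be kept the same on every host in the cluster, so this would need to be repeated per host.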
"HA will power a machine up on another host in the event of a failure, if that machine was running. If the machine was powered off at the time of the failure it will show as disconnected until the host is back online."
FYI, a small correction is needed in the above statement.
If the VM was powered off and its host, which is part of an HA cluster, fails, then provided the powered-off VM resides on a shared datastore, HA will still re-register it on one of the other healthy hosts in the HA cluster. It will just remain powered off.
Thanks a lot for the information
Thanks a lot