VM failure in FT scenario

zeebahi · 02-15-2020

Hi everyone,

Let's say we have a cluster with FT enabled for VM1. If either VM1 (Primary) or VM1 (Secondary) goes down, a new VM1 (Primary) or VM1 (Secondary) will be turned on.

Should we say HA has turned on these new VMs (VM1 Primary, VM1 Secondary), or should we say FT has turned them on?

Thanks and have a good weekend!!

Alex_Romeo · 02-15-2020

Hi,

It's a good question; it depends a lot on the type of configuration.

Best Practices for Fault Tolerance

In a partitioned vSphere HA cluster using Fault Tolerance, the Primary VM (or its Secondary VM) could end up in a partition managed by a master host that is not responsible for the virtual machine. When a failover is needed, a Secondary VM is restarted only if the Primary VM was in a partition managed by the master host responsible for it.
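
The restart rule in that quote boils down to one condition. Here is a minimal sketch of it, assuming a simplified model where each partition has exactly one elected master. It is an illustration only, not how vSphere HA is implemented, and the `Partition` type, the `secondary_restart_allowed` function, and the host names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """A network partition of the HA cluster (hypothetical model)."""
    name: str
    master: str  # host elected as master inside this partition


def secondary_restart_allowed(primary_partition: Partition, responsible_master: str) -> bool:
    """Simplified rule from the quoted best practice: the Secondary VM is
    restarted only if the Primary VM was in the partition managed by the
    master host that is responsible for this VM."""
    return primary_partition.master == responsible_master


if __name__ == "__main__":
    responsible = "esx01"  # hypothetical master responsible for VM1
    print(secondary_restart_allowed(Partition("A", master="esx01"), responsible))  # True
    print(secondary_restart_allowed(Partition("B", master="esx03"), responsible))  # False
```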

---------------------------

High Availability vs. Fault Tolerance vs. Disaster Recovery | Green House Data

HA

In VMware, HA works by pooling virtual machines and their associated resources within a cluster. When a given host or virtual machine fails, the virtual machine is restarted within the cluster, on another host if its original host is down.
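
To make "restarted on another host" concrete, here is a minimal sketch of the restart step after a host failure. The placement heuristic (pick the live host with the most free memory that can fit the VM) is an assumption for illustration, not vSphere HA's real admission or placement logic; `Host` and `restart_on_surviving_hosts` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Host:
    name: str
    free_mem_gb: int  # spare memory; the only resource modeled here
    alive: bool = True
    vms: List[str] = field(default_factory=list)


def restart_on_surviving_hosts(
    failed: Host, cluster: List[Host], vm_mem_gb: Dict[str, int]
) -> Dict[str, Optional[str]]:
    """Mark `failed` as down and restart each of its VMs on another live host.

    Assumed heuristic: choose the live host with the most free memory that
    can fit the VM. A VM maps to None if no host has room for it.
    """
    failed.alive = False
    placement: Dict[str, Optional[str]] = {}
    for vm in failed.vms:
        need = vm_mem_gb[vm]
        candidates = [h for h in cluster
                      if h.alive and h is not failed and h.free_mem_gb >= need]
        if not candidates:
            placement[vm] = None  # not enough capacity left in the cluster
            continue
        target = max(candidates, key=lambda h: h.free_mem_gb)
        target.free_mem_gb -= need
        target.vms.append(vm)
        placement[vm] = target.name
    failed.vms.clear()
    return placement


if __name__ == "__main__":
    esx01 = Host("esx01", free_mem_gb=0, vms=["vm1", "vm2"])
    esx02 = Host("esx02", free_mem_gb=6)
    esx03 = Host("esx03", free_mem_gb=10)
    print(restart_on_surviving_hosts(esx01, [esx01, esx02, esx03], {"vm1": 8, "vm2": 4}))
```

Note that there is a gap between the host failing and the VM booting elsewhere; that restart window is the downtime HA cannot avoid.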

For physical infrastructure, HA is achieved by designing the system with no single point of failure; in other words, redundant components are required for all critical power, cooling, compute, network, and storage infrastructure.

One example of a simple HA strategy is hosting two identical web servers with a load balancer splitting traffic between them and an additional load balancer on standby. If one server goes down, the balancer can direct traffic to the second server (as long as it is configured with enough resources to handle the additional traffic). If one load balancer goes down, the second can spin up.
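
The two-server, two-balancer example can be sketched in a few lines. This assumes plain round-robin across healthy servers and a fixed priority order for the balancers; `active_balancer`, `route`, and the server/balancer names are hypothetical, not any particular load balancer's API.

```python
from typing import Dict, Optional


def active_balancer(balancers: Dict[str, bool]) -> Optional[str]:
    """Return the first balancer that is up.

    Dict order is the priority order, so the standby only takes over
    once the active balancer is marked down.
    """
    for name, up in balancers.items():
        if up:
            return name
    return None


def route(request_no: int, servers: Dict[str, bool]) -> Optional[str]:
    """Round-robin a request across healthy web servers only."""
    healthy = [name for name, up in servers.items() if up]
    if not healthy:
        return None
    return healthy[request_no % len(healthy)]


if __name__ == "__main__":
    balancers = {"lb-active": True, "lb-standby": True}
    servers = {"web1": True, "web2": True}
    print(active_balancer(balancers), [route(i, servers) for i in range(4)])
    # One web server and the active balancer fail.
    servers["web1"] = False
    balancers["lb-active"] = False
    print(active_balancer(balancers), [route(i, servers) for i in range(4)])
```

After the failures, every request lands on `web2`, which is why the text stresses that the surviving server must have enough resources for the extra traffic.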

The load balancer in this situation is key. HA only works if you have systems in place to detect failures and redirect workloads, whether at the server level or the physical component level. Otherwise you may have resiliency and redundancy in place but no true HA strategy.
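
The "detect failures" part is commonly done with heartbeats. A minimal sketch, assuming a node is declared failed once its last heartbeat is older than a fixed timeout; `HeartbeatMonitor` and its methods are hypothetical, and timestamps are passed in explicitly so the example is deterministic (a real monitor would read a monotonic clock).

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Declare a node failed when its last heartbeat is older than `timeout_s`."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def beat(self, node: str, at: float) -> None:
        """Record a heartbeat from `node` at time `at` (seconds)."""
        self._last_seen[node] = at

    def failed(self, now: float) -> List[str]:
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(n for n, t in self._last_seen.items() if now - t > self.timeout_s)

    def healthy(self, now: float) -> List[str]:
        """Nodes a balancer could still route traffic to."""
        return sorted(n for n, t in self._last_seen.items() if now - t <= self.timeout_s)


if __name__ == "__main__":
    mon = HeartbeatMonitor(timeout_s=3.0)
    mon.beat("web1", at=0.0)
    mon.beat("web2", at=0.0)
    mon.beat("web1", at=4.0)    # web2 stays silent after t=0
    print(mon.failed(now=5.0))  # ['web2']
    print(mon.healthy(now=5.0))  # ['web1']
```

The timeout is a trade-off: a shorter one redirects traffic sooner but risks declaring a slow node dead, and whatever timeout you pick is added to the downtime of every real failure.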

Fault Tolerance

A Fault Tolerant system is extremely similar to HA but goes one step further by guaranteeing zero downtime. HA still comes with a small amount of downtime, which is why even an ideal HA strategy aims for “five nines” (99.999%) rather than 100% uptime. The time it takes for the intermediary layer, such as the load balancer or hypervisor, to detect a problem and restart the VM can add up to minutes or even hours over a year of runtime.
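
To put "five nines" in numbers, the yearly downtime budget is simply (1 - availability) × minutes in a year. A quick sketch, assuming a 365-day year (leap years ignored):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum downtime per year allowed by an availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, target in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
    print(f"{label}: {downtime_minutes_per_year(target):.2f} min/year")
# three nines: 525.60 min/year
# four nines: 52.56 min/year
# five nines: 5.26 min/year
```

So a five-nines target leaves roughly five minutes a year, which a couple of HA restarts can use up.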

Within VMware, FT ensures availability by keeping a live copy of the VM (the Secondary VM) running on a separate host. With only HA configured, HA attempts to restart the VM on another host in the same cluster. If the physical infrastructure shared by that cluster is having problems, HA may not work.
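
Going back to the original question, here is a simplified model of what happens to VM1's two copies when a host fails. It assumes the documented FT behaviour that the Secondary is promoted when the Primary's host fails and that a fresh Secondary is then started on another host; it is not vSphere code, and `FtPair`, `after_host_failure`, and the host names are hypothetical. It only illustrates the separate-host constraint and the failover step; it does not settle whether HA or FT should get the credit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FtPair:
    vm: str
    primary_host: str
    secondary_host: str

    def __post_init__(self) -> None:
        # The point of FT: the two copies must never share a host.
        if self.primary_host == self.secondary_host:
            raise ValueError("Primary and Secondary must run on different hosts")


def after_host_failure(pair: FtPair, failed_host: str, live_hosts: List[str]) -> FtPair:
    """Return the FT pair after `failed_host` goes down (simplified model).

    If the Primary's host fails, the Secondary is promoted to Primary.
    Either way, a new Secondary is placed on the first live host that is
    not already running the surviving copy.
    """
    if failed_host == pair.primary_host:
        survivor = pair.secondary_host  # Secondary takes over as Primary
    elif failed_host == pair.secondary_host:
        survivor = pair.primary_host    # Primary keeps running
    else:
        return pair                     # failure does not touch this VM
    spares = [h for h in live_hosts if h not in (survivor, failed_host)]
    if not spares:
        raise RuntimeError("no host available for a new Secondary")
    return FtPair(pair.vm, primary_host=survivor, secondary_host=spares[0])


if __name__ == "__main__":
    vm1 = FtPair("VM1", primary_host="esx01", secondary_host="esx02")
    print(after_host_failure(vm1, "esx01", ["esx02", "esx03"]))
```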

HA and FT are two features that work together when both are enabled on a cluster, each responding to a different kind of failure that would otherwise cause VM downtime. They do not conflict with each other, so either HA or FT may intervene depending on how the fault occurred.

ARomeo

Blog: https://www.aleadmin.it/
