henleu
Contributor

VM failover among hosts without downtime?

We plan to use 4 to 5 hosts running vSphere, with shared SAN storage connected through Fibre Channel, and 1 separate server running vCenter.

With this setup, is it possible that when a host goes down, all of its VMs will automatically fail over to other hosts without downtime (not even a guest OS reboot)? If yes, which editions of vSphere and vCenter do we need? And what is the technology behind it (e.g. VMware HA? Storage vMotion? etc.)?

I am not very familiar with VMware products and features, so please forgive me if any concept or terminology is wrong.

If it's not achievable with VMware, how about Hyper-V?

6 Replies
a_p_
Leadership

Welcome to the Community,

The feature you are looking for is "Fault Tolerance". Please see, e.g., Fault Tolerance Requirements, Limits, and Licensing to find out whether this meets your needs. Note that VMware HA by itself only restarts the affected VMs on the surviving hosts, which involves a guest OS boot, i.e. a short downtime; FT keeps a live secondary copy running so the failover is seamless.

André

henleu
Contributor

Thanks. From your link, FT works by running 2 VMs together and continuously replicating changes from the primary VM to the secondary VM. Is this necessary even when our VMs all run on the same shared SAN storage? I thought that since all hosts can access this shared storage, a healthy host should easily be able to pick up where the downed host left off, so a single VM "instance" should be enough?

a_p_
Leadership

To be honest, I did not really understand why VMware changed the FT behavior.

In the first versions it actually worked this way, i.e. with no replicated disks; only the VM's state was "replicated".

André

MattMeyer
VMware Employee

Actually, the entire technology for this feature changed, not just the behavior. There were a few reasons for this.

1. The SMP VM was always the top request from customers looking to use this feature. The old technology (record/replay) was not able to monitor multiple threads and replay them on the secondary machine without significant performance issues. One thread is easy, since there is nothing to keep in sync. With SMP, the threads need to be replayed in exactly the same order on the secondary host as on the primary. So you are no longer just replaying a single stream of execution: with 1 vCPU execution is effectively FIFO, but with SMP it is not.
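
To illustrate the point, here is a rough sketch of the problem in plain Python (illustration only, nothing from the vSphere code base). Two "vCPUs" doing unsynchronized read-modify-write on shared memory interleave nondeterministically, which is exactly what a faithful replay would have to record and reproduce:

```python
import threading

counter = 0

def vcpu_worker(iterations):
    global counter
    for _ in range(iterations):
        tmp = counter        # read...
        counter = tmp + 1    # ...modify-write; the gap lets the other
                             # "vCPU" interleave, and the order decides
                             # the final value.

threads = [threading.Thread(target=vcpu_worker, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# One thread alone is deterministic (FIFO): always 100000. With two threads
# the result depends on the interleaving and can differ from run to run.
print(counter)
```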

So now FT acts more like an xVmotion (vMotion + Storage vMotion) that never completes. The vMotion part monitors the memory bitmap. As memory changes, the changes are sent to the secondary VM world running on the second host. The difference from a real vMotion is that there are many, many, many more checkpoints involved. A checkpoint is a window of time that is checked for changes to the memory bitmap. A regular vMotion allows the checkpoints to be really big at first; then, as you get towards the end of the vMotion, more checkpoints are added to move less data between each checkpoint. Eventually it is determined whether the rest of the dirty memory can be transferred in under 1 second; if not, it keeps trying until there is a moment when it can. FT does this much the same way, but on steroids. We are talking sub-millisecond checkpoints here, because we don't have the luxury of a 1-second window like a vMotion can tolerate. There is other low-level geek stuff that differs from a regular vMotion too, but that's the general way memory is now synced.
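
The shrinking-checkpoint idea can be sketched like this (a toy simulation with made-up numbers, not vSphere code; pages_dirtied() just fakes a guest workload). A vMotion-style run stops once the remainder fits the final window; FT would run the same loop forever with the window clamped to sub-millisecond values:

```python
import random

LINK_PAGES_PER_SEC = 50_000   # assumed transfer rate
DIRTY_RATE = 10_000           # assumed pages dirtied per second by the guest

def pages_dirtied(window_s):
    # Simulate the guest dirtying memory during one checkpoint window.
    return int(DIRTY_RATE * window_s * random.uniform(0.8, 1.2))

def precopy(final_window_s):
    window = 30.0                         # first checkpoint window is big
    while True:
        dirty = pages_dirtied(window)
        transfer_s = dirty / LINK_PAGES_PER_SEC
        print(f"window={window:8.4f}s dirty={dirty:7d} transfer={transfer_s:.4f}s")
        if transfer_s <= final_window_s:  # remainder fits the final window: done
            return
        window = max(window / 2, 0.0005)  # shrink toward sub-ms checkpoints

precopy(final_window_s=1.0)   # a vMotion-style convergence run
```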

Next is the way CPU is handled. With legacy FT, aka RR FT (record/replay FT), everything was actually played back exactly like the original. With the new generation of FT, the thread is executed on the primary host, and only the result is sent to the secondary. This is also how vMotion does things. When you think about it, it's not really necessary for the secondary VM to replay everything, as long as it has the result of the last thing that was processed on the primary. If there is a failure, the secondary will pick up where the primary left off.
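
As a rough illustration of "execute on the primary, ship only the result" (made-up Primary/Secondary classes, not how the hypervisor is actually structured):

```python
class Secondary:
    def __init__(self):
        self.state = {}

    def apply(self, delta):
        # Only results arrive here; the computation itself never re-runs.
        self.state.update(delta)

class Primary:
    def __init__(self, secondary):
        self.state = {}
        self.secondary = secondary

    def execute(self, key, fn, *args):
        result = fn(*args)                   # the work runs here, once
        self.state[key] = result
        self.secondary.apply({key: result})  # ship the result only
        return result

sec = Secondary()
pri = Primary(sec)
pri.execute("sum", sum, range(1_000_000))    # heavy work happens on primary
assert sec.state == pri.state                # secondary can take over here
```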

Finally, storage. Since the VM is not replaying everything that happened on the primary VM, the secondary doesn't need to read the disks for any IO that is needed. So we do this differently now too. Similar to a Storage vMotion, FT sends any write IO to the secondary VM to be committed there before it is committed on the primary. Read IO doesn't really make a difference, since the secondary VM doesn't need to read anything again. A benefit of this is that you can store the secondary copy of the data on a completely different set of disks, or even a different array, for increased redundancy. The drawback is double the required storage.
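
The write ordering can be sketched like this (hypothetical MirroredDisk class, not the real FT datapath): a guest write is committed on the secondary's disk before the primary commits it, so the secondary's copy never lags behind.

```python
class MirroredDisk:
    def __init__(self, primary_disk, secondary_disk):
        self.primary = primary_disk      # e.g. dict of block -> data
        self.secondary = secondary_disk

    def write(self, block, data):
        self.secondary[block] = data     # commit on the secondary first
        self.primary[block] = data       # then commit on the primary

    def read(self, block):
        return self.primary[block]       # reads are served locally

disk = MirroredDisk({}, {})
disk.write(0, b"boot sector")
assert disk.read(0) == b"boot sector"
# The cost: every block exists twice ("double the required storage").
```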

Another benefit of the new technology is that the hosts don't need to have identical CPUs. With RR FT the CPUs pretty much needed to match, since the entire thread was executed again; if instruction sets used on one CPU are not available on the other, a whole lot of bad things happen. Since the new FT is more like a vMotion, as long as the CPUs can participate in the same EVC mode, they can be used for FT too.

2. The other reason it was time to move on was the evolution of the silicon. Certain features of the CPU were not guaranteed to stick around in future generations. We needed a way to guarantee the feature without being tied to specific capabilities of the CPU, so the new FT was developed.

Hope this helps explain how it all works, and why things changed.

a_p_
Leadership

Thanks a lot, MattMeyer, for explaining the differences, and for the insight. It is most appreciated.

I understand that replicating the storage has benefits, but as you said "The drawback is double the required storage".

What I actually don't understand is why the storage replication is required, i.e. not optional. Are there technical reasons for this (e.g. VMFS locking etc.), or was it a management/design decision to do it this way?

André

henleu
Contributor

Just wondering, is there a similar feature in Hyper-V, i.e. VM auto-failover among hosts without downtime?
