VMware Cloud Community
Knightstorm
Contributor

How fast is a fault tolerant failover?

We are looking to set up a fault tolerant solution for software that monitors CNC machining equipment. The proprietary software uses TCP/IP, serial over TCP/IP, and an external SQL server database to monitor and report on the machining process. I am concerned that the failover delay, while not long enough to bother a human being, may cause timeout issues when communicating with the manufacturing equipment. If an unplanned failover will cause timing issues, fault tolerance will not be a reliable solution.

5 Replies
vmroyale
Immortal

Hello and welcome to the forums.

Note: This discussion was moved from the VMware ESXi 4 community to the HA & FT community.

It is fast, but "fast" is too general a term to be useful here. The best way to determine whether FT fits this use case is to test it. The paper "The Design and Evaluation of a Practical System for Fault-Tolerant Virtual Machines" explains in detail how FT works and is well worth reading.

Good Luck!

Brian Atkinson | vExpert | VMTN Moderator | Author of "VCP5-DCV VMware Certified Professional-Data Center Virtualization on vSphere 5.5 Study Guide: VCP-550" | @vmroyale | http://vmroyale.com
Knightstorm
Contributor

Unfortunately, I am not in a position to test, since the hardware and software are not in place yet. If the "best guess" is that there will be timeout issues, then fault tolerance is not a good option and we will be better off using high availability and shared storage for the VMs.

Troy_Clavell
Immortal

You can expect to miss a single ping, just as you would when vMotioning a guest. So, yes, pretty fast. But, again, testing is key.
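Once the hardware is in place, the "missed ping" observation can be turned into a number. The following is only a rough sketch of one way to do that: it polls a TCP port on the protected VM and reports the longest stretch without a successful connection. The function names (`tcp_probe`, `monitor`, `longest_outage`), the probe interval, and the timeout are all assumptions made for illustration, not anything FT or vSphere provides.

```python
import socket
import time


def tcp_probe(host, port, timeout=0.2):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor(host, port, duration, interval=0.1):
    """Probe repeatedly for `duration` seconds; return (timestamp, ok) samples."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), tcp_probe(host, port)))
        time.sleep(interval)
    return samples


def longest_outage(samples):
    """Longest time between two successful probes that had a failure in between.

    `samples` is a time-ordered list of (timestamp, ok) pairs. Failures
    before the first success or after the last one are ignored, because
    the outage cannot be bounded on both sides.
    """
    longest = 0.0
    last_ok = None
    saw_failure = False
    for ts, ok in samples:
        if ok:
            if last_ok is not None and saw_failure:
                longest = max(longest, ts - last_ok)
            last_ok = ts
            saw_failure = False
        else:
            saw_failure = True
    return longest
```

Running `longest_outage(monitor(host, port, 60))` while triggering a test failover would give an upper bound on the interruption as seen by a TCP client, with a resolution limited by the chosen interval and timeout. That figure can then be compared with the timeouts the CNC monitoring software actually uses.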

idle-jam
Immortal

In my case it is very fast: only 1-2 ping failures.

mcowger
Immortal

If the communication happens over TCP, TCP's own retransmissions will ride out the <1 second failover, so it should not cause any timing issues.
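To make the retry argument concrete, here is a simplified model of TCP retransmission with exponential backoff, where the retransmission timeout doubles after each unanswered attempt. It is a sketch under stated assumptions: the initial timeout, the retry count, and the 60-second cap are illustrative parameters, and real TCP stacks derive the timeout from measured round-trip times and differ in their limits.

```python
def retransmit_times(initial_rto, max_retries, max_rto=60.0):
    """Times (seconds after the original send) at which retransmissions fire.

    Simplified model: the timeout starts at initial_rto and doubles after
    every unanswered attempt, capped at max_rto. All values are assumptions.
    """
    times = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(max_retries):
        elapsed += rto
        times.append(elapsed)
        rto = min(rto * 2, max_rto)
    return times


def recovery_delay(outage, initial_rto, max_retries=15):
    """Time of the first retransmission sent at or after the outage ends.

    Returns None if every modelled retry falls inside the outage.
    """
    for t in retransmit_times(initial_rto, max_retries):
        if t >= outage:
            return t
    return None
```

For example, with an assumed initial timeout of 0.2 s, the model retransmits at roughly 0.2 s, 0.6 s, and 1.4 s, so a segment lost at the start of a 1-second outage would get through on the third retry, about 1.4 s after it was first sent. TCP itself would not drop the connection in that time, but any application-level timeout in the monitoring software shorter than that delay could still fire, which is the part worth checking against the vendor's settings.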

--Matt VCDX #52 blog.cowger.us