VMware Fault Tolerance Requirements and Limitations

This blog entry continues to get a lot of hits, so I am keeping it updated and have reformatted it a bit. VMware Fault Tolerance (FT) is a feature that has generated a lot of interest, and as a new feature of vSphere it will only continue to improve. The list below is the current state of requirements and limitations for enabling FT virtual machines in vSphere. Most of this information came from the vSphere Pre-requisites Checklist, the VMware Fault Tolerance Datasheet and the Availability Guide. Other items were picked up in the forums or in the VMware knowledge base. If you are new to this feature, kb 1010601 "Understanding VMware Fault Tolerance" is a good place to start. kb 1022844 covers the changes to Fault Tolerance in vSphere 4.1.

Last updated: October 02, 2012

INFRASTRUCTURE:

VMware FT is available in the following editions of vSphere: Enterprise, Enterprise Plus
Note: vSphere Advanced Edition is no longer available in vSphere 5.

A host must be certified by the OEM as FT-capable. Refer to the current Hardware Compatibility List (HCL) for a list of FT-supported servers.

Ensure that HV (Hardware Virtualization) is enabled in the BIOS.

Ensure that FT protected virtual machines are on shared storage (FC, iSCSI or NFS).

Ensure that the primary and secondary ESX hosts and virtual machines are in an HA-enabled cluster.

Ensure that there is no requirement to use DRS for VMware FT protected virtual machines; in the initial vSphere 4.0 release, VMware FT cannot be used with VMware DRS (although manual VMotion is allowed). In vSphere 4.1, FT is integrated with DRS, so DRS can load balance both the primary and secondary Fault Tolerant virtual machines.

Ensure that the primary and secondary ESX/ESXi hosts are running the same build of VMware ESX/ESXi. Note: kb 1013637, published September 25, 2009, states that "When creating a cluster that will have fault tolerant virtual machines, the cluster should consist of all ESX hosts or all ESXi hosts and not a mix of ESX and ESXi hosts." In vSphere 4.1, FT has a version associated with it, which means that the primary and secondary ESX hosts do not need to run at the same build number or patch level.

When you upgrade hosts that contain fault tolerant virtual machines, ensure that the Primary and Secondary VMs continue to run on hosts with the same ESX/ESXi version and patch level. In vSphere 4.1, FT has a version associated with it, which means that the primary and secondary ESX hosts do not need to run at the same build number or patch level.

Ensure that there will be no more than four VMware FT enabled virtual machine primaries or secondaries on any single ESX/ESXi host (from the configuration maximums document).

Ensure that at least gigabit NICs are used. (10 Gbit NICs can also be used, and jumbo frames can be enabled, for better performance.) Each host must have a VMotion and a Fault Tolerance Logging NIC configured. The VMotion and FT logging NICs must be on different subnets.
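The subnet rule above is easy to check once you have written down each host's VMkernel addresses. Here is a minimal sketch, assuming you collected the VMotion and FT logging addresses in CIDR form by hand; the function name and the sample addresses are made up for illustration and do not come from any VMware tool.

```python
import ipaddress

def nics_on_different_subnets(vmotion_cidr, ft_logging_cidr):
    """Return True when the two VMkernel interfaces sit on non-overlapping subnets."""
    vmotion_net = ipaddress.ip_interface(vmotion_cidr).network
    ft_net = ipaddress.ip_interface(ft_logging_cidr).network
    return not vmotion_net.overlaps(ft_net)

# Hypothetical addresses for one host
print(nics_on_different_subnets("10.0.10.11/24", "10.0.20.11/24"))
print(nics_on_different_subnets("10.0.10.11/24", "10.0.10.12/24"))
```

Comparing networks rather than raw addresses catches the case where two different IPs still land in the same subnet.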

Ensure that host certificate checking is enabled (enabled by default) before you add the ESX/ESXi host to vCenter Server.

Ensure that IPv4 is used for the Fault Tolerance logging network.  (HA does support IPv6 for management networks.)

Ensure that there is no user requirement to use NPT/EPT (Nested Page Tables/Extended Page Tables) since VMware FT disables NPT/EPT on the ESX host.

VMware Fault Tolerance requires a dedicated Gigabit Ethernet network between the physical servers. 10 Gigabit Ethernet should be considered if VMware FT is enabled for many virtual machines on the same host.

There are no limits on how many virtual machines in a VMware DRS or VMware HA cluster can be enabled for VMware FT, but every machine with VMware FT enabled takes up twice as much capacity; this should be built into the configuration.

Overhead is dependent on the workload and can be as low as 5% or as much as 20%.
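To put rough numbers on the two points above, here is a small sizing sketch. It assumes the 5% to 20% overhead is applied on top of the doubled footprint, which is my reading and not something the VMware documents spell out; the function name and the sample demand figure are hypothetical.

```python
def ft_capacity_range(vm_demand, low_pct=5, high_pct=20):
    """Estimate the cluster capacity one FT-protected VM consumes.

    vm_demand is the VM's unprotected demand in any unit (MHz, MB, ...).
    The VM is counted twice (primary plus secondary), then the assumed
    overhead percentage is added on top.
    """
    doubled = 2 * vm_demand
    low = doubled * (100 + low_pct) / 100
    high = doubled * (100 + high_pct) / 100
    return low, high

# A VM that needs 2000 MHz when unprotected
low, high = ft_capacity_range(2000)
print(f"Plan for roughly {low:.0f} to {high:.0f} MHz of cluster capacity")
```

Treat the result as a planning range only; real overhead depends on the workload.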

If firewalls or other controls exist between ESX hosts, ports 8100 and 8200 (outgoing TCP, incoming and outgoing UDP) must be open.

Ensure that a resource pool containing fault tolerant virtual machines has excess memory above the memory size of the virtual machines. Fault tolerant virtual machines use their full memory reservation. Without this excess in the resource pool, there might not be any memory available to use as overhead memory.

To ensure redundancy and maximum Fault Tolerance protection, VMware recommends that you have a minimum of three hosts in the cluster. In a failover situation, this provides a host that can accommodate the new Secondary VM that is created.
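The host-count recommendation and the four-per-host maximum listed earlier can be checked against a planned layout before you turn FT on. This is only a sketch: the data layout (a list of hosts and a list of VM, primary host, secondary host tuples) is my own assumption and nothing here talks to vCenter.

```python
from collections import Counter

MAX_FT_VMS_PER_HOST = 4    # primaries plus secondaries, per the configuration maximums
RECOMMENDED_MIN_HOSTS = 3  # leaves a host free for a new Secondary VM after failover

def check_placement(hosts, ft_pairs):
    """Return a list of problems found in a planned FT layout.

    hosts: list of host names in the cluster
    ft_pairs: list of (vm_name, primary_host, secondary_host) tuples
    """
    problems = []
    if len(hosts) < RECOMMENDED_MIN_HOSTS:
        problems.append("fewer than 3 hosts in the cluster")
    load = Counter()
    for vm, primary, secondary in ft_pairs:
        if primary == secondary:
            problems.append(f"{vm}: primary and secondary on the same host")
        load[primary] += 1
        load[secondary] += 1
    for host, count in sorted(load.items()):
        if count > MAX_FT_VMS_PER_HOST:
            problems.append(f"{host}: {count} FT VMs exceeds the limit of {MAX_FT_VMS_PER_HOST}")
    return problems

# Hypothetical three-host cluster with two protected VMs
print(check_placement(["esx1", "esx2", "esx3"],
                      [("web", "esx1", "esx2"), ("db", "esx2", "esx3")]))
```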

Too much activity on a VMFS volume can lead to virtual machine failovers. Reduce the number of file system operations, or ensure that the fault tolerant virtual machine is on a VMFS volume that does not have an abundance of other virtual machines that are regularly being powered on, powered off, or migrated using VMotion.

When Fault Tolerance is turned on, vCenter Server unsets the virtual machine's memory limit and sets the memory reservation to the memory size of the virtual machine. While Fault Tolerance remains turned on, you cannot change the memory reservation, size, limit, or shares.

Disabling the virtual machine restart priority setting for a fault tolerant virtual machine causes the Turn Off Fault Tolerance operation to fail. In addition, fault tolerant virtual machines with the virtual machine restart priority setting disabled cannot be deleted.

FT requires that the hosts for the Primary and Secondary VMs use the same CPU model, family, and stepping.

Hosts running the Primary and Secondary VMs should operate at approximately the same processor frequencies, otherwise the Secondary VM might be restarted more frequently. Platform power management features which do not adjust based on workload (for example, power capping and enforced low frequency modes to save power) can cause processor frequencies to vary greatly.

You cannot back up an FT-enabled virtual machine using VCB, vStorage API for Data Protection, VMware Data Recovery or similar backup products that require the use of a virtual machine snapshot, as performed by ESX/ESXi. To back up a fault tolerant virtual machine in this manner, you must first disable FT, then re-enable FT after performing the backup. Storage array-based snapshots do not affect FT.

Apply the same instruction set extension configuration (enabled or disabled) to all hosts.

Ensure that the processors are supported. (Download VMware SiteSurvey.) For VMware FT to be supported, the servers that host the virtual machines must each use a supported processor from the same category as documented below.

Intel Xeon based on 45nm Core 2 Microarchitecture Category:

31xx Series

33xx Series

52xx Series (DP)

54xx Series

74xx Series

Intel Xeon based on Core i7 Microarchitecture Category

Nehalem Series Group (any processor series here can be used):

34xx Series (Lynnfield)

35xx Series

55xx Series

65xx Series

75xx Series

Westmere Series Groups (each processor series must be used separately):

34xx Series (Clarkdale)

i3/i5 (Clarkdale)

36xx Series

56xx Series

AMD 3rd Generation Opteron Category

13xx and 14xx Series

23xx and 24xx Series (DP)

41xx Series

61xx Series

83xx and 84xx Series (MP)

View full details about processor and other requirements in kb 1008027
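The processor list above boils down to a grouping rule: two hosts can be paired only if their CPU series fall in the same group. Here is a minimal lookup sketch that encodes the list exactly as written in this post; the group labels and function names are my own, and kb 1008027 remains the authoritative source (the separate model, family, and stepping requirement is not modeled here).

```python
# Series grouped as in the list above. Westmere series each form their own group.
FT_CPU_GROUPS = {
    "Intel 45nm Core 2": ["31xx", "33xx", "52xx", "54xx", "74xx"],
    "Intel Nehalem": ["34xx (Lynnfield)", "35xx", "55xx", "65xx", "75xx"],
    "Intel Westmere 34xx": ["34xx (Clarkdale)"],
    "Intel Westmere i3/i5": ["i3/i5 (Clarkdale)"],
    "Intel Westmere 36xx": ["36xx"],
    "Intel Westmere 56xx": ["56xx"],
    "AMD 3rd Generation Opteron": ["13xx", "14xx", "23xx", "24xx",
                                   "41xx", "61xx", "83xx", "84xx"],
}

def group_of(series):
    """Return the group label for a CPU series, or None if it is not listed."""
    for group, members in FT_CPU_GROUPS.items():
        if series in members:
            return group
    return None

def can_pair(series_a, series_b):
    """True when both series are listed and belong to the same group."""
    group_a = group_of(series_a)
    return group_a is not None and group_a == group_of(series_b)

print(can_pair("55xx", "75xx"))   # both in the Nehalem group
print(can_pair("56xx", "36xx"))   # Westmere series must be used separately
```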


VIRTUAL MACHINES:

Virtual machines must be running on one of the supported guest operating systems. See VMware kb 1008027 for more information.

Mac OS X Server 10.6 is not supported.

The combination of the virtual machine's guest operating system and processor must be supported by Fault Tolerance (for example, 32-bit Solaris on AMD-based processors is not currently supported).

VMware FT requires virtual machines to have eager-zeroed thick disks. Thin or sparsely allocated disks will be converted to eager-zeroed thick when VMware FT is enabled, which requires additional storage space. The virtual machine must be in a powered-off state to take this action.

Ensure that the datastore is not using physical RDM (Raw Disk Mapping). Virtual RDM is supported.

VMware recommends that you use a maximum of 16 virtual disks per fault tolerant virtual machine.

The virtual machine cannot have more than 64GB of RAM.

Ensure that there is no requirement to use Storage VMotion for VMware FT VMs, since Storage VMotion is not supported for VMware FT VMs.

Ensure that NPIV (N-Port ID Virtualization) is not used, since NPIV is not supported with VMware FT.

Ensure that the virtual machines are NOT using more than 1 vCPU. (SMP is not supported.)

Ensure that there is no user requirement to hot add or remove devices since hot plugging devices cannot be done with VMware FT.

Ensure that USB Passthrough is not used.

Ensure that there is no user requirement to use USB (USB must be disabled) and sound devices (must not be configured) since these are not supported for Record/Replay (and VMware FT.)

Ensure that there is no user requirement to have virtual machine snapshots since these are not supported for VMware FT. Delete snapshots from existing virtual machines before protecting with VMware FT. Note: Client agents may be required for backups.

Ensure that virtual machine hardware is upgraded to v7.

Ensure that the virtual machines do not use a paravirtualized guest OS. Note: On September 22, 2009 it was announced that support for guest OS paravirtualization using VMware VMI would be retired from new products in 2010-201....

Fault Tolerance is not supported with Paravirtual SCSI adapters.

The vmxnet3 adapter is not supported with Fault Tolerance. See kb 1013757. In vSphere 4.1, you can use vmxnet3 vNICs in FT-enabled virtual machines.

Some legacy network drivers are not supported. vmxnet2 is supported, but you might need to install VMware Tools to access the vmxnet2 driver instead of vlance in certain guest operating systems.

Ensure that MSCS clustering is removed from MSCS clustered virtual machines before protecting them with VMware FT.

VMs can't have any non-replayable devices (USB, sound, physical CD-ROM, physical floppy).

The virtual machine must not be a template or linked clone.

The virtual machine must not have VMware HA disabled.

VMDirectPath is not available for FT virtual machines.

VMCI stream socket connections are dropped when a virtual machine is put into Fault Tolerance (FT) mode. No new VMCI stream socket connections can be established while in FT mode.

The hot plug device feature is automatically disabled for fault tolerant virtual machines. To hot plug devices, you must momentarily turn off Fault Tolerance, perform the hot plug, and then turn on Fault Tolerance.

Extended Page Tables (EPT)/Rapid Virtualization Indexing (RVI) is automatically disabled for virtual machines with Fault Tolerance turned on.

Software virtualization with FT is unsupported.

FT virtual machines cannot be replicated with the vSphere Replication feature in SRM 5.

Dynamic Disk Mirroring use in the guest OS is not supported.
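Several of the virtual machine items above are simple yes/no checks, so they lend themselves to a pre-flight script. The sketch below covers only a handful of them and assumes you fill in the VM's settings by hand; the `VMConfig` class, its field names, and the device labels are all hypothetical and not part of any VMware API.

```python
from dataclasses import dataclass, field

# Device labels are made up for this sketch
NON_REPLAYABLE = {"usb", "sound", "physical-cdrom", "physical-floppy"}

@dataclass
class VMConfig:
    name: str
    vcpus: int = 1
    ram_gb: int = 4
    hardware_version: int = 7
    snapshot_count: int = 0
    uses_physical_rdm: bool = False
    devices: list = field(default_factory=list)

def ft_blockers(vm):
    """Return the reasons (from the checklist above) why FT cannot be enabled."""
    issues = []
    if vm.vcpus != 1:
        issues.append("must have exactly 1 vCPU")
    if vm.ram_gb > 64:
        issues.append("more than 64GB of RAM")
    if vm.hardware_version < 7:
        issues.append("virtual hardware older than v7")
    if vm.snapshot_count > 0:
        issues.append("snapshots must be deleted first")
    if vm.uses_physical_rdm:
        issues.append("physical RDM in use")
    for device in vm.devices:
        if device in NON_REPLAYABLE:
            issues.append(f"non-replayable device: {device}")
    return issues

# Hypothetical VM that fails several checks
print(ft_blockers(VMConfig("db01", vcpus=2, ram_gb=96, devices=["usb"])))
```

An empty list means none of the modeled checks failed; it does not mean the VM passes every item on the full list.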

OTHER:

In the situation where virtual machines are configured with Fault Tolerance, AppSpeed might not monitor these virtual machines fully in the current GA version. In some cases AppSpeed generates empty monitoring data caused by the passive virtual machine in the Fault Tolerance constellation. - kb 1013896

Fault Tolerant virtual machines that have a change tracking resource (CTK) listed in the virtual machine configuration will rapidly switch between ESX hosts when being powered on.  CTK must be disabled, or the CTK variables must be removed from the virtual machine configuration (.vmx) file. - kb 1013400
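Before powering on an FT virtual machine you can scan its .vmx file for change tracking entries. This is a rough sketch that assumes the CTK variables are the ones whose key contains "ctk" (for example `ctkEnabled`); check kb 1013400 for the exact entries to remove. The sample .vmx text is invented for the example.

```python
def find_ctk_entries(vmx_text):
    """Return the keys in a .vmx file's text that look like change tracking settings."""
    hits = []
    for line in vmx_text.splitlines():
        key, sep, _value = line.partition("=")
        # Only consider real key = value lines whose key mentions ctk
        if sep and "ctk" in key.lower():
            hits.append(key.strip())
    return hits

# Invented sample configuration
sample = '''memsize = "4096"
ctkEnabled = "TRUE"
scsi0:0.fileName = "db01.vmdk"
scsi0:0.ctkEnabled = "TRUE"
'''
print(find_ctk_entries(sample))
```

The function only reports matches. Edit the .vmx file with the virtual machine powered off, and keep a backup copy.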

An Absolute Must Read: The Design and Evaluation of a Practical System for Fault-Tolerant Virtual Machines

If you know of any others, feel free to share.  As always, thanks for reading.

Comments

*Ensure that the virtual machines are NOT using more than 1 vCPU. (SMP is not supported.)

Not a good point. It is hard to implement FT for critical VMs, especially in our environment, where our critical machines have a minimum of 2 vCPUs.

In addition to PVSCSI devices, VMware FT cannot be enabled on a virtual machine using VMXNET3.

Note: This has been changed in vSphere 4.1

Good case study published at http://searchnetworking.techtarget.com.au/articles/35270-Case-study-Forget-1-GE-on-VMware-unless-you-can-do-without-vMotion-high-availability-and-fault-tolerance



"An attempted direct connection of NIC to VM eventually delivered close to 10GE performance, but with the unfortunate side effect of being possible only under VMDirectPath, a product that is not currently compatible with vMotion or VMware tools that offer high availability and fault tolerance."



Checklist for would-be users of 10GE under VMware:

  • Check which PCI slot your NIC uses and make sure it can deliver the speed you need

  • Turn on VT-X, NUMA, SMT and VTD in your server BIOS

  • Use the vmxnet3 driver (or its successors) and not the e1000 driver


NOTE - If VMware recommends the vmxnet3 driver for 10GE, Fault Tolerance will not work with it.

http://virtualcloud.wordpress.com

Hi,

Can you tell me whether FT is supported when the virtual machine is stored on an internal disk?

"Ensure that FT protected virtual machines are on shared storage (FC, iSCSI or NFS)." - Internal storage is not supported.

Just what I was looking for and quite thorough as well. Thanks for posting this, I saw a couple other similar posts but yours was the best so far. I hope it stays updated, take care.

Ruby

I wonder if any of the shortcomings - in particular, the inability to use VCB to back up FT-enabled VMs and the inability to use Storage VMotion - can be / will be addressed in future versions?

I have a problem. VMotion is working fine, but I cannot enable FT because my CPU does not support it. I have a Xeon 3440, and it is listed as a supported CPU. Currently I have two ESX 4.1 hosts installed in VMware Workstation 8. Is that the problem with this scenario?

Without iSCSI or NFS, can we install FT or not?

You will need shared storage, and it can be FC, iSCSI or NFS.

Can I restart or shut down a VM that is protected with Fault Tolerance so that the secondary takes the lead and is not restarted or shut down?

Thanks

No you can't. The machines are in vLockstep, which means that the secondary would also shut down.

Version history: Revision 1 of 1, last update 05-18-2009 06:00 PM