VMware Cloud Community
Bruticusmaximus
Enthusiast

Fault Tolerance slow network performance

I'm in the process of testing FT in a newly set up vSphere 6 environment.  What I'm noticing is that file copies from an NFS share across the network are MUCH slower when FT is enabled.  With FT on, a 210MB file copy takes 50 seconds.  With FT off, it takes about 2 seconds.  This is a Red Hat 7.1 VM.  The NFS share is on another VM on a different host in the same cluster.  Has anybody else seen this?  I don't have a 5.5 environment to compare this to.

I haven't even tested failover yet.  If this is the performance I can expect, I'm not sure it's worth it.
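
To put the numbers above in perspective, here is a minimal sketch that converts the reported copy times into average throughput. The file size and timings are the ones quoted in the post; the function name is only illustrative.

```python
# Figures quoted in the post: a 210 MB file, about 2 s with FT off, 50 s with FT on.
FILE_SIZE_MB = 210


def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Average throughput in MB/s for a copy of size_mb that took `seconds`."""
    return size_mb / seconds


ft_off = throughput_mb_per_s(FILE_SIZE_MB, 2)
ft_on = throughput_mb_per_s(FILE_SIZE_MB, 50)
slowdown = ft_off / ft_on

print(f"FT off: {ft_off:.1f} MB/s")
print(f"FT on:  {ft_on:.1f} MB/s")
print(f"Slowdown: {slowdown:.0f}x")
```

With FT off the copy runs at roughly 105 MB/s, which is close to what a single 1 Gb/s link can carry; with FT on it drops to about 4 MB/s.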

13 Replies
bradley4681
Expert

Are you using a 10Gb network for FT logging?

Cheers! If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".
Bruticusmaximus
Enthusiast

Not currently, but I will be in production.  I'm just not sure why that would be a factor.  Is it because the test file I'm copying has to be copied to 2 machines?  If that's the case, I would expect the copy to take twice as long, not 25 times longer.

To add more details, the cluster I'm using has 3 hosts and there are 3 VMs running across these hosts. So, there's really no other network traffic going on.

bradley4681
Expert

If you turn off FT and try the copy, do you still see the same performance?

Not using a 10Gb FT network really hurts performance with the new FT in 6.0. VMware changed the underlying FT technology, and I ran into similar issues; check out this other post, "FT CPU Spikes and Latency". The file is copied to the primary FT guest first and then mirrored to the secondary FT guest. Without 10Gb, the copy time won't necessarily be just double.

I was noticing network performance issues, as well as the guest being slow, with only 1 FT guest and no other guests on the cluster. As soon as I went to 10Gb they all disappeared.

Bruticusmaximus
Enthusiast

If I turn off FT, the file copy takes 2 seconds.  If I turn on FT, the file copy takes 50 seconds.  I can understand the impact of NOT using 10G for a busy VM. There would be a lot of stuff to copy to the second VM.  This is an idle VM.  There's nothing running on it.  It's just a generic Red Hat install.  I would think 1G would be enough to handle the copy of a 200MB file.
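
The expectation that 1G should be enough can be checked with simple arithmetic. This is a rough sketch that ignores protocol overhead and uses decimal units; the doubling for FT is a deliberately pessimistic simplification (it assumes the whole payload crosses the same 1 Gb/s link a second time), not a description of how FT logging actually works.

```python
def wire_time_s(size_mb: float, link_gbps: float) -> float:
    """Ideal time to push size_mb megabytes over a link of link_gbps Gb/s."""
    size_megabits = size_mb * 8          # 1 byte = 8 bits
    link_megabits_per_s = link_gbps * 1000
    return size_megabits / link_megabits_per_s


one_copy = wire_time_s(200, 1.0)   # the file itself
two_copies = 2 * one_copy          # pessimistic: payload sent twice on one link

print(f"One copy over 1 Gb/s:   {one_copy:.1f} s")
print(f"Two copies over 1 Gb/s: {two_copies:.1f} s")
```

Even in the pessimistic case the transfer needs only a few seconds of link time, so raw bandwidth alone does not account for a 50-second copy.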

bradley4681
Expert

I thought the same thing, but it really does make a difference. The new technology under FT is really chatty; see that post I linked to. I was seeing slow network performance just doing pings.

VMware is supposed to be updating the docs to say it's required.

I even bonded some NICs for 2Gb, and it was still slow.

blazilla
Enthusiast

Hi everybody,

I'm seeing exactly the same behavior. I'm running two HP ProLiant DL380 Gen8 servers with vSphere 6, and all I want is to protect two Windows 2012 VMs with FT. But as soon as I enable FT for a VM, ping latencies go through the roof (between 5 ms and 150 ms). The same applies to file copies: 2 MB/s with FT, 120 MB/s without FT. If I test a failover, pings are normal as long as the secondary VM isn't started. As soon as that VM is started, ping latencies go through the roof again.

btw: I'm using 2x 10 GbE dedicated for FT (2x 10 GbE connected to one vSwitch, one VMK with FT-logging enabled). The links are NEVER saturated.
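
One general reason an unsaturated link can still give low throughput is that a windowed protocol such as TCP can send at most one window of data per round trip. The sketch below applies that bound to the ping latencies reported above. The 64 KiB window is purely an assumption for illustration (nothing in this thread states the actual window size), and this is not a confirmed explanation of the FT behavior.

```python
WINDOW_BYTES = 64 * 1024  # assumed window size, not measured in this thread


def max_throughput_mb_per_s(rtt_ms: float, window_bytes: int = WINDOW_BYTES) -> float:
    """Upper bound on throughput (MB/s) when one window is sent per round trip."""
    rtt_s = rtt_ms / 1000
    return window_bytes / rtt_s / 1_000_000


# 5 ms and 150 ms are the latency extremes reported with FT enabled.
for rtt_ms in (5, 150):
    bound = max_throughput_mb_per_s(rtt_ms)
    print(f"RTT {rtt_ms:>3} ms -> at most {bound:.2f} MB/s")
```

Under that assumption the bound falls from about 13 MB/s at 5 ms to under 0.5 MB/s at 150 ms, so added latency alone could plausibly push a copy into the low single-digit MB/s range regardless of link speed.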

Best regards Patrick https://www.vcloudnine.de
FritzBrause
Enthusiast

I worked on this with VMware. Ping response times went up to 50-200 ms after FT was enabled.

High impact on the VMs. FT was not usable for VMs which needed real-time response times.

In the end, there is no solution as of today (we worked on this around the end of June).

VMware is aware of the issue and working on a fix to improve this.

For the moment, FT is not recommended for VMs which need high I/O.

BTW, we tried 1 Gb and 10 Gb NICs. No difference.

adrianidsys
Contributor

Hello,

We are seeing a similar issue for one of our customers. Did you manage to solve it in any way?

Many thanks..

AndrewAbo
Contributor

There is absolutely, beyond the shadow of a doubt, a significant difference in network performance when Fault Tolerance is enabled.

We are currently testing an FT file server on which we are using folder redirection (My Docs/Desktop/etc).  A few of us noticed some things running sluggishly after migrating from the previous non-FT file server to this FT file server.  There is minimal load on this file server, as only a test bed of 4 users is accessing it via folder redirection (or anything else, for that matter).

For me, I noticed slowness with an application that is just an executable on my desktop.  Dragging the window around was severely sluggish.  For my co-worker, he noticed the search in the Windows Start Menu was very slow (typing in the name of an application vs. finding it in a list).

Since FT can be enabled/disabled on the fly, we disabled FT, and ...you guessed it...night and day difference.

This article gives a visualization of the difference in speed (showing also what my coworker was doing with typing in the name of an application in the start menu):

Visualizing the Impact of Folder Redirection – Start Menu Search • Helge Klein

Fault Tolerance is set up properly, and behaves exactly as it should (tested failovers, duplicated files in secondary datastore, etc.) with this lagginess exception.

As for our environment, it's the fastest I've ever had the pleasure to work in:  100% SSD Fibre Channel SAN (Pure Storage), 20Gb FlexFabric network connections with dual-vMotion configuration.

We are currently running vCenter 6.0 U2 with ESXi U2 (vCenter 6.0.0 b3634794, ESXi 6.0.0 b3620759), so the latest for both at the time of this publication.


Note the final sentence in this article (also copied here after the link).

Fault Tolerance Performance in vSphere 6 - VMware VROOM! Blog - VMware Blogs

Testing shows that vSphere FT can successfully protect a number of workloads like CPU-bound workloads, I/O-bound workloads, servers, and complex database workloads; however, admins should not use vSphere FT to protect highly latency-sensitive applications like voice-over-IP (VOIP) or high-frequency trading (HFT).

Although one would expect to take at least some hit by enabling FT, the difference we are seeing is leading us to seek other HA options.

PemoK
Contributor

hi guys,

I have the same problems with vSphere 6 FT in my lab and production environments, both running all 10Gb NICs. I even tried 1-vCPU VMs with the same result.

You can see what we mean in this "how to" clip on YouTube (watch the cmd box with the ping -t):

https://www.youtube.com/watch?v=SyfFckgI26I

I opened a case with VMware, but I think multi-CPU FT is just not finished yet and should not be used in production.
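
For anyone who wants to compare latency with FT on and off more systematically than watching a ping -t window, here is a small sketch that summarizes captured ping output. The sample text is made up for illustration (it is not taken from any poster's environment, and 192.0.2.10 is a documentation address); the helper names are hypothetical.

```python
import re
import statistics
from typing import Dict, List

# Matches "time=87ms", "time<1ms" (Windows) and "time=0.045 ms" (Linux).
_TIME_RE = re.compile(r"time[=<]\s*(\d+(?:\.\d+)?)\s*ms", re.IGNORECASE)


def parse_ping_times(output: str) -> List[float]:
    """Extract round-trip times in ms; 'time<1ms' is recorded as 1.0."""
    return [float(m.group(1)) for m in _TIME_RE.finditer(output)]


def summarize(times: List[float]) -> Dict[str, float]:
    """Return count, min, median and max of the collected round-trip times."""
    return {
        "count": len(times),
        "min": min(times),
        "median": statistics.median(times),
        "max": max(times),
    }


# Illustrative sample only; lines without a time (timeouts) are skipped.
SAMPLE = """\
Reply from 192.0.2.10: bytes=32 time<1ms TTL=128
Reply from 192.0.2.10: bytes=32 time=87ms TTL=128
Reply from 192.0.2.10: bytes=32 time=142ms TTL=128
Request timed out.
Reply from 192.0.2.10: bytes=32 time=5ms TTL=128
"""

times = parse_ping_times(SAMPLE)
summary = summarize(times)
print(summary)
```

Saving the output of one run with FT enabled and one with FT disabled, then comparing the two summaries, gives numbers that are easier to attach to a support case than a screen recording.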

SkyChye
Contributor

Hi,

I'm having the same problem, and it is slowing down my production environment. Are there any solutions for this?

Screenshot_1.jpg

PemoK
Contributor

Just a quick update on the 3-month(?)-old VMware case:

- No real update. VMware is either trying to gain time or trying to force us to close the case by stalling:

     - try this checkbox

     - try that checkbox

     - oh did you try this setting?

In my opinion FT is not an enterprise solution yet.... like the web client ...

mintnj
Contributor

I'm encountering the same issue. With FT enabled for a 4-vCPU VM, latency increases a lot, into the range of >100 ms. I am on Cisco UCS, so the vNICs I have configured for FT are 10G, and the FT-log traffic should not even leave the chassis. My upstream network is 10G as well, though. I have VIC 1280s, so bandwidth should not be an issue. I'm not really sure what's going on and will dig a little deeper when I have some time, but it doesn't seem like FT is working correctly.
