Hello
I am trying the new FT that comes with vSphere 6.0 because I really need the feature for a CPU-intensive server that I can't cluster, but all my attempts have been useless, as the new FT actually slows the VM down no matter how many vCPUs I assign.
So far I have tried from 1 vCPU to 4 vCPUs, with the secondary VM on the same or a different datastore, and I have 2 dedicated 1Gb cards in a DVS with LACP, yet the FT logging traffic doesn't exceed 100 Mbps.
Thanks
I'm afraid LACP will be of no use for FT, because you have a fixed pair of source and destination IPs (and MACs), and possibly a single layer 4 connection as well. In this case the hashing algorithm will always select the same link for all packets.
What's even more important than raw bandwidth, though, is latency. The hosts aren't separated over longer distances, are they?
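To illustrate the hashing point: per-flow link selection is deterministic, so a single FT logging flow always lands on one member of the LAG. Below is a generic sketch; the hash function, field choice, and port number are illustrative assumptions, not the exact algorithm any particular switch or ESXi host uses.

```python
# Generic sketch of LACP-style per-flow hashing. The hash function and
# flow fields here are illustrative; real implementations are
# vendor-specific, but all map the same flow tuple to the same link.
import hashlib

def select_uplink(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                  num_links: int) -> int:
    """Deterministically map a flow tuple to a LAG member index."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return int(hashlib.sha256(key).hexdigest(), 16) % num_links

# FT logging is one fixed src/dst flow, so every packet picks the same
# member of a 2 x 1Gb LAG -- the second link never carries FT traffic.
chosen = {select_uplink("10.0.0.1", "10.0.0.2", 8100, 8100, 2)
          for _ in range(1000)}
print(len(chosen))  # 1
```

This is why adding links to the bundle raises aggregate capacity across many flows but never speeds up the single FT logging connection.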
This is really pathetic. Maybe you can try with a different machine. I also tried it, but I never noticed that behavior.
I am pretty sure you need a minimum of 10Gb to see solid performance on multi-vCPU SMP-FT machines. We had a VMware rep out a few weeks ago and he mentioned they recommend 40Gb for multiple FT machines. I know it's not the best answer - just what I have heard...
According to the documentation, the recommended network for FT logging is a dedicated 10Gb link.
But I am testing with only one VM, with 1 to 4 vCPUs, and I have 2 dedicated cards on a VDS with LACP configured, and I see the traffic does not go higher than 90 Mbps. So practically there is available bandwidth.
If I use another VM with lower CPU utilization, you can't see the difference. The network utilization for FT is 3 to 9 Mbps, and only when I initiated the automatic VMware Tools upgrade did I see it go higher, to 70-80 Mbps.
I am afraid that if I were able to create some sort of CPU load on that machine, I would face the same delays.
Thanks
You say that the server you want to apply FT to is CPU intensive. That likely also means frequent access to RAM. Standard RAM speeds are in the tens of gigabytes per second, with latencies of a few nanoseconds. Transferring the memory at ~100 MB/s will not cut it. You will really need to get 10 GbE and test it for yourself to see that throughput matters. I also think the traffic is not compressed in any way.
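A quick back-of-envelope check makes this concrete. The dirty-page rate below is an assumed example figure for a moderately memory-busy workload, not a measurement from this thread:

```python
# Back-of-envelope: can the FT logging link keep up with dirtied memory?
# dirty_pages_per_sec is a hypothetical example value, not a measurement.
PAGE_SIZE = 4096                  # bytes per x86 page
dirty_pages_per_sec = 50_000      # assumed: moderately memory-busy VM
dirty_bytes = dirty_pages_per_sec * PAGE_SIZE   # checkpoint data per second

link_1g = 1_000_000_000 / 8       # 1 GbE ceiling, ~125 MB/s
link_10g = 10_000_000_000 / 8     # 10 GbE ceiling, ~1250 MB/s

print(dirty_bytes / 1e6)          # 204.8 -> ~205 MB/s of dirty memory
print(dirty_bytes > link_1g)      # True: a 1 GbE link saturates
print(dirty_bytes > link_10g)     # False: 10 GbE still has headroom
```

Under these assumptions even a modest workload exceeds what 1 GbE can carry, which would force the checkpointing mechanism to slow the primary VM down.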
Good luck!
I just went through this whole ordeal with VMware, and it's documented in this thread: Re: FT CPU Spikes and Latency
10Gb is required for the new FT technology, fast checkpointing. In your current setup, if you create a 1 vCPU guest and put FT in legacy mode, you'll see performance like in previous versions. However, if you try with just 1 vCPU without using legacy mode, you will see performance issues. Review the thread I created; all of this is pretty much laid out there.
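For reference, if I remember the vSphere 6.0 documentation correctly, legacy (record/replay) FT can be forced per-VM with an advanced configuration option. Treat the exact key below as something to verify against VMware's docs before relying on it:

```
vm.uselegacyft = "TRUE"
```

This goes into the VM's advanced configuration parameters while the VM is powered off; legacy FT still carries the old single-vCPU limitation.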
I am in the process of ordering a new 10Gb adapter for my ESXi servers to test, but again, this "recommendation" in all the documentation is confusing and misleading.
Once I repeat the tests with 10Gb I will report back.
Do you already have results?
I'm missing some configuration options. For example, if you want FT like in legacy mode but with two vCPUs, a 1Gb connection should be sufficient. How do you disable datastore replication?
As far as I know, legacy FT does not support more than one vCPU.
And the new one, which is the only option if you want the backup feature now available for FT machines, slows down the primary VM, at least when using a 1Gb FT link.
So I am waiting for my new 10Gb adapters to do a smoke test and confirm whether the 10Gb FT link is mandatory for the new FT to work.
Once I have them I will update you with the results.
Thanks
I'm waiting for your response.
All searches on the internet point to a 10Gb NIC being the solution. A 1Gb NIC has sufficient throughput, but the latency is too high.
I also see that all VMs on the host (including the VMs which are not fault tolerant) slow down very much when FT logging runs over a 1Gb NIC (initiating FT then almost stops all other VMs on the host).
I am sorry for the delay, but I am still waiting for my new 10Gb cards to compare results and confirm the new FT.
I hope to have them by end of the week.
I just received my 10Gb cards and installed them, and now I can see what SMP-FT can really do without delays - just raw power. The logging traffic easily reaches 150 Mbps and has peaked so far at 250 Mbps.
So that justifies the need for the 10Gb card, which I think VMware should make mandatory for deployments of their new FT.
Thanks
Thanks for confirming our presumption that 10Gb NICs in the cluster hosts are mandatory.
I can understand this if FT on multiple vCPUs and datastore replication is needed.
I'm sorry that the new FT technology does not let you choose what to protect. For example, if datastore replication is not needed, there should be an option to disable it; I assume less network throughput/latency would then be necessary.
I like the legacy FT, but I'd want it for more vCPUs.....
We are running into the same problem trying to get FT working at acceptable performance levels using a pair of 1Gb NICs for the FT logging traffic. Thank you for confirming what we were already suspecting about 10Gb NICs. One question though: did you connect your hosts over 10Gb directly between the host NICs, or through a switch? I'd like to cut the costs of getting FT implemented, and NIC-to-NIC would cut out the price of a 10Gb switch.
I had a 2-node cluster, so it was possible for me to connect them back to back with fiber 10Gb cards, and it worked.
Performance is also good through the switches (for testing I used one of the switches' 10Gb uplinks).
For larger deployments, of course, 10Gb switches will become a must....
Best Regards