We have two vSphere clusters and are migrating VMs from v5.5 to v6.5. On exactly the same hardware (2-socket Xeon E5-2689 v4, 2 x (10 physical cores / 20 threads) = 40 threads), we see a huge performance drop between 5.5 and 6.5 after migrating a large VM.
The hosts are in Ivy Bridge EVC (CPU compatibility) mode on the v5.5 cluster, and in Haswell EVC mode on the v6.5 cluster.
The impacted VM is a 20-vCPU SQL Server 2017 (so mission critical). It is configured as 4 vSockets x 5 cores per socket (for SQL Standard licensing reasons).
The symptoms are high response times in the guest SQL Server and a high CPU Ready ratio on the v6.5 cluster (10 to 20%, compared to 0.5% on the old v5.5 cluster).
One interesting thing: with the CPU-Z benchmark, we see a big difference between v5.5 and v6.5:
- on the v5.5 cluster, on a 20-vCPU test VM (on an otherwise empty host, just this VM), we get a "Multi Thread Ratio" of 19.33 for 20 vCPUs. Basically, the v5.5 CPU scheduler uses all the physical cores of both sockets rather than the HT siblings (since I assume an HT sibling gives around a 30-40% performance increase, not more).
- on the v6.5 cluster, same hardware, same empty host except for the test VM, we cannot get past a 15.5 "Multi Thread Ratio". We interpret this as: the v6.5 CPU scheduler uses 10 cores at 100%, then 10 more cores at about 55% of the raw power. Since 55% is more than what HT siblings can deliver, we suspect something in vSphere 6.5 is causing this big performance drop.
These tests were made on hosts with no other VMs.
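For what it's worth, the 15.5 ceiling is consistent with a simple back-of-the-envelope model of the CPU-Z ratio (the per-sibling HT yield values here are our assumptions, not measurements):

```python
# Rough model of CPU-Z "Multi Thread Ratio": each physical core contributes
# 1.0x the single-thread score; a hyper-threaded sibling contributes only
# a fraction (its "yield") of that.

def multi_thread_ratio(pcores_used, ht_siblings_used, ht_yield):
    """Expected ratio = full physical cores + HT siblings scaled by their yield."""
    return pcores_used * 1.0 + ht_siblings_used * ht_yield

# v5.5 behavior: 20 vCPUs spread across 20 physical cores (2 sockets x 10)
print(multi_thread_ratio(20, 0, 0.0))    # ideal 20.0, close to the 19.33 measured

# v6.5 behavior: 20 vCPUs packed onto 10 pCores + 10 HT siblings,
# assuming each sibling yields ~55% of a full core:
print(multi_thread_ratio(10, 10, 0.55))  # 15.5, matching the observed ceiling

# With a more typical ~30% HT gain the ceiling would be even lower:
print(multi_thread_ratio(10, 10, 0.30))  # 13.0
```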
We have tried many combinations during our tests:
- enabling or disabling "CPU Hot Add" in order to switch to legacy UMA instead of vNUMA
- enabling or disabling the numa.vcpu.preferHT setting
- changing the numa.vcpu.maxPerVirtualNode to 10, then 20
- tried 2 sockets x 10 vCores, 1 socket x 20 vCores, 20 sockets x 1 vCore
- tried changing the power management setting and the latency sensitivity
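For reference, here is how those combinations map to .vmx advanced settings (the keys are standard, but the values shown are just one of the variants we tested):

```
# CPU Hot Add off (when enabled, hot add disables vNUMA and exposes legacy UMA)
vcpu.hotadd = "FALSE"
# Keep vCPUs on one NUMA node using its HT threads instead of spanning nodes
numa.vcpu.preferHT = "TRUE"
# Tested with 10, then 20
numa.vcpu.maxPerVirtualNode = "10"
# vCPU topology tested: 2 x 10, 1 x 20, 20 x 1
cpuid.coresPerSocket = "10"
```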
What is really strange is that up to 10 cores, the VM has a "multi thread ratio" near what we expect (10 cores = 9.62), but beyond 10 cores things degrade (12 cores = 8.75, and raw performance is lower than with 10 cores...). During the migration we noticed performance gains on small VMs (up to 8 vCPUs), but for our 20-core VM it's a failure...
Apart from the CPU compatibility mode (Ivy Bridge / Haswell), the only difference we see between 5.5 and 6.5 in the VM logs is the Spectre CPUID flags on v6.5 (cpuid.IBRS, cpuid.IBPB, cpuid.STIBP). I don't find any of these lines in the 5.5 logs. But since public benchmarks don't report a 30% performance drop from the Spectre/Meltdown patches, we think this may be unrelated.
Last detail: our v6.5 is patched for L1TF (sequential-context attack vector), since we were seeing the "esx.problem.hyperthreading.unmitigated" warning on our hosts. However, we did not enable the VMkernel.Boot.hyperthreadingMitigation setting (the concurrent-context attack vector mitigation).
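For anyone checking their own hosts, that mitigation state can be queried and set from the ESXi CLI (per VMware KB 55806; a reboot is required, and enabling it effectively halves the schedulable logical CPUs):

```shell
# Query the current state of the ESXi side-channel-aware scheduler
esxcli system settings kernel list -o hyperthreadingMitigation

# Enable the concurrent-context (L1TF) mitigation -- reboot required
esxcli system settings kernel set -s hyperthreadingMitigation -v TRUE
```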
We are out of ideas, and would really like to get the same performance on v6.5 as we had on v5.5 (since it's the same hardware).
If someone could shed some light on this topic, we would be infinitely grateful!
Thanks in advance!
In the case of SQL, I would like to ensure the following:
1. No CPU overcommitment on the ESXi host.
2. Use PVSCSI and VMXNET3 adapters.
3. Ensure ESXi power management is set to High Performance.
4. Run a performance analysis on the storage where the SQL VM resides. ESXTOP may help you understand; there should be no latency from the storage array or the storage adapters on the ESXi hosts.
5. VMware Tools should be up to date.
I am sure you have done all of this by now, but these are the initial checks I would do for any SQL VM.
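To dig into point 4 (and CPU Ready at the same time), esxtop can be used interactively or captured in batch mode for offline analysis; the sample interval, count, and file name below are just examples:

```shell
# Batch capture: one sample every 5 seconds, 60 samples, all counters to CSV
esxtop -b -d 5 -n 60 > esxtop_capture.csv

# Interactive hints:
#   press 'u' for disk devices, watch DAVG/KAVG/GAVG (array / kernel / guest latency)
#   press 'c' for CPU, watch %RDY and %CSTP per VM group
```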
Two things come to mind...
- You are over-provisioning
- You might be seeing a performance drop because 6.5 includes the Meltdown/Spectre patches (just a thought; I am not sure how big the performance impact should be).
I of course do not know your environment, but do you actually need 20 vCPUs assigned to the VM? Keep in mind that if you assign a large number of vCPUs to a VM, the VM will wait until all pCPUs are available before getting CPU time. Obviously, if you need it then assign it, but best practice is usually to start low and monitor performance while increasing vCPUs. If you have a lot of other CPU heavy VMs on the hosts, you're definitely going to see some performance impact due to the high allocation of vCPUs to your SQL server.
Thank you for your answer, jatinjsk. Yes, all these points are already addressed.
We're running benchmarks with a 20-vCPU test VM on hosts with 40 logical CPUs available (no other VMs on the host, so no CPU overcommitment). The test VM uses only 4 GB RAM (256 GB available).
All our hosts on the v6.5 cluster are showing the same reported behavior.
What is really annoying is that we're using a CPU-bound benchmark, so apart from the power management setting (High Performance), we don't see any good reason for I/O or network to impact performance (vSphere HA is disabled).
Thanks for your answer, minivlab.
No, we are not over-provisioning, unfortunately.
We have similar production VMs on the old v5.5 and the new v6.5 clusters, with similar workloads.
And as I tried to show in the CPU-Z screenshots, on the v5.5 cluster the "multi thread ratio" is way better than on v6.5.
That's exactly the behavior we struggle to get on 6.5: running on all the host's pCores, on hosts where no other VM is present.
Beyond the benchmark, we see really high CPU Ready values under heavy workload on the new cluster, while everything is fine on the old one...
We need 20 vCPUs in order to have headroom for peak hours on our databases (these are mission-critical VMs, so we want them to run as well as possible).
Usually database performance problems come down to not having enough RAM, or to I/O problems. Sometimes fewer CPUs perform better than more. Is this all defined as 1 socket or many sockets? It might be a NUMA sizing issue.
BenediktFrenzel, many thanks for the video. I watched some more on videos.vmworld.com/global; lots of interesting things.
In particular, the point about balanced power management and Turbo Mode is very important, since on our Xeon, turbo mode can still be active with 10 cores in use.
This evening I tried switching our production VM to Latency Sensitivity High, with a 120 GB VM RAM reservation; we will see if things improve, but for now I have an 18.2 Multi Thread Ratio, which is amazing (with a score above 8200, whereas I was getting multi-thread scores between 5000 and 7000 last week). I will continue to report results in this thread.
About the VM hardware version, good advice, there's something there! The production SQL VM is still at the ESXi 6.0 level (v11) after migration; I will change it to v13 later this week. But I doubt it has an impact, since the test VM was at v13 and reproduced the same behavior.
First results are really promising !!!
The Latency Sensitivity High setting with memory reservation is the real game changer!
Guest CPU: on the left is Monday, where we hit the ceiling (97% CPU). On the right is today: smooth, as it was on vSphere 5.5.
And the CPU Ready value just disappeared!
We still have to adjust the numa.autosize.vcpu.maxPerVirtualNode setting, because currently the guest sees only one vNUMA node (not two); we hope to benefit from SQL Server's soft-NUMA feature.
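For reference, these are the two per-VM settings involved, as .vmx entries (the values match our 2 x 10 topology; treat this as a sketch of our configuration, not a general recommendation):

```
# Latency Sensitivity = High (requires full CPU and memory reservations)
sched.cpu.latencySensitivity = "high"
# Cap vCPUs per vNUMA node at 10 so the guest sees two vNUMA nodes
numa.autosize.vcpu.maxPerVirtualNode = "10"
```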
I will report the conclusion later, but we are back to normal operations. Thanks to all!