What is the recommendation for HT with NUMA: should HT be kept disabled or enabled?
There's nothing in the vSphere best practices guide to indicate that
HT needs to be disabled. If you review pg 22 of the guide http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf
it does mention a scenario which requires tweaking the
numa.vcpu.preferHT flag, but it addresses a very specific circumstance.
Hi
Welcome to the communities.
I would go with the option of disabling HT.
I was also thinking of that.
Why would you disable HT? All the benchmarks show that modern HT, for 95% of workloads, improves performance...
Because NUMA does not take HT into account.
Just to make things clear, I suppose that by NUMA you mean "non-uniform memory access" and by HT "hyperthreading". If so, what does NUMA have to do with HT? From the NUMA point of view, it does not matter whether a real core (CPU) or a "hyperthreaded core" needs data from memory. The only thing which *does* matter is whether the requested memory page is in a "local" (directly accessible) or "non-local" (indirectly accessible) memory bank.
HT can have (and probably has) some impact on the frequency of cache misses (2 threads still share the same amount of CPU cache), and this might increase the number of memory pages requested, but this effect is very small and outweighed by the benefits of running 2 threads in parallel.
Or, to put it another way: if you do not have problems with HT on UMA (uniform memory access), you will very probably not see problems on NUMA either (and vice versa)...
Then what is the meaning of this:
During placement of a vSMP virtual machine, the NUMA load balancer assigns a single vCPU per CPU core and “ignores” the availability of SMT threads
It does not matter which thread of the same core (or CPU) requests data from memory, because both threads of the same core have the same affinity to a particular memory bank, share the same L1/2/3 cache, etc. From the NUMA point of view they are equal. That's the reason why NUMA does not need a list of "CPU threads"; it is enough to keep a list of cores.
You might ask, "why is it then not enough to keep just a list of CPUs?" Because cores inside the same CPU might be organised in a complex way: clustered or multi-layered, sharing or not sharing a common cache, etc. (e.g. the "bulldozer" microarchitecture, 8 cores in 4 clustered modules). For NUMA, 2 cores of the same CPU cluster are then not the same as 2 cores from different CPU clusters (but from the same physical CPU)...
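To make the quoted sentence a bit more concrete, here is a rough Python sketch of the counting involved. It is a toy model only, not ESXi's actual placement algorithm: the function name nodes_needed and its parameters are made up for illustration, and the prefer_ht switch only loosely mimics what the preferHT flag mentioned earlier is described as doing (counting hardware threads instead of cores when sizing a NUMA node).

```python
import math

def nodes_needed(vcpus, cores_per_node, threads_per_core=2, prefer_ht=False):
    """Toy model: how many NUMA nodes a VM would span.

    Hypothetical helper for illustration; not a VMware API.
    """
    # Default behaviour in this model: one vCPU per physical core,
    # so SMT threads add no extra placement slots.
    # With prefer_ht=True every hardware thread counts as a slot.
    slots_per_node = cores_per_node * (threads_per_core if prefer_ht else 1)
    return math.ceil(vcpus / slots_per_node)

# Example host (assumed): NUMA nodes with 6 cores each, HT enabled.
print(nodes_needed(8, 6))                  # counting cores only
print(nodes_needed(8, 6, prefer_ht=True))  # counting HT threads too
```

In this model an 8-vCPU VM on 6-core nodes spans two nodes when only cores are counted and fits on one node when threads are counted. Whether HT is enabled or disabled in the BIOS does not change the core-based count, which is why the "ignores SMT threads" behaviour is not by itself a reason to disable HT.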