HT with NUMA

RanjnaAggarwal · ‎04-13-2013

What is the recommendation for HT with NUMA: should HT be kept disabled or enabled?

Regards, Ranjna Aggarwal

JCMorrissey · ‎04-13-2013

Nothing in the vSphere performance best practices guide indicates that HT needs to be disabled. If you review page 22 of the guide, http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf, it does mention a scenario that requires tweaking the numa.vcpu.preferHT flag, but that addresses a very specific circumstance.

Please consider marking as "helpful", if you find this post useful. Thanks!... http://johncmorrissey.wordpress.com/

Ethan44 · ‎04-14-2013

Hi

Welcome to the communities.

I would go with disabling HT.

"a journey of a thousand miles starts with a single step."

RanjnaAggarwal · ‎04-15-2013

I was also thinking that.

Regards, Ranjna Aggarwal

mcowger · ‎04-15-2013

Why would you disable HT? All the benchmarks show that modern HT, for 95% of workloads, improves performance...

--Matt VCDX #52 blog.cowger.us

RanjnaAggarwal · ‎04-15-2013

Because NUMA does not take HT into account.

Regards, Ranjna Aggarwal

JarryG · ‎04-16-2013

Just to be clear, I assume that by NUMA you mean "non-uniform memory access" and by HT "hyperthreading". If so, what does NUMA have to do with HT? From the NUMA point of view, it does not matter whether a real core (CPU) or a "hyperthreaded core" needs data from memory. The only thing that *does* matter: is the requested memory page in a "local" (directly accessible) or a "non-local" (indirectly accessible) memory bank?
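To illustrate the point, here is a minimal sketch in Python. It is a toy model, not how any hypervisor actually tracks this; the `LogicalCpu` type and `access_kind` function are names made up for the example. It only shows that the local/remote classification depends on the node, not on which hardware thread issues the request.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogicalCpu:
    node: int    # NUMA node the physical core belongs to
    core: int    # physical core id
    thread: int  # hardware thread on that core (0 or 1 with HT enabled)

def access_kind(cpu: LogicalCpu, page_node: int) -> str:
    """Classify a memory access; only the node is compared, never the thread."""
    return "local" if cpu.node == page_node else "remote"

# Two HT siblings on the same (hypothetical) core 3 of node 0.
t0 = LogicalCpu(node=0, core=3, thread=0)
t1 = LogicalCpu(node=0, core=3, thread=1)

print(access_kind(t0, 0), access_kind(t1, 0))  # local local
print(access_kind(t0, 1), access_kind(t1, 1))  # remote remote
```

Both siblings always get the same answer for the same page, which is why HT by itself does not change the NUMA picture.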

HT can have (and probably has) some impact on the frequency of cache misses (the 2 threads still share the same amount of CPU cache), and this might increase the number of memory pages requested, but this effect is very small and is outweighed by the benefits of running 2 threads in parallel.

Or, to put it another way: if you do not have problems with HT on UMA (uniform memory access), you will very probably not see problems on NUMA either (and vice versa)...

_____________________________________________ If you found my answer useful please do *not* mark it as "correct" or "helpful". It is hard to pretend being noob with all those points! 😉

RanjnaAggarwal · ‎04-16-2013

Then what is the meaning of this:

During placement of a vSMP virtual machine, the NUMA load balancer assigns a single vCPU per CPU core and “ignores” the availability of SMT threads

Regards, Ranjna Aggarwal

JarryG · ‎04-16-2013

It does not matter which thread of the same core (or CPU) requests data from memory, because both threads of the same core have the same affinity to a particular memory bank, share the same L1/L2/L3 cache, etc. From the NUMA point of view they are equal. That is why NUMA does not need a list of "CPU threads"; it is enough to keep a list of cores.
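A rough sketch of that idea in Python, under stated assumptions: the host layout (2 nodes, 4 cores per node, 2 threads per core) and all function names are hypothetical, and this is not the actual NUMA scheduler. It shows that collapsing threads into cores loses no node information, and how counting one vCPU per core versus per thread changes how many nodes a VM appears to need.

```python
import math

def build_topology(nodes, cores_per_node, threads_per_core):
    """Return (node, core, thread) tuples for a hypothetical host."""
    return [(n, n * cores_per_node + c, t)
            for n in range(nodes)
            for c in range(cores_per_node)
            for t in range(threads_per_core)]

def core_to_node(topology):
    """Collapse hardware threads into a core -> node map, checking that
    sibling threads never disagree about their node."""
    mapping = {}
    for node, core, _thread in topology:
        assert mapping.setdefault(core, node) == node
    return mapping

def nodes_needed(vcpus, cores_per_node, threads_per_core=1):
    """NUMA nodes a VM spans if each node offers
    cores_per_node * threads_per_core placement slots."""
    return math.ceil(vcpus / (cores_per_node * threads_per_core))

topo = build_topology(nodes=2, cores_per_node=4, threads_per_core=2)
cores = core_to_node(topo)

print(len(topo), len(cores))  # 16 8  (logical CPUs, physical cores)
print(nodes_needed(6, 4))     # 2  (one vCPU per core)
print(nodes_needed(6, 4, 2))  # 1  (HT threads counted as slots)
```

In this toy model a 6-vCPU VM spans two nodes when only cores are counted, but would fit in one node if threads were counted too. As far as I understand it, that second case is the kind of situation the preferHT flag mentioned earlier in the thread is meant for.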

You might ask: "why is it then not enough to keep just a list of CPUs?" Because cores inside the same CPU might be organised in a complex way: clustered, multi-layered, sharing or not sharing a common cache, etc. (e.g. the "Bulldozer" microarchitecture, with 8 cores in 4 clustered modules). Then 2 cores from the same CPU cluster are not the same for NUMA as 2 cores from different CPU clusters (but from the same physical CPU)...
