I have 5 ESXi hosts, all running 5.1 Update 2.
I'm having issues with the NUMA scheduler: poor balancing.
Let me get straight to the point.
Right now I'm running tests on one host. Just one.
Dell R910, 4 Opteron CPUs --> 4 NUMA nodes, each with 6 cores + 64 GB RAM. Total: 24 cores + 256 GB RAM.
10 VMs, with 4, 4, 4, 4, 2, 2, 1, 1, 1 and 1 vCPUs respectively. They are well sized: each one uses 80-90% of its vCPUs.
Nothing is undersized or oversized. Memory ranges from 1 to 16 GB per VM and is not a problem. The issue is strictly CPU related.
OK. In esxtop: press m (memory view), then f, and enable the NUMA statistics fields.
NUMA node | VM vCPUs   | Total
0         | 4, 4, 1    | 9 vCPUs (!!). Terribly balanced. These 3 VMs have high CPU ready times.
1         | 2, 2, 1, 1 | 6 vCPUs, exactly the node's core count. OK here.
2         | 4, 1       | 5 vCPUs. OK here.
3         | 4          | 4 vCPUs. OK here.
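A quick way to quantify the imbalance is to add up the vCPUs homed on each NUMA node and compare each total with the 6 cores the node owns. A minimal Python sketch using the numbers from the esxtop table, assuming one vCPU should map to one physical core (no hyperthreading, no overcommit tolerated); `node_load` is just an illustrative helper name:

```python
# Per-node vCPU totals as observed in esxtop on the 4-node test host.
CORES_PER_NODE = 6  # 4 sockets x 6 cores

# NUMA node -> vCPU counts of the VMs homed on it (from the esxtop output)
placement = {
    0: [4, 4, 1],
    1: [2, 2, 1, 1],
    2: [4, 1],
    3: [4],
}


def node_load(placement, cores_per_node):
    """Return {node: (vcpus, cores, overcommitted)} for every NUMA node."""
    report = {}
    for node, vm_vcpus in sorted(placement.items()):
        vcpus = sum(vm_vcpus)
        report[node] = (vcpus, cores_per_node, vcpus > cores_per_node)
    return report


if __name__ == "__main__":
    for node, (vcpus, cores, over) in node_load(placement, CORES_PER_NODE).items():
        status = "OVERCOMMITTED" if over else "ok"
        print(f"node {node}: {vcpus}/{cores} vCPUs {status}")
```

The host as a whole has 24 vCPUs on 24 cores, so it is not overcommitted. The problem is purely distribution: node 0 is 3 cores short while nodes 2 and 3 leave 3 cores idle between them.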
I waited an entire day and the VMs stayed put. No rebalancing. Nothing.
So I fixed it manually: I moved the VMs between nodes using resource settings --> Advanced Memory / Advanced CPU (specifying the NUMA node and the CPU affinity*).
* Fact: I read in the official documentation that specifying only the NUMA node under Advanced Memory does not work. You need to specify the CPU affinity too.
For example, the CPUs for NUMA node 1 are 6, 7, 8, 9, 10 and 11, so I specified 6-11, which is equivalent.
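The node-to-CPU mapping follows from the numbering: with 6 cores per node, node n owns pCPUs 6n to 6n+5. A small Python sketch that builds the affinity string for each node, assuming contiguous per-node numbering and no hyperthreading (with HT enabled, ESXi exposes twice as many logical CPUs and these ranges no longer apply). `node_cpus` and `node_affinity` are hypothetical helper names, not part of any VMware tool:

```python
CORES_PER_NODE = 6  # assumed: 6 pCPUs per NUMA node, numbered contiguously


def node_cpus(node, cores_per_node=CORES_PER_NODE):
    """List the pCPU IDs that belong to one NUMA node."""
    first = node * cores_per_node
    return list(range(first, first + cores_per_node))


def node_affinity(node, cores_per_node=CORES_PER_NODE):
    """Format a node's pCPUs as a range string for the CPU affinity field."""
    cpus = node_cpus(node, cores_per_node)
    return f"{cpus[0]}-{cpus[-1]}"


if __name__ == "__main__":
    for node in range(4):
        print(f"NUMA node {node}: CPU affinity {node_affinity(node)}")
```

On this host that gives 0-5, 6-11, 12-17 and 18-23, so a node's whole CPU set can be typed as one range instead of a comma-separated list.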
The VMs moved instantly.
Result on esxtop:
Excellent. That's balance: the VMs on each NUMA node add up to exactly its 6 cores.
And yes, of course, memory locality was 97-100%, every time. No problem there, as I said at the beginning.
CPU ready time dropped to 0-50 ms on every VM. Perfect. Before, it was 2000-5000 ms (!!!).
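To put those numbers in perspective, a CPU ready summation in milliseconds is usually converted to a percentage of the sample interval. A minimal sketch, assuming the values came from the vSphere Client real-time chart (20-second samples; the post does not say where they were read) and that a VM-level value is summed across all of the VM's vCPUs, so it is divided by the vCPU count to get a per-vCPU figure:

```python
def ready_percent(ready_ms, interval_s=20, vcpus=1):
    """CPU ready as a percentage of the sample interval, per vCPU.

    ready_ms:   CPU ready summation for one sample, in milliseconds
    interval_s: sample length in seconds (assumed 20 s, real-time chart)
    vcpus:      spread a VM-level summation across this many vCPUs
    """
    return ready_ms * 100 / (interval_s * 1000 * vcpus)


if __name__ == "__main__":
    for ms in (50, 2000, 5000):
        print(f"{ms} ms -> {ready_percent(ms):.2f}% (1 vCPU), "
              f"{ready_percent(ms, vcpus=4):.2f}% per vCPU (4 vCPUs)")
```

Under those assumptions, 2000-5000 ms is 10-25% ready for a 1-vCPU VM (2.5-6.25% per vCPU for a 4-vCPU VM), while 0-50 ms is at most 0.25%.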
I've read that when a VM lands on a new ESXi host (through an automatic vMotion, for example), the scheduler considers only the VM's memory.
It places the VM on the first NUMA node with enough free memory to hold it. That's all.
It does not look at the vCPU count, which can cause poor CPU performance in the short term.
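The placement behaviour described above can be modelled in a few lines. This is a deliberately simplified toy, not ESXi's actual scheduler code: each VM goes to the first NUMA node with enough free memory, and the vCPU count is ignored. The VM names and memory sizes are hypothetical (the post only says memory is between 1 and 16 GB):

```python
NODES = 4
NODE_MEM_GB = 64
CORES_PER_NODE = 6


def place_by_memory(vms, nodes=NODES, node_mem_gb=NODE_MEM_GB):
    """Toy model: the first node with enough free memory wins; vCPUs are ignored.

    vms: list of (name, vcpus, mem_gb). Returns {node: [(name, vcpus), ...]}.
    """
    free_mem = [node_mem_gb] * nodes
    placement = {n: [] for n in range(nodes)}
    for name, vcpus, mem_gb in vms:
        for node in range(nodes):
            if free_mem[node] >= mem_gb:
                free_mem[node] -= mem_gb
                placement[node].append((name, vcpus))
                break
        else:
            raise ValueError(f"no node has {mem_gb} GB free for {name}")
    return placement


# Hypothetical VMs with the post's vCPU counts; memory sizes are made up.
VMS = [("vm1", 4, 16), ("vm2", 4, 16), ("vm3", 4, 16), ("vm4", 4, 16),
       ("vm5", 2, 8), ("vm6", 2, 8),
       ("vm7", 1, 2), ("vm8", 1, 2), ("vm9", 1, 2), ("vm10", 1, 2)]

if __name__ == "__main__":
    for node, vms in place_by_memory(VMS).items():
        vcpus = sum(v for _, v in vms)
        print(f"node {node}: {vcpus}/{CORES_PER_NODE} vCPUs {[n for n, _ in vms]}")
```

With these made-up sizes, node 0 ends up with 16 vCPUs on 6 cores and node 1 with 8, while nodes 2 and 3 stay empty even though the host has enough cores for every VM: an exaggerated version of the skew in the esxtop output.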
After one hour, I removed all the advanced affinity settings from each VM.
After another hour, I checked CPU ready times on each VM. All were fine except 2.
I went to ESXTOP. AGAIN. NUMA nodes IMBALANCED.
One NUMA node had VMs totaling 7 vCPUs and another, 8 vCPUs.
So, what am I doing now, and from now on?
I balance manually and then set ESXi host --> Software --> Advanced Settings --> Numa.RebalanceEnable=0.
The VMs stay on the NUMA node where I put them.
CPU ready times have been excellent so far.
1) Is there a way to fix this with one or more of the NUMA advanced attributes? I want the VMs placed on the NUMA nodes based on each VM's vCPU count too, not only its memory!! It's obvious and essential!!! Otherwise you get the obvious bridge crossing (that's what I call it) between physical cores, adding latency. Instant latency. I want each VM to stay on one NUMA node. No remote memory or remote CPU!!
2) Is this fully fixed in vSphere 5.5? Is the NUMA balancer/scheduler better? This is quite frightening.
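What question 1 asks for is essentially a two-dimensional bin-packing problem (cores and memory). A sketch using the first-fit-decreasing heuristic (largest VMs first, each to the first node with enough free cores and memory) shows that the 10 VMs fit exactly 6 vCPUs per node, which matches the manual balance. This illustrates the goal only; it is not an ESXi setting or algorithm, the memory sizes are hypothetical, and first-fit decreasing is a heuristic that can fail on other VM mixes even when a valid placement exists:

```python
NODES = 4
CORES_PER_NODE = 6
NODE_MEM_GB = 64


def place_by_cores(vms, nodes=NODES, cores=CORES_PER_NODE, mem_gb=NODE_MEM_GB):
    """First-fit decreasing on vCPUs, also checking free memory per node.

    vms: list of (name, vcpus, mem_gb). Returns {node: [(name, vcpus), ...]}.
    """
    free_cores = [cores] * nodes
    free_mem = [mem_gb] * nodes
    placement = {n: [] for n in range(nodes)}
    # Largest VMs first; sorted() stays stable with reverse=True, so ties keep input order.
    for name, vcpus, vm_mem in sorted(vms, key=lambda vm: vm[1], reverse=True):
        for node in range(nodes):
            if free_cores[node] >= vcpus and free_mem[node] >= vm_mem:
                free_cores[node] -= vcpus
                free_mem[node] -= vm_mem
                placement[node].append((name, vcpus))
                break
        else:
            raise ValueError(f"{name} does not fit on any single NUMA node")
    return placement


# Same vCPU counts as in the post; memory sizes are hypothetical.
VMS = [("vm1", 4, 16), ("vm2", 4, 16), ("vm3", 4, 16), ("vm4", 4, 16),
       ("vm5", 2, 8), ("vm6", 2, 8),
       ("vm7", 1, 2), ("vm8", 1, 2), ("vm9", 1, 2), ("vm10", 1, 2)]

if __name__ == "__main__":
    for node, vms in place_by_cores(VMS).items():
        print(f"node {node}: {sum(v for _, v in vms)}/{CORES_PER_NODE} vCPUs",
              [n for n, _ in vms])
```

Each node gets one 4-vCPU VM plus either one 2-vCPU VM or two 1-vCPU VMs, so no VM has to span nodes and no node is overcommitted.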
PS: The "again" in the subject refers to earlier versions: I've seen NUMA balancing issues in other discussion threads for vSphere 4.1 and 5.0.