manajoe
Contributor

ESXi 6.0U2 imbalanced NUMA nodes

Hello,

I'm not sure I understand NUMA correctly. I have the following situation:

-    HP DL380 hosts running ESXi 6.0U2, with 2 Intel pCPUs (12 pCores each) = 24 pCores per host, HT enabled

-    256 GB RAM per host

-    VMs with 6 vCPUs and 32 GB vRAM each (virtual XenApp servers, Windows Server 2012 R2)

As far as I can tell, my ESXi hosts have 2 NUMA nodes. Now I'm trying to figure out how many of these VMs per host will perform best.
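
Here is my back-of-the-envelope sizing (just a sketch; I'm assuming the 256 GB is split evenly across the two sockets, so roughly 128 GB per NUMA node):

# Rough per-NUMA-node capacity of these hosts.
# Assumption: even DIMM population, so each node gets half of the 256 GB.
cores_per_node = 12                     # physical cores per socket
threads_per_node = cores_per_node * 2   # 24 logical CPUs per node with HT
mem_per_node_gb = 256 / 2               # ~128 GB per node (assumed)

vm_vcpus = 6
vm_mem_gb = 32

# One VM fits entirely inside a single node (6 vCPUs <= 12 cores, 32 GB <= 128 GB),
# so the scheduler can keep each VM's vCPUs and memory on one node.
print(vm_vcpus <= cores_per_node and vm_mem_gb <= mem_per_node_gb)   # True

# How many of these VMs "fill" one node?
print(cores_per_node // vm_vcpus)           # 2 per node, counting physical cores
print(threads_per_node // vm_vcpus)         # 4 per node, counting HT threads
print(int(mem_per_node_gb // vm_mem_gb))    # 4 per node, counting memory

Counting physical cores, that's about 2 of these VMs per node, which is where my expectation below comes from.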

My understanding was that if I place 4 of these VMs on one host, both NUMA nodes would be used, but this doesn't seem to be the case:

[Screenshot: esxtop NUMA stats with 4 VMs per host (numa_stats_4_vms_per_host.png)]

Most of the time all VMs are placed on one NUMA node, and in that case N%L seems quite good and %RDY does too. Occasionally I see a VM switch to NHN 0 for a short time, with N%L decreasing, but a minute later it is moved back to NHN 1.

So I tried with 5 of these VMs. Right after the vMotions both NUMA nodes were used, but a few minutes later all VMs had been migrated to NHN 0 (!), with lower N%L values and %RDY getting worse:

[Screenshot: esxtop NUMA stats with 5 VMs per host (numa_stats_5_vms_per_host.png)]

Based on my understanding, I expected both NUMA nodes to be used; not necessarily 50%-50%, but definitely better than 100%-0%.
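
To put numbers on that expectation (same sketch, same ~128 GB per node assumption): counting HT threads and memory, 4 of these VMs together just barely fit into a single node, but 5 do not:

# Aggregate demand of N identical VMs vs. the capacity of a single NUMA node
# (same assumptions as above: 24 HT threads and ~128 GB per node).
threads_per_node = 24   # 12 cores x 2 with HT
mem_per_node_gb = 128   # assuming even memory split

for n_vms in (4, 5):
    vcpus = n_vms * 6
    mem_gb = n_vms * 32
    fits = vcpus <= threads_per_node and mem_gb <= mem_per_node_gb
    print(f"{n_vms} VMs: {vcpus} vCPUs / {mem_gb} GB -> fits on one node: {fits}")

# 4 VMs: 24 vCPUs / 128 GB -> fits on one node: True
# 5 VMs: 30 vCPUs / 160 GB -> fits on one node: False

So even if the scheduler is happy to pack the 4-VM case into one node, with 5 VMs a single node is overcommitted on both CPU and memory, which makes the 100%-0% placement hard to understand.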

Can anyone guess why there is this imbalance?

3 Replies
Rubeck
Virtuoso

"Can anyone guess why there is this imbalance?"

Maybe "action-affinity"-based migration is what's causing this behavior?

See VMware KB: NUMA nodes are heavily load imbalanced causing high contention for some virtual machines
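
If I remember the KB correctly, the workaround it describes is setting the advanced option Numa.LocalityWeightActionAffinity to 0, which disables the action-affinity migrations. Here is a rough, untested sketch with pyVmomi for checking (and optionally changing) the value on a single host; the hostname and credentials are placeholders, and I'd verify against the KB before changing anything:

# Untested sketch: read (and optionally change) Numa.LocalityWeightActionAffinity
# on a single ESXi host via pyVmomi. Hostname and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='esxi-01.example.local', user='root', pwd='***',
                  sslContext=ssl._create_unverified_context())
try:
    # Connected directly to an ESXi host, the inventory is one datacenter
    # with one compute resource containing the host itself.
    host = si.content.rootFolder.childEntity[0].hostFolder.childEntity[0].host[0]
    opt_mgr = host.configManager.advancedOption

    # Show the current value first.
    for opt in opt_mgr.QueryOptions('Numa.LocalityWeightActionAffinity'):
        print(opt.key, opt.value)

    # Uncomment to apply the KB workaround (0 = disable action-affinity migrations):
    # opt_mgr.UpdateOptions(changedValue=[vim.option.OptionValue(
    #     key='Numa.LocalityWeightActionAffinity', value=0)])
finally:
    Disconnect(si)

If the host rejects the value over the API, the advanced settings path described in the KB (vSphere Client) is the safer way to apply it.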

/Rubeck

Bleeder
Hot Shot

engineer4kailas
Enthusiast

Rubeck,

Thanks for sharing the good article.

I am also facing the exact same issue.

Should I apply this setting to all hosts in the infrastructure to avoid any issues in the future, or only to the problematic host for now and wait to see whether the issue shows up on the others?

What is your opinion?
