complexxL9
Contributor

Assign number of vCPUs to VM equal to cores in NUMA node?

Hello,

I was reading http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.5.pdf

and I would like to clarify this point on page 44 regarding vCPUs per VM:

Size your virtual machines so they align with physical NUMA boundaries. For example, if you have a host system with six cores per NUMA node, size your virtual machines with a multiple of six vCPUs (i.e., 6 vCPUs, 12 vCPUs, 18 vCPUs, 24 vCPUs, and so on).

So is the following correct?

On one ESXi host with 1 socket and 4 cores (1 NUMA node) I intend to run 5 VMs: 2 of them CPU-intensive and 3 not.

Would assigning 4 vCPUs to every VM (instead of 4 vCPUs to the CPU-intensive VMs and 1 vCPU to the non-intensive ones) nevertheless result in better performance for all 5 VMs?

And in general, should CPU resources be distributed across those VMs via shares and reservations rather than by assigning different vCPU counts?


Accepted Solutions
Alistar
Expert

Hi there,

That is an interesting note in the document, but the general rule of thumb is to size virtual machines as you see fit; on single-processor systems, favor the socket count when setting vCPUs. NUMA nodes come into play on multi-processor systems, where the hypervisor needs to schedule CPU and memory access per node. This is handled automatically via vNUMA as long as you maintain a 1:1 sockets-to-cores ratio when setting the vCPUs.

This per-socket core ratio also comes into play when you are constrained by OS licensing. For example, if you want 8 cores under a Windows Server Standard license, which only supports up to 4 sockets, and you have a 2-CPU system, you would set 2 sockets (because you physically have those) and 4 cores each - this way the hypervisor will (ideally) schedule 4 cores to run on each socket, each touching its local memory node. This gets more complex with many-node systems.
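
To put numbers on that licensing example, here is a small sketch (plain Python; the function name and its rounding rule are my own, not anything VMware ships) that picks a sockets × cores-per-socket layout for a requested vCPU count:

```python
def vcpu_layout(vcpus_wanted, physical_sockets, max_licensed_sockets):
    """Pick a (sockets, cores_per_socket) split for a VM.

    Uses as many virtual sockets as the host physically has (capped by
    the license limit), so each virtual socket can map onto one physical
    socket and its local memory node.
    """
    sockets = min(physical_sockets, max_licensed_sockets, vcpus_wanted)
    # Ceiling division: never deliver fewer vCPUs than requested.
    cores_per_socket = -(-vcpus_wanted // sockets)
    return sockets, cores_per_socket

# The example from the post: 8 vCPUs on a 2-socket host,
# Windows Server Standard allowing up to 4 sockets.
print(vcpu_layout(8, physical_sockets=2, max_licensed_sockets=4))  # → (2, 4)
```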

The worst thing you can do is overprovision your VMs, because the ESXi CPU scheduler needs to place all of a VM's vCPUs on physical cores at once. With heavily overprovisioned hosts the scheduler cannot run the VMs in parallel but effectively serializes them - one VM at a time - which would have a negative impact on performance, in my opinion. Start with 1 vCPU for non-intensive and 2 vCPUs for intensive workloads and see how your performance is, if you have the time and resources to test it. Also, keep in mind that shares only come into play when the ESXi host is facing resource contention.
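
To make the overprovisioning point concrete for the scenario in the question (4 physical cores, five VMs), a quick back-of-the-envelope check (illustrative Python, not a VMware API):

```python
def overcommit_ratio(pcpu_cores, vm_vcpus):
    """Total configured vCPUs divided by physical cores.

    Since a VM wants all of its vCPUs co-scheduled onto physical cores
    at once, a high ratio on a small host means VMs spend more time
    waiting for each other.
    """
    return sum(vm_vcpus) / pcpu_cores

# 4 vCPUs for every VM, as the question proposes:
print(overcommit_ratio(4, [4, 4, 4, 4, 4]))  # → 5.0

# Right-sized: 2 vCPUs for the two busy VMs, 1 vCPU for the rest:
print(overcommit_ratio(4, [2, 2, 1, 1, 1]))  # → 1.75
```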

Stop by my blog if you'd like 🙂 I dabble in vSphere troubleshooting, PowerCLI scripting and NetApp storage - and I share my journeys at http://vmxp.wordpress.com/

2 Replies
larstr
Champion

complexxL9,

You should only configure each VM with the number of vCPUs it really needs. Giving idle vCPUs to a VM still adds overhead to the system, and even with relaxed co-scheduling it will affect overall performance if you give all your VMs more vCPUs than needed.

The point of the NUMA node CPU count is that memory banks are owned by a physical CPU in the server. If a VM has either more vCPUs than the cores on a single socket or more memory than is installed for a single socket, you get a VM that is larger than a NUMA node. This is what we call a Monster VM. Access to memory owned by the other CPU is slower than access to memory owned by the local CPU, and this is the main reason you should try to keep your VMs within one package.
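
That "fits within one NUMA node" test can be written down as a tiny check (illustrative Python; the 6-core / 64 GB node figures are assumptions for the example, not from the thread):

```python
def fits_in_numa_node(vm_vcpus, vm_mem_gb, node_cores, node_mem_gb):
    """True if the VM fits inside a single NUMA node.

    A VM that exceeds either the core count or the local memory of one
    node (a "Monster VM") spills into the other CPU's memory, and that
    remote access is slower than local access.
    """
    return vm_vcpus <= node_cores and vm_mem_gb <= node_mem_gb

# Hypothetical host: 6 cores and 64 GB per NUMA node.
print(fits_in_numa_node(4, 32, node_cores=6, node_mem_gb=64))  # → True
print(fits_in_numa_node(8, 32, node_cores=6, node_mem_gb=64))  # → False (too many vCPUs)
```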

Lars