idanshr
Contributor

How to create total CPU isolation and thread pinning?

Hi all, I am new to this forum, so I would appreciate any help I can get with the following issue:

I am trying to create a network forwarding application that runs in a VM (on a single physical server), using 10 Gbps NICs configured in "pass through" mode.

The problem is that this application must meet the following requirements:

     1 - very low drop rate

     2 - low latency

     3 - low jitter

I have tried using the "CPU affinity" setting to allocate specific HW threads to my VM, but as I can see in ESXTOP (please see the two attachments below), the scheduler constantly moves the vCPUs of my VM between those HW threads. In addition, other VMs can also use the HW threads that I have allocated to my VM, unless they are specifically configured not to.

Does anyone know how I can create total CPU isolation and pinning, meaning that each vCPU of my VM is always handled by a single, constant HW thread and is never moved by the scheduler to other HW threads?

Another question: is there an easy way to isolate my VM from other VMs on the same physical server, without going through all the VMs and manually setting their affinity?

Thank you very much in advance :)

1.png

2.png

5 Replies

Potential Issues with CPU Affinity

Before you use CPU affinity, you might need to consider certain issues.

Potential issues with CPU affinity include:

For multiprocessor systems, ESX/ESXi systems perform automatic load balancing. Avoid manual specification of virtual machine affinity to improve the scheduler’s ability to balance load across processors.

Affinity can interfere with the ESX/ESXi host’s ability to meet the reservation and shares specified for a virtual machine.

Because CPU admission control does not consider affinity, a virtual machine with manual affinity settings might not always receive its full reservation.

Virtual machines that do not have manual affinity settings are not adversely affected by virtual machines with manual affinity settings.

When you move a virtual machine from one host to another, affinity might no longer apply because the new host might have a different number of processors.

The NUMA scheduler might not be able to manage a virtual machine that is already assigned to certain processors using affinity.

Affinity can affect an ESX/ESXi host's ability to schedule virtual machines on multicore or hyperthreaded processors to take full advantage of resources shared on such processors.

As far as I know, even if you assign CPU affinity and dedicate particular cores to one VM (VM1), other VMs (VM2, VM3, ... VMn) might still use the cores that were assigned to VM1. So I would suggest disabling DRS and testing again.

ssstonebraker
Contributor

Turn on high CPU latency sensitivity, pin the memory

*NOTE* On line 1, replace *SQL* with whatever VM name pattern you want to match (e.g. *myvm* would pick up 'myvm1', 'myvm2', 'test-myvm5').

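    $VMS = Get-VM | Where-Object { $_.Name -like "*SQL*" }
    # Requires PowerCLI and an active Connect-VIServer session.
    foreach ($VM in $VMS) {
        # Request the "High" latency sensitivity scheduling mode for this VM.
        New-AdvancedSetting -Entity $VM -Name 'sched.cpu.latencysensitivity' -Value 'high' -Confirm:$false -Force:$true
        # Pin the VM's memory.
        New-AdvancedSetting -Entity $VM -Name 'sched.mem.pin' -Value 'true' -Confirm:$false -Force:$true
    }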

Source:

http://www.vmware.com/files/pdf/techpaper/latency-sensitive-perf-vsphere55.pdf

https://www.vmware.com/files/pdf/techpaper/VMW-Tuning-Latency-Sensitive-Workloads.pdf

Description of High CPU Latency Sensitivity

New in vSphere 5.5 is a VM option called Latency Sensitivity, which defaults to Normal. Setting this to High can yield significantly lower latencies and jitter, as a result of the following mechanisms that take effect in ESXi:

• Exclusive access to physical resources, including pCPUs dedicated to vCPUs with no contending threads for executing on these pCPUs.

• Full memory reservation eliminates ballooning or hypervisor swapping, leading to more predictable performance with no latency overheads due to such mechanisms.

• Halting in the VM Monitor when the vCPU is idle, leading to faster vCPU wake-up from halt, and bypassing the VMkernel scheduler for yielding the pCPU. This also conserves power, as halting makes the pCPU enter a low power mode, compared to spinning in the VM Monitor with the monitor_control.halt_desched=FALSE option.

• Disabling interrupt coalescing and LRO automatically for VMXNET 3 virtual NICs.

• Optimized interrupt delivery path for VM DirectPath I/O and SR-IOV passthrough devices, using heuristics to derive hints from the guest OS about optimal placement of physical interrupt vectors on physical CPUs.

To learn more about this topic, please refer to the technical whitepaper:

http://www.vmware.com/files/pdf/techpaper/latency-sensitive-perf-vsphere55.pdf

https://www.vmware.com/files/pdf/techpaper/VMW-Tuning-Latency-Sensitive-Workloads.pdf

idanshr
Contributor

Hi dhanarajramesh, thanks for your reply. I am not working with a cluster; I have a standalone server that runs ESXi, and I want to deploy a number of VMs on it. One of the VMs will run a traffic forwarding application that is sensitive to delay and jitter, so I have to reserve pCPUs specifically for that VM and somehow make the hypervisor NOT use those pCPUs for any VM other than the network forwarder.

Thanks again :)

idanshr
Contributor

Hi ssstonebraker, thanks a lot for your reply. I have set both fields that you mentioned (sched.cpu.latencysensitivity and sched.mem.pin), but it didn't work as expected. I did some additional reading using the links you attached plus a couple more references, and I noticed some additional fields that should be set (all can be found at http://www.vmware.com/files/pdf/techpaper/VMW-Tuning-Latency-Sensitive-Workloads.pdf on page 7 and then in the "Appendix" section):

     1) monitor_control.halt_desched = "false"

     2) monitor.idleLoopSpinBeforeHalt = "true"

     3) monitor.idleLoopMinSpinUS = 100
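For anyone following along, the three settings above could be applied with the same New-AdvancedSetting pattern used earlier in the thread. This is only a minimal sketch: the helper name Set-IdleSpinSettings and the VM name 'forwarder-vm' are made up for illustration, and it assumes PowerCLI is loaded with an active Connect-VIServer session.

```powershell
# Setting names and values exactly as listed above.
$idleSpinSettings = [ordered]@{
    'monitor_control.halt_desched'   = 'false'
    'monitor.idleLoopSpinBeforeHalt' = 'true'
    'monitor.idleLoopMinSpinUS'      = '100'
}

# Hypothetical helper: writes each setting to one VM, creating or overwriting it.
function Set-IdleSpinSettings {
    param($VM)
    foreach ($name in $idleSpinSettings.Keys) {
        New-AdvancedSetting -Entity $VM -Name $name -Value $idleSpinSettings[$name] -Confirm:$false -Force:$true
    }
}

# Usage (needs PowerCLI and a connected session; 'forwarder-vm' is a placeholder):
# Set-IdleSpinSettings -VM (Get-VM -Name 'forwarder-vm')
```

Keeping the values in one table makes it easy to compare them against the whitepaper appendix before touching the VM.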

In addition, I tried to play with the "Resource" tab in the VM settings: I changed the "CPU shares" to a "Custom" value of 128000, set the "Reservation" to 8000 MHz, and set the "Limit" to "Unlimited".
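The same shares, reservation, and limit values can probably be scripted rather than set by hand. The sketch below uses the PowerCLI Get-VMResourceConfiguration and Set-VMResourceConfiguration cmdlets; the helper name Set-ForwarderCpuAllocation is hypothetical, and passing $null to -CpuLimitMhz to mean "Unlimited" is how I understand the cmdlet help, so please verify it on your PowerCLI version.

```powershell
# Hypothetical helper mirroring the values above: custom shares of 128000,
# an 8000 MHz reservation, and no CPU limit.
function Set-ForwarderCpuAllocation {
    param($VM)
    Get-VMResourceConfiguration -VM $VM |
        Set-VMResourceConfiguration -CpuSharesLevel Custom -NumCpuShares 128000 `
            -CpuReservationMhz 8000 -CpuLimitMhz $null   # $null assumed to remove the limit
}

# Usage (needs PowerCLI and a connected session; 'forwarder-vm' is a placeholder):
# Set-ForwarderCpuAllocation -VM (Get-VM -Name 'forwarder-vm')
```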

I also had to manually go over all VMs other than my network forwarding VM and set their CPU affinity to exclude the pCPUs I have allocated to the network forwarding VM, in order to improve the scheduling.
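To avoid working out the "everything except my reserved pCPUs" list by hand for each of the other VMs, a small helper can build it. This is a sketch in plain PowerShell (no PowerCLI needed); the function name Get-ComplementAffinity, the host size of 8 logical CPUs, and the reserved set 4-7 are illustrative assumptions, not values from my server.

```powershell
# Hypothetical helper: returns a comma-separated list of logical CPUs
# that are NOT reserved for the forwarding VM.
function Get-ComplementAffinity {
    param(
        [int]$LogicalCpuCount,   # number of logical CPUs (HW threads) on the host
        [int[]]$ReservedCpus     # pCPU indices kept for the forwarding VM
    )
    $free = 0..($LogicalCpuCount - 1) | Where-Object { $ReservedCpus -notcontains $_ }
    return ($free -join ',')
}

# Example: an 8-thread host with pCPUs 4-7 reserved for the forwarder.
Get-ComplementAffinity -LogicalCpuCount 8 -ReservedCpus 4,5,6,7   # 0,1,2,3
```

The resulting string is in the comma-separated form that the scheduling affinity field accepts, so it can be pasted into the settings of each of the other VMs.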

BUT even after all that, I still noticed that the scheduler moves the vCPUs from one pCPU to another. It's definitely much better than before, meaning there is less "thread moving around" than before, but it still occurs, and the vCPUs are not perfectly "pinned" to pCPUs, as you can see in the attachments below.

I would very much appreciate more ideas :)

Again, thank you very much for pointing me in the right direction.

Idan.

2.1.png

2.2.png

ssstonebraker
Contributor

Hi Idan,

I'm glad you made some progress.  Be sure you are aligning with NUMA nodes if you are using virtual machines with a large number of cores.

Also are you using hyperthreading?  Do you have more VMs allocated to the host than available cores? 

Looking at your diagram, the pCPU is only moving between CPU 19 and 31. How many cores do you have per socket? If the answer is twelve, then maybe you are staying within a single processor?

Are you reserving memory? How is the memory placed on the motherboard? Is it distributed evenly per socket? Check the BIOS on the host and make sure power saving mode is off.
