3 Replies Latest reply on Nov 11, 2019 11:06 AM by JStars

    numa.autosize.vcpu.maxPerVirtualNode lack of info

    JStars Enthusiast

      I have a VM with 64 vCPUs and 512 GB RAM; it's a massive DB. The ESXi 6.5 hosts are HP DL560s with 4 sockets (22 cores each + HT) and 1.5 TB RAM. Following Frank Denneman's book vSphere 6.5 Host Resources Deep Dive, I aimed to keep the VPD on as few pSockets as possible. By disabling CPU Hot Add and setting preferHT to TRUE, I increased performance quite a lot, and I expected to see the vCPUs spread over two physical sockets. However, while VMware KB 2003582 explains how to implement the preferHT setting, it does not mention something Frank Denneman does say in his book:

      Quote:

      “Please remember to adjust the numa.autosize.vcpu.maxPerVirtualNode setting in the VM if it is already been powered-on once. This setting overrides the numa.vcpu.preferHT=TRUE setting”

      End quote

       

      I read the above after I had made the initial changes to the VM, and I have now noticed that its numa.autosize.vcpu.maxPerVirtualNode value is 11. According to Virtual NUMA Controls, dividing 64 by 11 should give me 6 virtual nodes, but the VM has 7. This is another thing I don't understand.
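The expected figure is a ceiling division: no virtual node may hold more than 11 vCPUs, so 64 vCPUs need ceil(64/11) = 6 nodes. A minimal sketch of that arithmetic, assuming the documented divisor is the only factor involved (`expected_virtual_nodes` is a hypothetical helper, not a VMware API):

```python
def expected_virtual_nodes(total_vcpus: int, max_per_node: int) -> int:
    """Smallest number of virtual NUMA nodes such that no node holds more
    than max_per_node vCPUs (integer ceiling division)."""
    if max_per_node <= 0:
        raise ValueError("max_per_node must be positive")
    return -(-total_vcpus // max_per_node)

# 64 vCPUs with maxPerVirtualNode = 11, as on the VM in this thread.
print(expected_virtual_nodes(64, 11))  # 6 (the VM actually shows 7)
```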

      By which criteria do I adjust the numa.autosize.vcpu.maxPerVirtualNode value?

      Should I set it to 44, since that is the maximum number of logical cores in a physical socket? Or should I disable it and let the system make the best decision? If so, how do I disable it? This is the current layout of the VM's CPU resources:

       

      Although performance has improved, I'm not happy with the distribution of the cores, especially considering that homeNode 3 is not used at all.

       

       

      So, to recap, my questions to the experienced admins are the following:

       

      1. By which criteria do I adjust the numa.autosize.vcpu.maxPerVirtualNode value so that the preferHT setting is enforced correctly?

      2. I understood that in 6.5 the coresPerSocket setting was decoupled from the Socket setting, so it no longer really matters whether you set 12 sockets x 1 core or 1 socket x 12 cores (unless licensing restrictions apply). However, in Frank Denneman's book I read:

      quote

      "If preferHT is used, we recommend aligning the cores per socket to the physical CPU package layout. This leverages the OS and application LLC optimizations the most "

      end quote

      So in this case, does the coresPerSocket setting actually have an effect? Should I then set 2 sockets x 32 cores per socket? Frankly, I haven't seen that option available in the VM Settings window.

      3. Why does the VM have 7 virtual nodes instead of 6?

        • 1. Re: numa.autosize.vcpu.maxPerVirtualNode lack of info
          JStars Enthusiast

          On the VMware docs pages it says:

           

          numa.vcpu.maxPerVirtualNode

          Determines the number of virtual NUMA nodes by splitting the total vCPU count evenly with this value as its divisor.
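Read literally, this description implies the vCPUs are divided into equally sized groups, which is only exact when the divisor fits. The sketch below models the nearest even split, with group sizes differing by at most one; this is an assumption made for illustration, not ESXi's actual placement logic, and `split_evenly` is a hypothetical name.

```python
import math

def split_evenly(total_vcpus: int, max_per_node: int) -> list[int]:
    """Split total_vcpus into the fewest groups of at most max_per_node,
    with group sizes differing by at most one (illustrative model only)."""
    nodes = math.ceil(total_vcpus / max_per_node)
    base, extra = divmod(total_vcpus, nodes)
    # The first `extra` groups receive one additional vCPU.
    return [base + 1 if i < extra else base for i in range(nodes)]

print(split_evenly(64, 11))  # [11, 11, 11, 11, 10, 10]
```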

          So does this mean I need to set its value to the number of physical cores in a physical socket, or to the number of logical cores in a physical socket?

          • 2. Re: numa.autosize.vcpu.maxPerVirtualNode lack of info
            FDenneman01 Novice
            VMware Employees

            PreferHT is used to consolidate the vCPUs as much as possible and create the fewest number of NUMA clients. In your situation, there are four NUMA nodes, each with 22 cores and 384 GB of memory (assuming the DIMMs are equally distributed across sockets and offer the same capacity). With PreferHT, the NUMA scheduler takes the SMT capabilities into account, and therefore each NUMA node can now accept NUMA clients as large as its HT thread count. In your situation, that is 44.
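These per-node figures follow directly from the host specification in the original question. A minimal sketch, assuming one NUMA node per socket, memory spread evenly across sockets, and 1.5 TB taken as 1536 GB (`node_capacity` is a hypothetical helper, not a VMware API):

```python
import math

def node_capacity(sockets: int, cores_per_socket: int, threads_per_core: int,
                  total_mem_gb: float) -> dict:
    """Per-NUMA-node capacity, assuming one NUMA node per socket and
    memory distributed evenly across sockets."""
    return {
        "cores": cores_per_socket,
        "threads": cores_per_socket * threads_per_core,
        "mem_gb": total_mem_gb / sockets,
    }

# HP DL560 from the question: 4 sockets x 22 cores + HT, 1.5 TB (assumed 1536 GB).
node = node_capacity(sockets=4, cores_per_socket=22, threads_per_core=2,
                     total_mem_gb=1536)

# Minimum number of nodes whose memory the 512 GB VM must span.
vm_mem_gb = 512
nodes_for_mem = math.ceil(vm_mem_gb / node["mem_gb"])

print(node)           # {'cores': 22, 'threads': 44, 'mem_gb': 384.0}
print(nodes_for_mem)  # 2
```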

            Following this logic, your 64-vCPU VM should be distributed across two NUMA clients, each grouping 32 vCPUs.
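This reasoning can be modelled as picking the per-node capacity from either physical cores or HT threads, then grouping the vCPUs into the fewest evenly sized NUMA clients. It is a simplified illustration assuming 22 cores and 44 threads per node, as on this host; the real NUMA scheduler weighs more factors, and `numa_clients` is a hypothetical helper.

```python
import math

def numa_clients(vcpus: int, cores_per_node: int, threads_per_core: int,
                 prefer_ht: bool) -> list[int]:
    """Simplified NUMA client sizing: with preferHT the node capacity is
    its HT thread count, otherwise its physical core count."""
    if prefer_ht:
        capacity = cores_per_node * threads_per_core
    else:
        capacity = cores_per_node
    clients = math.ceil(vcpus / capacity)
    base, extra = divmod(vcpus, clients)
    return [base + 1 if i < extra else base for i in range(clients)]

print(numa_clients(64, 22, 2, prefer_ht=True))   # [32, 32]
print(numa_clients(64, 22, 2, prefer_ht=False))  # [22, 21, 21] (model only)
```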

             

            The book's advice to set numa.autosize.vcpu.maxPerVirtualNode is meant to propagate the virtual NUMA topology to the guest OS. This is typically recommended for VMs that exceed the memory capacity of a NUMA node while being able to fit all their vCPUs in a single NUMA node. In your scenario, setting it to 11 caps each NUMA client at 11 vCPUs, overriding the preference for HT threads when sizing the NUMA client. 64/11 ≈ 5.8, so it should round up to 6, but it doesn't, so I expect other advanced settings are influencing the NUMA client configuration.

             

            Instead of spending a lot of time figuring out this anomaly, my recommendation is to remove the numa.autosize.vcpu.maxPerVirtualNode setting and allow the NUMA scheduler to align the NUMA client configuration with the PreferHT setting. Most customers do not use PreferHT on a virtual machine with that many vCPUs. PreferHT is an advanced setting designed to help workloads that are cache-intensive but not CPU-intensive. By scheduling all the vCPUs in the same NUMA node, the workload can leverage the same L1/L2/L3 cache and thus take advantage of the spatial locality of memory access. I suspect you assigned 64 vCPUs to have a lot of CPU resources available for the application.

             

            We decoupled the NUMA scheduling constructs from the user setting Cores per Socket (CPS). The setting thus lets customers align with licensing requirements while the NUMA scheduler manages its own constructs to optimize memory access. When set, it provides a socket topology, and the guest OS translates that into cache scheduling domains. In your situation, with 64 vCPUs, PreferHT enabled and maxPerVirtualNode removed, you should get two NUMA clients, each containing 32 vCPUs. Your CPS setting should be 32 cores per socket. That presents a dual-socket configuration, and your guest OS will create a cache map in which 32 CPUs share the same cache. The application and guest OS can then decide where to place threads that are expected to share and access the same memory blocks.
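The Cores per Socket advice can be expressed as a small consistency check: each virtual socket should line up with one NUMA client, and no virtual socket should be larger than a physical node's HT thread count. The sketch below assumes the two 32-vCPU NUMA clients and 44 threads per node described above; `check_cps` is a hypothetical name, and the 4 x 16 layout it also evaluates is the alternative mentioned in the final reply.

```python
def check_cps(vcpus: int, cores_per_socket: int, numa_client_sizes: list[int],
              threads_per_node: int) -> dict:
    """Compare a Cores per Socket choice with the NUMA client layout."""
    if vcpus % cores_per_socket:
        raise ValueError("vCPU count must be a multiple of cores per socket")
    sockets = vcpus // cores_per_socket
    return {
        "layout": f"{sockets} x {cores_per_socket}",
        # A virtual socket fits within one physical node's HT threads.
        "fits_node": cores_per_socket <= threads_per_node,
        # Virtual sockets map one-to-one onto NUMA clients.
        "matches_clients": sockets == len(numa_client_sizes)
        and all(size == cores_per_socket for size in numa_client_sizes),
    }

clients = [32, 32]  # assumed preferHT result for 64 vCPUs on 44-thread nodes
for cps in (32, 16, 64):
    print(check_cps(64, cps, clients, threads_per_node=44))
# {'layout': '2 x 32', 'fits_node': True, 'matches_clients': True}
# {'layout': '4 x 16', 'fits_node': True, 'matches_clients': False}
# {'layout': '1 x 64', 'fits_node': False, 'matches_clients': False}
```

Under this simplified model only 2 x 32 lines up one-to-one with the NUMA clients; 4 x 16 fits each node but spreads every NUMA client over two virtual sockets.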

             

            Hope this clarifies it for you. If you decide to apply my recommendation, please test it first in a non-production environment and post the results, as others might experience a similar situation and could be helped by your experience.

            • 3. Re: numa.autosize.vcpu.maxPerVirtualNode lack of info
              JStars Enthusiast

              Frank, many, many thanks for your time on this! Really appreciated!

              Now I know I can simply delete numa.autosize.vcpu.maxPerVirtualNode in favour of preferHT. Since it is not clear why the NUMA client count does not reflect the actual settings, I agree with you that it is a waste of time to investigate further. So I will proceed with your suggestion and report back to the community as soon as I get the green light for the next shutdown. Unfortunately, I don't have a test environment for this particular application. Thanks also for the hint on the preferHT parameter. I chose it so the vCPUs could stay on as few physical sockets as possible; otherwise the alternative would have been to spread the vCPUs over 4 physical sockets, e.g. 4 vSockets x 16 cores, and frankly I don't know if that is a good idea. I will also fully reserve the VM's RAM to further improve performance.

              Big thanks again!