VMware Cloud Community
jrmunday
Commander

Performance degradation related to CPU scheduling and NUMA nodes

I have an interesting scenario (HP vs DELL hardware) with potentially degraded performance, specific to the DELL R815 hardware, and I would like to know whether I am interpreting what I am seeing correctly, or whether I am simply being over-cautious and don't actually have an issue.

Summary;

  1. Although a much higher hardware specification, the DELL R815 ESXi hosts are not scheduling CPU cycles as efficiently as the HP DL585 G6 hardware. The impact we are seeing is increased CPU ready time and performance degradation of the guest VMs. This is evident with a very low number of guest VMs on the host and worsens as the consolidation ratio is ramped up or the CPU load on any of the guest VMs is increased.
  2. There also appears to be an imbalance across the NUMA nodes, where a particular node is favoured and the % NUMA local memory is not as high as it should be (i.e. the HP hardware performs much better than the DELL hardware).

DELL Technical Details;

Hypervisor       : VMware ESXi 4.1.0, build 582267

Hardware specification;

Dell PowerEdge R815

- Model : AMD Opteron(tm) Processor 6174

- Processor Speed : 2.2 GHz

- Processor Sockets : 4

- Processor Cores per Socket : 12

- Logical Processors : 48

- Memory : 256 GB

esxtop performance statistics;

DELL Memory (incl NUMA statistics);

DELL_Mem_NUMA.png

Dell CPU;

DELL_CPU.png

Observations;

  • NUMA home node #7 is favoured, rather than the load being balanced across all 8 nodes
  • % NUMA local memory is inefficiently allocated
  • Very low consolidation ratio of guest VMs per host
  • Very low load on the host, yet ready time is already visible

Example of the affected guest VM;

Guest_VM.png

DELL Host is under no load whatsoever;

DELL_Resource.png

As a contrasting perspective from a heavily loaded HP DL585 G6 host, this is what I would “expect” to see;

HP Technical Details;

Hypervisor       : VMware ESXi 4.1.0, build 582267

HP Hardware specification;

HP ProLiant DL585 G6

- Model : Six-Core AMD Opteron(tm) Processor 8435

- Processor Speed : 2.6 GHz

- Processor Sockets : 4

- Processor Cores per Socket : 6

- Logical Processors : 24

- Memory : 128 GB

esxtop performance statistics;

HP Memory (incl NUMA statistics);

HP_Mem_NUMA.png

HP CPU;

HP_CPU.png

Observations;

  • HP host is of a lower hardware specification than the DELL host
  • HP host has almost 4x the number of guest VMs hosted and does not suffer from the same performance issues
  • NUMA home node #0 is favoured, but there is a much better allocation of NUMA local memory (more efficient) – close to 100%
  • Much higher consolidation ratio of guest VMs per host without performance issues
  • Much higher load on the host and almost ZERO ready time

HP host still has capacity, but is under much more load than the affected DELL host;

HP_Resource.png

In both cases (HP and DELL) we do expect to see a certain level of ready time, but the levels seen on the DELL hardware are of concern, as is the inefficient use of NUMA local memory. This issue is not seen on the HP hardware, including earlier and later hardware generations.
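
For reference, when I cross-check the esxtop %RDY figures against the raw vCenter counters, I use something like the small sketch below (Python, purely illustrative). vCenter exposes ready time as cpu.ready.summation in milliseconds per sample interval, and real-time stats use a 20 second interval:

    def ready_percent(ready_summation_ms, interval_s=20, vcpus=1):
        # Convert cpu.ready.summation (ms of ready time accumulated in one
        # sample interval) into an average %RDY per vCPU. Real-time stats use
        # a 20 s interval; historical rollups use longer intervals.
        return ready_summation_ms / (interval_s * 1000.0) / vcpus * 100.0

    # Example: 4000 ms of ready time in a 20 s sample on a 4-vCPU VM = 5% per vCPU
    print(ready_percent(4000, interval_s=20, vcpus=4))

Note that esxtop's %RDY column is summed across a VM's worlds, so a multi-vCPU VM can legitimately show more than 100%; dividing by the vCPU count keeps the numbers comparable between differently sized VMs.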

So the questions are;

  1. Have I interpreted this correctly?

  2. Has anyone else seen this before? If yes, how was it resolved?

  3. What next steps can be taken to test and verify this information?

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
32 Replies
jrmunday
Commander

I logged a support request with VMware and DELL regarding this issue, and VMware have been able to replicate it internally and confirmed that it's a bug. They are currently working on a fix, which should hopefully be released with U3.

In the interim, I have upgraded to vSphere 5, and ESXi 5.0 U1 ... and the issue still persists. I'll be feeding this back to both vendors.

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
asp24
Enthusiast

I had the same problem. A temporary fix is to set "Numa.RebalanceEnable" to 0.

ESXi host Configuration > Software > Advanced Settings > Numa > Numa.RebalanceEnable = 0

Supermicro Opteron 61xx and 62xx hosts.

When I did this, the CPU load balanced evenly across the cores and the ready times dropped to 0.XX as expected.
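
For anyone who wants to script the change rather than click through the vSphere Client, something along the lines of the sketch below should work (untested, assuming the pyvmomi Python bindings; the host name and credentials are placeholders). On a 5.x host the same change can also be made from the shell with esxcli system settings advanced set -o /Numa/RebalanceEnable -i 0, if I remember the syntax correctly.

    # Illustrative sketch: set the advanced option Numa.RebalanceEnable to 0 on one host.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()  # lab use only: skips certificate checks
    si = SmartConnect(host="esxi-host.example.com", user="root", pwd="password", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
        host = view.view[0]  # first (only) host returned by the view
        # Integer option; some pyvmomi versions may want an explicit numeric type here.
        opt = vim.option.OptionValue(key="Numa.RebalanceEnable", value=0)
        host.configManager.advancedOption.UpdateOptions(changedValue=[opt])
    finally:
        Disconnect(si)

Remember to set the option back to 1 (the default) once a proper fix is in place, since this disables NUMA rebalancing entirely.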

AKostur
Hot Shot

How about installing the server via PXE boot?

asp24
Enthusiast

My solution will work. Just change the setting, and you will see the result in ESXTOP after a few seconds. No reboot needed.

jrmunday
Commander

Hi asp24, thanks for the feedback. I'm on a training course this week, so I will test this next week.

As I understand this advanced setting, it will disable NUMA rebalancing completely. I assume you are using this as a short-term workaround and not a long-term fix?

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
rldleblanc
Contributor

jrmunday wrote:

I logged a support request with VMware and DELL regarding this issue, and VMware have been able to replicate it internally and confirmed that it's a bug. They are currently working on a fix, which should hopefully be released with U3.

In the interim, I have upgraded to vSphere 5, and ESXi 5.0 U1 ... and the issue still persists. I'll be feeding this back to both vendors.

We are seeing almost 6x the ready time going from ESXi 4.1 to ESXi 5.0 U1 for the exact same workload on identical hardware with identical settings. I am seeing similar NUMA issues, but we only have two nodes so it is not as bad. Have you submitted an SR for what you are seeing? I would like to reference it in my SR; maybe there is a correlation. The Numa.RebalanceEnable setting did not improve things for me at all.

jrmunday
Commander

Hi Robert,

Yes, the VMware Support Request number is 12146638002.

VMware have confirmed in writing that a fix for this is targeted for September this year. Unfortunately I can't wait that long, and will be pushing for a fix before then. See my latest correspondence below;

--------------- Original Message ---------------

From: VMware Technical Support [webform@vmware.com]

Sent: 03/05/2012 15:58

Subject: RE: VMware Support Request 12146638002

There is a process to supply Hot Patches; these are patches that go through testing at the build you are currently running, on similar hardware. This is obviously not as extensive as a fully tested Patch or Update release.

If you want to go ahead with this process, just capture a Support Bundle from one of the ESXi 5 servers and I will open the request.

--------------- Original Message ---------------

From: VMware Technical Support [webform@vmware.com]

Sent: 27/04/2012 18:30

Subject: RE: VMware Support Request 12146638002

Hi Jon,

The fix is now being targeted for the patch release due in September; let me know if this is OK for you.

-----Original Message-----

From: VMware Technical Support [mailto:webform@vmware.com]

Sent: 26 April 2012 11:07

Subject: RE: VMware Support Request 12146638002

There has been some progress overnight. The code fix for this is involved and touches portions of the vmotion, cpusched and numasched code, so Engineering are currently reviewing the possibility of targeting the fix in a patch release. Once I have a further update I will let you know.

I haven't tried the "Numa.RebalanceEnable" setting yet, as I don't want to affect my 24 HP hosts, which are working perfectly fine.

Hope you get the issue resolved.

Cheers,

Jon

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
lakey81
Enthusiast

Jon,

I am also experiencing the same issues with node balancing, but on HP BL685c G6s with 4x 6-core processors, 128 GB of RAM, and ESXi 4.1 build 582267. To try to alleviate the problem I've moved enough VMs off one host so that there are more physical cores than vCPUs, and it still is not load balancing them properly. In that configuration it does sometimes balance the VMs out over all 4 nodes, and then CPU %RDY times are great, but the majority of the time it is constantly moving VMs between nodes and creating high CPU %RDY times.

I also have a case open with support but have not received a good answer as of yet. If you hear anything else please let us know, and I will do the same!

Ryan

jrmunday
Commander

I have patched all 34 of my hosts up to the current release, ESXi 5.0.0 build 702118, and submitted the support bundle to VMware today. They will base the hot patch on this build and aim to have an ETA for its release by the end of this week.

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
rldleblanc
Contributor

We have generated a CPU workload similar to what we are seeing in production, but with much less RAM used. As a result, the ready times of the VMs are much lower on ESXi 5.0, as expected. This makes me think that the problem is not with the CPU scheduling. Our next focus is to increase the RAM usage of the simulated workload and see if the ready times go up. If that happens, then it would seem to point to NUMA, even though we only have two nodes and pretty good locality of RAM for the VMs. I'll post the results of our testing.

rldleblanc
Contributor

I have a question for those on the thread. Do you have quantitative evidence that VM performance is lower inside the VM with this problem? We stepped out on the plank last night and put 150% of the MSSQL VMs that we were running on ESXi 4.1 onto an ESXi 5.0 U1 host (exact same hardware configuration on both), and we saw no performance degradation inside the VMs; in fact, that workload drastically outperformed the lesser load on ESXi 4.1.

I have written a random load generator that mimics our live workload; in that program I've added an operations-per-second metric that we will use to see if the VMs suffer performance degradation on ESXi 5.0 U1 compared to ESXi 4.1. If the ops/sec is equal to or greater on ESXi 5.0, then I believe there is a bug in the ready time metric that the hypervisor is reporting, and it should largely be ignored until fixed. There is still the possibility that NUMA locality may affect performance, but if this bug is verified, ready time should not be used as an indicator of poor performance until it is fixed.
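
For anyone who wants to run the same comparison, the core of the idea is no more complicated than the sketch below (not the actual generator I wrote, just an illustration): run a fixed unit of CPU work in a loop inside the guest and report operations per second, so the same script produces directly comparable numbers on an ESXi 4.1 host and an ESXi 5.0 U1 host.

    import time

    def ops_per_second(duration_s=30):
        # Run a fixed small unit of CPU work repeatedly and report completed units per second.
        ops = 0
        end = time.time() + duration_s
        while time.time() < end:
            sum(i * i for i in range(1000))  # fixed unit of work
            ops += 1
        return ops / float(duration_s)

    if __name__ == "__main__":
        print("ops/sec:", ops_per_second())

If the hypervisor really is starving the VM of CPU time, the in-guest ops/sec should drop even if the reported ready time turns out to be misleading.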

jrmunday
Commander

Small update: we decided to delay the hot patch from VMware for our current build and will now receive it as part of the scheduled July patches, to ensure that we are able to patch between now and September (if required). With this delay, we are still a few weeks away from seeing any fix in place.

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
kpc
Contributor

Hi guys

I'm having a similar problem with two new R815s that we recently purchased. Specs are AMD Opteron 6276, 4 CPUs, 128 GB RAM, running ESXi 5.0.0 build 623860.

I have other R715s (AMD Opteron 6136, 2 CPUs) running in production on build 469512 without any problems.

I first noticed an issue when I deployed a simple 32-bit XP VM to the R815: boot-up time was nearly 1 minute, compared to 20 seconds on the R715. Even clicking the Power button is slower; on the R715s the status goes from 20% straight to Completed, while on the R815s it goes 20%, 40%, 60%, 80%, Completed! Moving around the VM on the R815 seemed sluggish. Both servers were empty and were tested using local and SAN storage.

I tried the NUMA setting fix but this did nothing.

I re-installed ESXi 5 build 469512 but the same problem exists. These R815s are the first 4-CPU servers we've used.

Has anyone had any success with this? I'm ready to call it in to support.

Thanks

jrmunday
Commander

Hi kpc,

I would definitely log this with support, as it is a known issue that VMware have been able to replicate internally. I suspect the more cases they have open, the better exposure it will get internally. I still have an open support case and am in regular contact with them about this, and they have confirmed that they plan to release the general fix for this issue in U2, currently planned for September 2012.

VMware have also finally released a hot patch to address this issue, and although I received it a week ago, I will only be able to do the testing over the next week or so. The reason for this delay is that the patch was specifically tested up to the June patch level, and since I am already on the July patches it is obsolete for my environment; I need to rebuild some hosts to an earlier level before testing.

I will post my results here once complete.

Cheers,

Jon

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
kpc
Contributor

Thanks Jon

I've logged this with support. If I get the hot fix I'll test it straight away, as I have empty servers sitting here. Will let you know how it goes.

Cheers

Pete

rldleblanc
Contributor

This may not be directly related, but have you checked the BIOS performance setting? We found that our new servers came with the BIOS set to power save, and when we changed it to OS control we got much more responsiveness out of the VMs and the hypervisor.
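
If it helps, a quick way to spot hosts that are still on a power-save profile is to read the CPU power management policy the hypervisor reports, for example with a one-off script along the lines of the sketch below (illustrative only, assuming the pyvmomi bindings; the vCenter name and credentials are placeholders, and the value is only meaningful once the BIOS hands power control to the OS):

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()  # lab use only: skips certificate checks
    si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="password", sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
        for host in view.view:
            pm = host.hardware.cpuPowerManagementInfo
            # currentPolicy is a descriptive string such as "High Performance" or "Balanced"
            print(host.name, pm.currentPolicy if pm else "not reported")
    finally:
        Disconnect(si)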

jrmunday
Commander

Hi Robert,

Yes, I've been through the BIOS settings and tweaked everything recommended by both DELL and VMware. We also turn off all power-saving features as standard on all of our hosts.

See URL below;

http://en.community.dell.com/techcenter/virtualization/w/wiki/dell-vmware-vsphere-performance-best-p...

There are however two settings that I need to test, as I've read conflicting recommendations regarding them: the C1E setting and DMA Virtualization. I would be interested to hear if you have any experience or recommendations regarding these settings.

Thanks,

Jon

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
jrmunday
Commander

Hi Pete,

I will rebuild a host today and hopefully post test results (time permitting).

Cheers,

Jon

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
jrmunday
Commander

The results post hot patch are extremely favourable - see below;

Post Hot Patch - Memory (almost all 100%);

PostHotPatch-Memory.png

Post Hot Patch - CPU (3x the previous load with none of the observed ready time);

PostHotPatch-CPU.png

I will however need to do some more thorough testing, as I made quite a few changes leading up to this;

  • Upgraded all firmware to the latest versions. There have been a number of new releases since this issue was first noticed, most notably a new BIOS version.

Dell-R815-Firmware-Versions.png

  • Reviewed the BIOS settings and turned off all non-required features (for example USB, serial, etc). See attached; these are specific to the DELL R815, running BIOS version 2.8.2.

  • Rebuilt the host to the June patch level + Hot Patch (missing the July patches, shown below);

Host-Patch-Level.png

The results so far warrant me rebuilding all 34 of my hosts, but I will do some more testing before making a decision.

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77