VMware Cloud Community
JPM300
Commander

NUMA Settings missing in VM even though NUMA is set

Hello all,


Got a weird one I can't figure out, and I know it's something on the tip of my fingers, but I just can't think of it.  I have 5 hosts, all the same make and model: HP DL360 G8s.  On 1 of the 5 hosts, the VMs show the NUMA settings when you edit a VM and go into the Resources tab:

Numa.PNG

However, on all the rest the setting is missing:

nonuma2.PNG

I have checked to make sure NUMA is enabled on the hosts with esxcli hardware memory get | grep NUMA and get the following:

numacheck.PNG

I'm not sure why that one host is working while the others are not when all the settings are the same.


Any help would be greatly appreciated,

12 Replies
vfk
Expert

By default, vNUMA is only enabled when a VM has 8 vCPUs or more.  Are you looking at the same VM?  It seems like you might be looking at two different VMs with different vCPU counts, judging by the share values.
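For reference (not posted in the thread itself): the vCPU threshold for vNUMA can be lowered per VM with an advanced setting, so smaller VMs still receive a virtual NUMA topology.  A sketch of the .vmx entries, with illustrative values; verify the option names and defaults against your ESXi version's documentation before relying on them:

```
# Advanced VM settings (.vmx) -- illustrative values, verify per ESXi version
numa.vcpu.min = "2"                  # expose vNUMA to VMs with >= 2 vCPUs (default is 9)
numa.vcpu.maxPerVirtualNode = "4"    # cap vCPUs per virtual NUMA node
```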

--- If you found this or any other answer helpful, please consider the use of the Helpful or Correct buttons to award points. vfk Systems Manager / Technical Architect VCP5-DCV, VCAP5-DCA, vExpert, ITILv3, CCNA, MCP
JPM300
Commander

Different VMs, but I raised one to 8 vCPUs on the hosts that are not working and got the same result: no NUMA settings.  However, even with 1-2 vCPUs, if I move this VM to the host that is working, I can see the NUMA options??

vfk
Expert

Interesting.  vNUMA topology is set for a given VM during power-on.  Can you power down the VM completely, power it on again, and see if the settings appear?

JPM300
Commander

Yup, same deal.  However, if I move it to the host that is working, the settings show up immediately.  I tested in my virtual lab as well and got the same result; the only difference in the virtual lab is that I see the CPU affinity options all the time and never the Memory NUMA settings.  Then again, it's virtual, so the BIOS probably doesn't expose that capability in the lab.

On the production hosts I've been talking about, another thing I've noticed is that the problematic hosts are also missing the CPU affinity options (the hyperthreading options are there)???  I'm thinking that even though ESXi sees NUMA as enabled, there might be a BIOS setting missing somewhere???

All 5 hosts are the exact same model however.

vfk
Expert

OK, you have done all the basic checks and requirements; time to dig deeper.  I think you might be right that it's something to do with the BIOS.  NUMA might not actually be enabled on the other hosts.  But then again, I'm not sure why vSphere would detect it, as you have checked.  I would be interested to know if it is some setting in the BIOS.

JPM300
Commander

Just thinking: could EVC cause this behavior?  The only other difference among those 5 hosts is that 1 host is not part of the cluster, as it's used for testing.  Could the mask EVC is applying to the hosts be masking the NUMA bit?  That could explain why the hosts see it as enabled when I check in the CLI but won't show the options because of the mask.

vfk
Expert

To my knowledge, EVC only masks extended instruction sets, e.g. multimedia instructions.  Have you checked the BIOS and confirmed Node Interleaving is actually disabled?

MKguy
Virtuoso

Can you post the output of the following ESXi shell commands from a working and from a non-working host?

# esxcli hardware cpu global get

# esxcli hardware memory get

-- http://alpacapowered.wordpress.com
JPM300
Commander

Host where the NUMA settings are not showing:

~ # esxcli hardware cpu global get

   CPU Packages: 2

   CPU Cores: 16

   CPU Threads: 32

   Hyperthreading Active: true

   Hyperthreading Supported: true

   Hyperthreading Enabled: true

   HV Support: 3

   HV Replay Capable: true

   HV Replay Disabled Reasons:

~ # esxcli hardware memory get

   Physical Memory: 137402580992 Bytes

   Reliable Memory: 0 Bytes

   NUMA Node Count: 2

Host that is working:

~ # esxcli hardware cpu global get

   CPU Packages: 2

   CPU Cores: 16

   CPU Threads: 16

   Hyperthreading Active: false

   Hyperthreading Supported: true

   Hyperthreading Enabled: true

   HV Support: 3

   HV Replay Capable: true

   HV Replay Disabled Reasons:

~ # esxcli hardware memory get

   Physical Memory: 137402580992 Bytes

   Reliable Memory: 0 Bytes

   NUMA Node Count: 2

No, I haven't had a chance to look at the BIOS settings yet on the hosts that are being problematic.  I am going to try today to pull one of the hosts down and take a look.
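As a side note (not part of the original exchange): once each host's output is saved to a file, the relevant fields can be pulled out for a quick side-by-side comparison.  A small helper sketch; the function name and filenames are made up for illustration:

```shell
# numa_fields: print only the CPU/NUMA-relevant lines from saved
# `esxcli hardware cpu global get` / `esxcli hardware memory get` output.
# Usage: numa_fields working.txt ; numa_fields broken.txt  (filenames are placeholders)
numa_fields() {
    grep -hE 'Hyperthreading|NUMA Node Count|CPU (Packages|Cores|Threads)' "$@"
}
```

Running it against both saved outputs makes the one differing field (the hyperthreading state) stand out immediately.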

MKguy
Virtuoso

So both hosts have NUMA enabled (NUMA Node Count: 2), which would mean Node Interleaving is disabled in the BIOS as it should be.

   Hyperthreading Active: false

   Hyperthreading Supported: true

   Hyperthreading Enabled: true

The 2nd host isn't using Hyper-Threading even though it's supported by the hardware and enabled (in the BIOS?).

Make sure you have HT enabled on that host under Configuration->Processors.
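The mismatch MKguy points at (Enabled: true but Active: false) can also be flagged from saved command output.  A hedged sketch, not something posted in the thread; the function name and file argument are hypothetical:

```shell
# ht_state: read saved `esxcli hardware cpu global get` output and report the
# hyperthreading state; prints "enabled-but-inactive" when HT is enabled but
# not actually in use (the condition seen on the working host here).
ht_state() {
    awk -F': ' '
        /Hyperthreading Active/  { active = $2 }
        /Hyperthreading Enabled/ { enabled = $2 }
        END {
            if (enabled == "true" && active == "false")
                print "enabled-but-inactive"
            else
                print active
        }' "$1"
}
```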

JPM300
Commander

The host that is working is the one with HT off.  We have it off as it's mostly used for stress/performance testing and SQL purposes.

That's the only difference between the 4 hosts that don't work and the 1 that does??

Does HT have to be off maybe for the NUMA options to show up?

JPM300
Commander

Just to give an update on this: I recently set up two other hosts in this environment out of older hardware for development, and both the CPU affinity and Memory NUMA settings were showing.  It has to be a BIOS thing.  I will be pulling 2 of the affected hosts down this upcoming week, so I will be able to post an update then.
