VMware Cloud Community
tamiraig
Contributor

Performance issue: MSSQL VM on ESXi 6.7

I have a cluster with 8 ESXi 6.7 servers (1.5 TB physical RAM each) containing about 400 VMs.

One particular VM is an MSSQL server with 80 GB RAM and 16 vCPUs:

~59 GB memory active

~28.61 GHz CPU usage

The DBA is constantly requesting more RAM for that machine, claiming that its performance is slow due to lack of memory (low PLE).

My questions are:

What are the disadvantages of assigning a large amount of virtual memory to a single VM?

Is there a way to check whether memory is actually the bottleneck, from the VM's perspective?

thanks

T

8 Replies
vbondzio
VMware Employee

A VM with ~75% active / touched memory (59 of 80 GB) and, assuming 2.4 GHz cores, ~75% CPU usage (28.61 GHz out of 16 x 2.4 GHz = 38.4 GHz) is busy and probably needs the resources.

The next question is whether it can use those resources optimally and if it can, whether it does.

Can you run the following on your host? Maybe censor the displayName and post on Pastebin or here with pre tags so it is monospaced (a quick sketch of running the wrapper follows after the links).

https://gist.githubusercontent.com/vbondzio/6bd933f99305e8fdaa0e1ce5b27e88df/raw/f87d45431655cad95fd...
https://gist.githubusercontent.com/vbondzio/877585c3a1e1e738a3d217c1a65b7b07/raw/20a2d80ae2193c63c5b...
https://github.com/vbondzio/sowasvonunsupported/blob/master/memstats_wrapper.sh
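
A rough sketch of collecting that data, assuming SSH / the ESXi Shell is enabled and the scripts have been copied to /tmp on the host (the path and output file names are just examples):

# run the memstats wrapper and keep a copy of the output for posting
chmod +x /tmp/memstats_wrapper.sh
/tmp/memstats_wrapper.sh | tee /tmp/memstats_out.txt
# the sched-stats / vmdumper one-liners from the first two links can be
# pasted into the same shell; tee their output to a file as well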

tamiraig
Contributor

Hi 

memstats_wrapper.sh
name b schedGrp parSchedGroup memSize min max consumed ballooned swapped touched zipped shared zero
--------------------------------------------------------------------------------------------------------------------------------------------------------
SRVDWH01 y 10276735 4 81920 0 -1 81920 0 0 53248 0 1 1

(screenshot attached: dwh1.png)

options="ncpus numa-clients numa-migration numa-cnode numa-pnode numa-global"; for option in ${options}; do echo -e; sched-stats -t ${option}; done

groupName groupID clientID nodeID time timePct memory memoryPct anonMem anonMemPct avgEpochs memMigHere
vm.3208054 3793515 0 0 4746998 57 8251392 98 11092 23 9 0
vm.3208054 3793515 0 1 3547075 42 137216 1 35204 76 0 0
vm.3208760 3797851 0 0 4414603 53 4041708 99 8500 34 9 0
vm.3208760 3797851 0 1 3878928 46 21524 0 15812 65 0 0
vm.3209423 3801403 0 0 3953752 47 2060 0 4620 17 0 0
vm.3209423 3801403 0 1 4338667 52 4192244 99 21584 82 9 0
vm.2300056 923227 0 0 2374896 22 100 0 10360 26 0 0
vm.2300056 923227 0 1 8335839 77 10342300 99 29092 73 9 0
vm.2301206 928068 0 0 5276575 49 12577636 99 33880 69 9 0
vm.2301206 928068 0 1 5431219 50 5276 0 14596 30 0 0
vm.2315105 979641 0 0 3848355 36 8001480 95 26964 73 9 0
vm.2315105 979641 0 1 6816354 63 387128 4 9636 26 0 0
vm.2315403 981259 0 0 5345431 50 16773772 99 45900 72 9 0
vm.2315403 981259 0 1 5319142 49 3444 0 17568 27 0 0
vm.2315581 982460 0 0 2701173 25 32 0 12504 21 0 0
vm.2315581 982460 0 1 7963256 74 12447712 99 45712 78 9 0
vm.2318100 993758 0 0 5277502 49 8387284 99 24796 62 9 0
vm.2318100 993758 0 1 5380335 50 1324 0 15184 37 0 0
vm.2318264 994951 0 0 8331787 78 16777188 99 57076 77 9 0
vm.2318264 994951 0 1 2326071 21 28 0 16848 22 0 0
vm.2318531 996296 0 0 5298237 49 8385648 99 29044 72 9 0
vm.2318531 996296 0 1 5359258 50 2960 0 10832 27 0 0
vm.3412361 4423651 0 0 2694307 34 10479084 99 15176 27 9 0
vm.3412361 4423651 0 1 5035035 65 6676 0 39804 72 0 0
vm.3479158 4630802 0 0 3969692 52 25613972 93 95072 76 9 0
vm.3479158 4630802 0 1 3584893 47 1649004 6 28664 23 0 0
vm.4584449 8116983 0 0 2748586 59 20 0 4624 21 0 0
vm.4584449 8116983 0 1 1892751 40 4194284 99 16856 78 9 0
vm.4587141 8127278 0 0 1964167 42 4189832 99 7620 29 9 0
vm.4587141 8127278 0 1 2671486 57 4472 0 18440 70 0 0
vm.3562174 4893673 0 0 3396957 46 272 0 28716 45 0 0
vm.3562174 4893673 0 1 3920775 53 16776944 99 35088 54 9 0
vm.3593088 4998407 0 0 3658131 50 12582856 99 37660 60 9 0
vm.3593088 4998407 0 1 3585301 49 56 0 25044 39 0 0
vm.5708572 11771432 0 0 751099 50 2876 0 35016 64 0 0
vm.5708572 11771432 0 1 748549 49 12580036 99 19552 35 9 0
vm.4705035 8518228 0 0 2390111 55 8384136 99 13316 37 9 0
vm.4705035 8518228 0 1 1915563 44 4472 0 22288 62 0 0
vm.3748475 5465647 0 0 1239812 18 16390632 97 40584 54 9 0
vm.3748475 5465647 0 1 5588251 81 386584 2 33652 45 0 0
vm.5009673 9649009 0 0 1111685 33 4192044 99 11844 54 9 0
vm.5009673 9649009 0 1 2195746 66 2260 0 9732 45 0 0
vm.5010153 9651969 0 0 1763999 53 3125204 99 4340 20 9 0
vm.5010153 9651969 0 1 1543232 46 44 0 17092 79 0 0
vm.5010509 9654417 0 0 2918836 88 915224 5 32416 52 0 0
vm.5010509 9654417 0 1 388036 11 15861992 94 29508 47 9 0
vm.5010675 9655609 0 0 2993008 90 16777024 99 33944 55 9 0
vm.5010675 9655609 0 1 313904 9 192 0 27760 44 0 0
vm.5011284 9659489 0 0 2921405 88 1044260 99 3708 46 9 0
vm.5011284 9659489 0 1 385044 11 220 0 4352 53 0 0
vm.2993524 3144823 0 0 5505314 62 2086912 99 8904 78 9 0
vm.2993524 3144823 0 1 3308679 37 4096 0 2400 21 0 0
vm.3018713 3221070 0 0 4190952 47 33553988 99 71460 58 9 0
vm.3018713 3221070 0 1 4558972 52 444 0 51712 41 0 0
vm.5147836 10153441 0 0 2052943 71 2048 0 8780 17 0 0
vm.5147836 10153441 0 1 824366 28 8386560 99 40008 82 9 0
vm.5180273 10276735 0 0 1182407 42 82722816 98 114836 39 9 0
vm.5180273 10276735 0 1 1596072 57 1163264 1 175996 60 0 0
vm.3102940 3457244 0 0 5094730 59 4194132 99 10212 41 9 0
vm.3102940 3457244 0 1 3459864 40 172 0 14584 58 0 0
vm.3103771 3461540 0 0 7586891 88 16417996 97 18704 30 9 0
vm.3103771 3461540 0 1 966334 11 359220 2 42912 69 0 0
vm.3104058 3463292 0 0 4355146 50 2060 0 17372 18 0 0
vm.3104058 3463292 0 1 4198214 49 16775156 99 74744 81 9 0
vm.3105773 3470875 0 0 3887148 45 16772752 99 19672 30 9 0
vm.3105773 3470875 0 1 4663261 54 4464 0 44732 69 0 0
vm.3107599 3479059 0 0 4413990 51 12581640 99 18124 34 9 0
vm.3107599 3479059 0 1 4132789 48 1272 0 34772 65 0 0
vm.3125776 3540067 0 0 3253966 38 84 0 8848 22 0 0
vm.3125776 3540067 0 1 5246228 61 8388524 99 31040 77 9 0
vm.3128933 3554459 0 0 1915704 22 16774020 99 21064 32 9 0
vm.3128933 3554459 0 1 6572051 77 3196 0 43236 67 0 0
vm.3144732 3605747 0 0 4916409 58 4192020 99 12044 55 9 0
vm.3144732 3605747 0 1 3535840 41 2284 0 9720 44 0 0

nodeID used idle entitled owed loadAvgPct nVcpu freeMem totalMem
0 23289 24710 23699 0 32 140 426468928 803867160
1 5438 42561 2100 0 3 56 684835036 805306368

NUMA Global Stats
-------------------------
balanceMigration: 2132
loadMigration: 0
localityMigration: 529931
longTermFairnessMigration: 0
monitorMigration: 0
localMemory: 485204892
remoteMemory: 5088356

vmdumper -l | cut -d \/ -f 2-5 | while read path; do egrep -oi "DICT.*(displayname.*|numa.*|cores.*|vcpu.*|memsize.*|affinity.*)= .*|numa:.*|numaHost:.*" "/$path/vmware.log"; echo -e; done

DICT numvcpus = "16"
DICT memSize = "81920"
DICT displayName = "SRVDWH01"
DICT vcpu.hotadd = "TRUE"
DICT numa.autosize.cookie = "160001"
DICT numa.autosize.vcpu.maxPerVirtualNode = "8"
DICT cpuid.coresPerSocket = "8"
numaHost: NUMA config: consolidation= 1 preferHT= 0
numa: Hot add is enabled and vNUMA hot add is disabled, forcing UMA.
numaHost: 16 VCPUs 1 VPDs 1 PPDs
numaHost: VCPU 0 VPD 0 PPD 0
numaHost: VCPU 1 VPD 0 PPD 0
numaHost: VCPU 2 VPD 0 PPD 0
numaHost: VCPU 3 VPD 0 PPD 0
numaHost: VCPU 4 VPD 0 PPD 0
numaHost: VCPU 5 VPD 0 PPD 0
numaHost: VCPU 6 VPD 0 PPD 0
numaHost: VCPU 7 VPD 0 PPD 0
numaHost: VCPU 8 VPD 0 PPD 0
numaHost: VCPU 9 VPD 0 PPD 0
numaHost: VCPU 10 VPD 0 PPD 0
numaHost: VCPU 11 VPD 0 PPD 0
numaHost: VCPU 12 VPD 0 PPD 0
numaHost: VCPU 13 VPD 0 PPD 0
numaHost: VCPU 14 VPD 0 PPD 0
numaHost: VCPU 15 VPD 0 PPD 0

 

T
vbondzio
VMware Employee

The formatting didn't quite work out (pre tags around the output would conserve the white space), but it is still legible with a bit of squinting and counting ...

The sched-stats -t ncpus output is missing, so I'm not sure what the underlying host topology is, but I'm assuming you have at least 16-core sockets without SNC (Sub-NUMA Clustering).

You have vCPU hot-plug enabled, which disables vNUMA. That's not an issue as long as the VM (probably) isn't larger than a physical NUMA node, but it might become one if you increase the size further.

coresPerSocket is set to 8, which presents a wrong topology to the guest, as the underlying hardware seems to have at least 16 cores per socket. Not overly fragmented, but also not ideal. Check out: https://flings.vmware.com/virtual-machine-compute-optimizer

What is the uptime of the host and how many other large VMs are on it? Any intermittent CPU contention for any of those VMs? IMO the locality migrations are a tad high; sched-stats -t numa-migration is also missing, but even with that it is hard to make an assessment from a single point in time. If you see intermittent contention on VMs on that host, you might want to try disabling action affinity: https://kb.vmware.com/s/article/2097369
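
If you go down that road, a sketch of what that would look like on the host, assuming the advanced option name from that KB (record the current value first so you can revert, and verify against the KB before applying):

# check the current value of the NUMA action affinity weight
esxcli system settings advanced list -o /Numa/LocalityWeightActionAffinity
# disable action affinity (per KB 2097369); set it back to the recorded value to revert
esxcli system settings advanced set -o /Numa/LocalityWeightActionAffinity -i 0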

So basically, from the incomplete data, the VM doesn't seem too badly configured, and the memory activity especially is pretty high, which means the guest is touching memory aggressively and probably needs it. Whether it really needs to do that is an application-level question that you won't be able to answer with vSphere-level metrics. Is whatever is consuming and touching the memory actually SQL Server and not some runaway antivirus? Are the queries optimized, or are maybe just a few indexes missing? I'm assuming nothing simple given that you have a dedicated SQL person, but that doesn't mean the workload can't be optimized. Whether that is cheaper than adding more resources to the VM is up to you.

TL;DR: set coresPerSocket to 16, disable vCPU hot-add, and give your SQL admin what he wants, but maybe talk about getting help optimizing the workload inside the VM.
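
For reference, a sketch of what the relevant .vmx / advanced-configuration entries would look like after that change (both keys already show up in your vmware.log DICT dump above; apply with the VM powered off via "Edit Settings"):

cpuid.coresPerSocket = "16"
vcpu.hotadd = "FALSE"        # or uncheck CPU Hot Plug so the key is removed entirely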

Post the sched-stats -t ncpus output if you want me to be sure about the topology. Maybe also post RAMMap screenshots: the main view and the per-process view sorted by total, descending.

tamiraig
Contributor

 

 sched-stats -t ncpus
96 PCPUs
48 cores
2 packages
2 NUMA nodes

 

(screenshots attached: dwh3.PNG, dwh2.PNG)

T
vbondzio
VMware Employee

OK, so you don't "need" to remove vCPU hot-add until the VM hits >= 25 vCPUs, but you should adjust coresPerSocket so that all cores (<= 24) are in a single vSocket. AWE is used by applications to "skip" Windows' virtual memory management and acquire ranges of locked (large) pages. While that size is configurable and SQL Server is opportunistic by nature, the large percentage of touched memory (the vSphere "active" metric, _not_ the same as "active" in RAMMap / Windows, where it means resident) seems to support your DBA's claim.

So yeah, unless the application / DB itself is inefficient, the VM really does need those resources.
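
If you want to confirm that the touched percentage stays that high over time rather than in a single snapshot, a sketch using esxtop batch mode (the exact counter names in the CSV vary a bit between releases, so treat those as an assumption):

# sample the host every 10 seconds for ~10 minutes, then chart the VM's
# group memory "Touched" and CPU "%Run" columns in perfmon or a spreadsheet
esxtop -b -d 10 -n 60 > /tmp/esxtop_SRVDWH01.csv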

tamiraig
Contributor

thanks vbondzio

That is a lot of info; I must say I need to go over it to fully grasp it.

Can you please clarify:

1. Why does vCPU hot-add disturb performance? The VM has 16 vCPUs.

2. Also, why put all cores on a single vSocket? Shouldn't I spread the vCPUs across both sockets, 8+8?

The DBA is aware that the queries are not optimized; it's easier for him to request more memory.

3. My question in that matter is: what is the disadvantage of giving more RAM? Will the machine withstand vMotion events, etc.?

 

T
vbondzio
VMware Employee

  1. It doesn't, yet. vCPU hot-add disables vNUMA, meaning that as soon as your VM size is >= 25 vCPUs on those hosts, only one vNUMA node will be presented to the guest despite the VM being scheduled across two pNUMA nodes.
  2. coresPerSocket only defines the guest-visible CPU and cache topology; it doesn't affect ESXi scheduling or vNUMA autosizing (since 6.5). Right now the VM runs on a single physical socket, yet you present the guest with two. The OS / application might schedule preferentially on one "socket" because it doesn't know that all vCPUs are actually in the same one.

    In general, better locality is preferable to a wider distribution; the latter is only beneficial if the application is NUMA optimized to a high degree _and_ can benefit from the additional memory bandwidth.

    Before 6.5, setting cpuid.coresPerSocket also set numa.vcpu.maxPerVirtualNode (edit: sorry, maxVcpusPerNode is the internal short form), and the two resulting NUMA clients were then most likely scheduled on two different pNUMA nodes.
  3. There isn't really a disadvantage; just make sure that if you cross the memory capacity of a single pNUMA node, you manually size the vNUMA nodes, since ESXi's autosizing only looks at vCPUs and cores per pNUMA node (see the sketch at the end of this post).

    vMotion impact is mostly dictated by memory / CPU activity, but this particular workload doesn't seem excessive; there were some fairly substantial issues with vMotion of large VMs pre 6.5, but that is no longer applicable. There are of course still monster workloads that might be somewhat impacted during the trace / resume phase, but yours isn't getting close.

    Even for those, the tracing impact was dramatically reduced in 7.0, and for the resume phase, 7.0 U1 included some major changes.

    Definitely read: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/performance/vmotion-7u1-... if you want to know more.
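
    For point 3, a minimal sketch of what a manual vNUMA split could look like as a VM advanced setting, using the option mentioned above (the value 8 is only an example for a 16 vCPU VM whose memory exceeds one pNUMA node, roughly half the host RAM here, and therefore needs two vNUMA nodes):

    numa.vcpu.maxPerVirtualNode = "8"
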
tamiraig
Contributor

Definitely interesting stuff here that I need to read more about.

Thanks again for the professional assistance.

 

T