Replying to:
vbondzio
VMware Employee

The formatting didn't quite work out (put pre tags around the output to preserve white space), but it is still legible with a bit of squinting and counting ...

The sched-stats -t ncpus output is missing, so I'm not sure what the underlying host topology is, but I'm assuming that you have at least 16-core sockets without SNC (Sub-NUMA Clustering).

You have vCPU Hot-Plug enabled, which disables vNUMA. That's not an issue as long as the VM isn't larger than the physical NUMA node (it probably isn't), but it might become one if you increase the size further.

coresPerSocket is set to 8, which doesn't match the underlying topology, as that seems to indicate at least 16-core sockets. The result is not overly fragmented but also not ideal. Check out: https://flings.vmware.com/virtual-machine-compute-optimizer
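To make the two configuration points above concrete, here is a minimal Python sketch of the kind of sanity check I'm doing by eye. It is a simplified rule of thumb, not the logic of the Virtual Machine Compute Optimizer fling. The names VmCpuConfig, suggest_cores_per_socket and review are made up for illustration, and the 16 vCPU / 16 cores per host socket values in the usage note are assumptions, since the real host topology wasn't posted.

```python
from dataclasses import dataclass


@dataclass
class VmCpuConfig:
    """Hypothetical container for the VM settings discussed above."""
    vcpus: int              # total vCPUs assigned to the VM
    cores_per_socket: int   # value of the coresPerSocket setting
    hot_add_enabled: bool   # vCPU Hot-Add / Hot-Plug


def suggest_cores_per_socket(vcpus: int, host_cores_per_socket: int) -> int:
    """Rule of thumb: use the fewest virtual sockets that fit the host socket size."""
    if vcpus <= host_cores_per_socket:
        # The whole VM fits into one physical socket, so present one virtual socket.
        return vcpus
    # Otherwise split the vCPUs evenly over the fewest sockets that each fit.
    sockets = 2
    while vcpus % sockets != 0 or vcpus // sockets > host_cores_per_socket:
        sockets += 1
    return vcpus // sockets


def review(cfg: VmCpuConfig, host_cores_per_socket: int) -> list:
    """Return human-readable findings for the two issues mentioned above."""
    findings = []
    if cfg.hot_add_enabled:
        findings.append("vCPU Hot-Add is enabled, which disables vNUMA")
    suggested = suggest_cores_per_socket(cfg.vcpus, host_cores_per_socket)
    if cfg.cores_per_socket != suggested:
        findings.append(
            f"coresPerSocket is {cfg.cores_per_socket}, consider {suggested}"
        )
    return findings
```

For example, review(VmCpuConfig(vcpus=16, cores_per_socket=8, hot_add_enabled=True), host_cores_per_socket=16) reports both findings under those assumed numbers. Treat the suggestion as a starting point and confirm it against the actual sched-stats -t ncpus output.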

What is the uptime of the host and how many other large VMs are on it? Is there any intermittent CPU contention for any of those VMs? IMO locality migrations are a tad high. The sched-stats -t numa-migration output is also missing, but even with that it is hard to make an assessment from a single point in time. If you see intermittent contention on VMs on that host, you might want to try disabling action affinity: https://kb.vmware.com/s/article/2097369

So basically, from the incomplete data, it seems the VM isn't too badly configured. The memory activity in particular is pretty high, which means the guest is touching memory aggressively and probably needs it. Whether it really needs to do that is an application-level question that you won't be able to answer with vSphere-level metrics. Is whatever is consuming and touching the memory actually sqlserver and not some runaway antivirus? Are the queries optimized, or are maybe just a few indexes missing? I'm assuming it's nothing that simple given that you have a dedicated SQL person, but that doesn't mean the workload can't be optimized. Whether that is cheaper than adding more resources to the VM is up to you.

TL;DR: set coresPerSocket to 16, disable vCPU Hot-Add, and give your SQL admin what he wants, but maybe talk about getting help optimizing the workload inside the VM.

Post the sched-stats -t ncpus output if you want me to be sure about the topology. Maybe also post RAMMap screenshots: the main view and the per-process view sorted by total, descending.