Hi
Any actual differences when creating:
3 vCPUs
1) 1 core each in 3 different sockets
2) 3 cores in 1 socket
3 disks
1) 1 disk each in 3 different controllers
2) 1 controller with 3 disks
There must be a reason why such options are available.
The ability to select between cores and sockets exists for two main reasons:
- Licensing:
Since virtualization is basically abstracting physical hardware resources into software, you need to be able to emulate as many hardware configurations as possible.
Some operating systems, such as Windows Server or Red Hat, used to be licensed by CPU sockets.
So if you had 2 vCPUs as 1 core x 2 sockets versus 2 cores x 1 socket, you would have consumed a different number of licenses.
- NUMA:
The placement of vCPU resources directly affects performance in some systems.
Each CPU socket has direct access to some of the memory; this is called NUMA (Non-Uniform Memory Access).
So the ability to change cores per socket lets you play with NUMA and gain performance. (If you are using vSphere 6.5 or later, don't do it unless it's absolutely necessary.)
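For illustration, the same vCPU count can be presented either way through the cores-per-socket setting in the VM's .vmx file (the key names below are real VMX options; the values are just an example):

```
numvcpus = "2"
cpuid.coresPerSocket = "1"    # presents 2 sockets x 1 core to the guest
# ... or, for the same 2 vCPUs:
cpuid.coresPerSocket = "2"    # presents 1 socket x 2 cores to the guest
```

In the vSphere Client this is the "Cores per Socket" drop-down under CPU in the VM's settings.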
As for the controllers, I'm not 100% sure, but:
With 1 controller per disk, each disk gets a dedicated controller and never has to wait on another disk's controller queue. Usually this is not necessary, since controllers can handle a lot of disks (up to 15 per controller) without an issue at normal IO levels.
The best layout depends on how your disks consume IO.
Let me know if I have answered your questions.
I second this. Most of the time it's because of licensing. CPU scheduling is also handled somewhat differently.
As for the disk layouts, I guess you are also correct here. But I've never actually seen any configuration that required a dedicated storage controller for each disk.
Also, @alankoh , don't forget you can only have 4 SCSI Controllers per VM.
Hey @bryanvaneeden
Just as an FYI: yesterday I found an OVF that has 1 controller per disk, first time in my life.
NetApp StorageGRID.
1 disk, 1 LSI Parallel controller each.
Also, it was using vHardware 10 and really old VMware Tools on a really new OVF, who knows why?
3 disks
1) 1 disk each in 3 different controllers
2) 1 controller with 3 disks
Option 1 performs slightly better - not so much that it makes sense to use several controllers for every VM - but enough to make a difference for heavy duty VMs running databases or similar loads.
Ulli
Cores per socket:
As already mentioned, one reason, and the initial reason for this feature, was licensing (introduced in 4.0, supported in 4.1). That is far from the only impact though. The CPU topology (independent from the NUMA topology) provides information about which cores share a cache, i.e. all cores in a socket share a last level cache (LLC), which translates into locality benefits for threads that could hit the same cache line.
In 5.0, the design decision was made to "couple" cpuid.coresPerSocket with maxVcpusPerNode (edit: sorry, internal short form) numa.vcpu.maxPerVirtualNode, i.e. setting a CPU topology would also influence the NUMA topology (which in theory are separate concepts), as that was the common reality. Since then, multiple NUMA nodes per socket, multiple LLCs per NUMA node etc. have become a lot more common, and it turned out that misconfiguration of cores per socket was so common that we split the "coupling" of those two options, so that the CPU topology no longer influences the vNUMA topology.
There is still a very good reason to configure it correctly though: OSs and workloads can rely on CPU / cache topology information to make scheduling decisions, and showing them the actual underlying physical constraints can be beneficial.
Best Practices are usually separated into two distinct versions:
1. The least likely chance for someone to make things worse
2. The best performing option for some (even many) workloads, but at the cost of operational complexity and the risk of misconfiguring and making things worse than the default.
Not touching corespersocket is 1, adjusting it is 2.
You'd usually want all cores to fit into a socket if your vCPU count stays at or below the core count of a physical socket. There is a lot more detail to the actual sizing and whether you want to enforce HT, but most of the logic is very well represented in this fling: https://flings.vmware.com/virtual-machine-compute-optimizer
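As a rough illustration of that "fit into a socket" idea, here is a toy sketch (my own simplification, not the fling's actual algorithm - it ignores HT, memory sizing and uneven splits):

```python
def cores_per_socket(vcpus: int, phys_cores_per_socket: int) -> int:
    """Toy heuristic: keep the whole VM inside one physical socket when it
    fits; otherwise split the vCPUs evenly across the fewest virtual
    sockets such that no virtual socket exceeds a physical socket."""
    if vcpus <= phys_cores_per_socket:
        return vcpus  # one virtual socket holding all cores
    # smallest socket count that divides vcpus evenly and still fits
    for sockets in range(2, vcpus + 1):
        if vcpus % sockets == 0 and vcpus // sockets <= phys_cores_per_socket:
            return vcpus // sockets
    return 1  # fall back to 1 core per socket

# e.g. an 8 vCPU VM on 16-core sockets -> 8 cores x 1 socket,
# a 24 vCPU VM on 16-core sockets -> 12 cores x 2 sockets
```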
On a side note, some applications can actually perform better with 1 core per socket. Those usually have sleep/wake heavy, communicating threads that _shouldn't_ be separated onto different cores (as those IPIs incur an overhead) but will be, due to the "intra socket is cheap" assumption. Having just 1 core per socket will keep them on the same vCPU, eliminating the need for IPIs. That is more or less a corner case, but not as rare as one would think, esp. for "latency sensitive" / synchronous IO "benchmarks".
Controller vs. Disk
Queues were already mentioned and are rarely an issue for anything but the largest IO workloads. The other important part is that interrupt coalescing rate determination is on a per-controller basis, i.e. for DBs you want synchronous IO (latency sensitive) issued and completed as fast as possible; if you also have a data disk on that controller, the flood of async IO would widen the interrupt window for the serial IO. Again, this is mostly relevant in performance critical workloads.
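As a sketch, separating a latency-sensitive disk from an async-heavy one onto their own controllers looks like this in a .vmx file (the key names are real VMX options; the device names and file names are just an example):

```
scsi0.present = "TRUE"
scsi0.virtualDev = "pvscsi"
scsi0:0.fileName = "db-log.vmdk"     # latency-sensitive disk, own controller
scsi1.present = "TRUE"
scsi1.virtualDev = "pvscsi"
scsi1:0.fileName = "db-data.vmdk"    # async-heavy data disk, separate controller
```

Each controller then coalesces interrupts for its own disk only, so the data disk's IO flood can't widen the log disk's interrupt window.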
The cores vs. sockets question seems to still be in debate. I've been reading conflicting information, so I opened a support ticket with VMware yesterday and was told by support "the general rule of thumb is for 4 or fewer vCPUs go with sockets over cores (4 sockets, 1 core per socket), but anything over 4 vCPUs go with cores over sockets (1 socket, 8 cores per socket)." This seemed a little too broad to me at the time and sounded odd.
So what you're saying is the best practice is to always go with cores over sockets (1 socket, 8 cores per socket) assuming that you don't surpass your physical core count per socket? This is also assuming you don't have a reason to go the opposite route because of a specific application.
Is that correct? If so, that seems to contradict what support literally just told me yesterday.
There might have been some context to that; it is not a "rule of thumb" I'm aware of. Email me the SR# and I'll follow up on it (myusername at vmware dot com).
SR 21204194803
There was no context that I am aware of. It was described to me as a "general rule of thumb" and that the support agent confirmed with her lead.
So the VMware official best practice is to always go with cores over sockets (1 socket, 8 cores per socket) assuming that you don't surpass your physical core count per socket and that you don't have an application specific reason to do otherwise?
@vbondzio Any update here? I'm still trying to determine whether or not my VMs are optimized with regard to cores vs. sockets.
My last message here meant to imply:
"What you were told is not correct and I let the TSE know. I hope you don't fault this particular lvl 1 engineer who might have gotten the information from someone not much senior. This stuff is complicated and it can't be avoided that there is some incorrect, anecdotal information floating around, we'll address those on a case by case basis. That being said, I'm confident that the answer I have provided earlier is both comprehensive and accurate."