I'm trying to find an answer on whether it is recommended to use the VMware Paravirtual SCSI (PVSCSI) controller in a vSAN environment, or if the default LSI Logic SAS controller should be used. I have a four-node vSAN cluster (each node obviously has DAS storage for the vSAN datastore) in a DellEMC VxRail. The guest OSes will all be either WS2012 R2 or WS2016 on vSphere 6.5. From what I've read:
The PVSCSI controller IS recommended for SAN environments
The PVSCSI controller IS NOT recommended for DAS environments
But we technically don't fit into either category. Can anyone help with a definitive answer?
Thanks in advance!
For a large number of workloads, the default controller is sufficient. We do recommend changing to the PVSCSI controller for better performance when working with workloads that generate a large number of IOs (Oracle, SQL, etc.). In such cases, you may also need additional controllers to avoid bottlenecks.
Thanks for the information... I understand this, and it matches what I've read in the knowledge base. But nowhere does it mention what to do with vSAN. Are there any vSAN-specific considerations or recommendations on which virtual SCSI controller to use for a Windows Server-based VM? Or should I just follow the general practices from the knowledge base article you mentioned?
I guess a more direct question would be....
Is there any real reason NOT to use the PVSCSI driver for all Windows Server VMs? I'd rather use all LSI Logic SAS, or ALL PVSCSI for all VMs simply for consistency's sake... Unless there is a reason not to.
I came across exactly the same question and found no definitive answer to it.
As GCCJay mentioned:
The PVSCSI Controller IS recommended for SAN environments
The PVSCSI Controller IS NOT recommended for DAS environments
So, what is vSAN? It seems to be a bunch of vSAN-certified disks (optimally SAS- or NVMe-based), locally attached to a vSAN-certified I/O controller inside a vSphere-certified server, presenting part of a distributed block-level object storage cluster. Correct me if I am wrong.
Assume a vSAN host with an I/O controller that has a queue depth of 1024, and 4x SAS disks attached to it, each with a 256-deep queue on its SAS interface. Those 4 disks would perfectly fit into the queue "bandwidth" of that I/O controller.
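The arithmetic behind that "fit" can be sketched quickly (the queue depths are the assumed example values from the scenario above, not measured hardware specs):

```python
# Illustrative queue depths from the example above -- verify against your
# actual HBA and disk specifications; these are assumptions only.
controller_queue_depth = 1024   # I/O controller (HBA) queue depth
disk_queue_depth = 256          # per-disk SAS interface queue depth
num_disks = 4

total_disk_queues = num_disks * disk_queue_depth
print(total_disk_queues)                            # 1024
print(total_disk_queues <= controller_queue_depth)  # True: the 4 disks fit the HBA queue exactly
```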
At the virtual machine level, there are 2 queues: the vSCSI adapter queue and the per-virtual-disk queue. From my understanding, the I/O chain may look like this:
Guest OS App (PVSCSI LUN) -> vSCSI (PVSCSI adapter) -> ESXi storage stack -> driver -> I/O controller -> disk
My first thought was that configuring PVSCSI on all VMs could probably lead to an "oversubscription" of queues. Let's assume 4 VMs with 4 vDisks and 1 PVSCSI adapter each; that would probably lead to a situation where you have nearly as many queues on the virtual layer as on the physical layer. If all these VMs generated a high amount of I/O, that could possibly lead to performance issues. BUT... vSAN distributes the I/O over as many vSAN data nodes inside the cluster as needed, so it is unlikely that you can fill up these queues on one host so easily. Also, there is a caching tier for hot in-flight I/O on each host, and not every workload generates a constantly high amount of I/O.
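A rough sketch of that oversubscription estimate, assuming the commonly cited PVSCSI defaults from VMware KB 2053145 (64 outstanding IOs per virtual disk, 254 per adapter) and the example HBA depth from earlier:

```python
# Rough oversubscription estimate for the 4-VM scenario above.
# PVSCSI defaults per VMware KB 2053145 (assumed here -- verify for your build):
per_vdisk_qd = 64          # default device (per-vDisk) queue depth
per_adapter_qd = 254       # default adapter queue depth
vms = 4
vdisks_per_vm = 4
physical_controller_qd = 1024   # example HBA queue depth from earlier

# Each VM's vDisks can together post at most the smaller of the summed
# device queues and the adapter queue.
per_vm_outstanding = min(vdisks_per_vm * per_vdisk_qd, per_adapter_qd)
virtual_total = vms * per_vm_outstanding
print(virtual_total)   # 1016 -- nearly as deep as the 1024-deep physical HBA queue
```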
Long story short, I don't have an exact answer to the question, but maybe we can discuss this? Or maybe there is a VMware guy around for some clarification.
"So, what is vSAN? It seems to be a bunch of vSAN certified disks (optimally SAS or NVMe based), locally attached to an vSAN certified I/O controller inside a vSphere certified server, presenting a part of a distributed block-level object storage cluster."
Not "seems to be". That *is* what it is. Essentially, when using a protection-policy, vSAN could be described as "RAID over Network".
"vSAN distributes the I/O over as many vSAN data nodes inside the cluster as needed"
That depends. With a RAID1-equivalent protection policy, when the VM writes something, that write is sent out twice (mirror copy 1 & mirror copy 2): one copy goes to ESXi host A, where it lands on a certain disk from the local pool of disks there; the other mirror copy goes to another vSAN node, to one of its own local disks.
With erasure-coding policies (RAID5- or RAID6-equivalent) or with striping, more hardware components get used in parallel (but this also introduces latency).
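To make "more components in parallel" concrete, here is a sketch of how many physical components a single write can touch under each policy (illustrative only; actual vSAN component placement varies, and striping multiplies these numbers further):

```python
# Component fan-out per write under common vSAN protection policies.
# Standard published layouts: RAID1 FTT=1 keeps two full mirror copies,
# RAID5 is 3 data + 1 parity, RAID6 is 4 data + 2 parity.
components_touched = {
    "RAID1 (FTT=1, mirroring)": 2,
    "RAID5 (FTT=1, erasure coding)": 4,
    "RAID6 (FTT=2, erasure coding)": 6,
}
for policy, n in components_touched.items():
    print(f"{policy}: up to {n} components per write")
```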
The whole concept of "thinking in queues" in the classical sense becomes fuzzy when one introduces a software-based RAID engine.
To answer the original question: just use the same concepts as with non-vSAN storage: either smack everything onto a PVSCSI controller, or only the VMDKs that are high-performance. The performance-enhancing technique of PVSCSI (in certain cases!!) still applies, vSAN or not.
When you deploy a new Server 2019 VM on vSphere 7, the controller recommendation is the NVMe controller (for the boot drive and all other VMDKs), even though there is no actual physical NVMe hardware to be seen in the cluster. It's all abstracted anyway.
I would not "over-analyse" the whole thing, trying to match everything up in your head, because as soon as you start with virtual NVMe controllers and storage devices (65k queue depth) on top of actual hardware that is SAS (256 queue depth), or even SATA for capacity devices (QD=32), and try to fit one into the other, you'll go crazy.
Don't forget you can (and may need to) change the registry key in Windows to allow a higher queue depth as well. See the following KB article: https://kb.vmware.com/s/article/2053145
The IO flow you described seems more or less right to me; you might be missing the physical port queues, but I have never seen issues at that level before. I found the following blog post helpful: http://www.yellow-bricks.com/2014/06/09/queue-depth-matters/
We also run a couple of vSAN environments across a multitude of vendors/hosts, and most, if not all, VMs use the PVSCSI controller. It's been the recommended way for years now if you want to utilize the maximum performance the environment can offer.
I found that, especially when using NVMe drives and good controllers, the queue depth never gets reached unless you are doing a crazy amount of IO on a single VM. I hope this brings you some useful information!
PS: Increasing the registry key in Windows is of no use if the queue isn't the same size all the way down to the array, though.
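That "narrowest queue wins" point can be sketched in one line: the effective outstanding-IO ceiling is the minimum across the whole path. The depths below are example assumptions only (the 254 assumes the guest-side registry tweak from the KB article has been applied):

```python
# Example queue depths along the IO path -- illustrative assumptions, not
# measured values. The guest-side queue has been raised via the registry key.
queue_depths = {
    "guest vDisk (PVSCSI device queue, raised)": 254,
    "vSCSI adapter queue": 254,
    "HBA / IO controller": 1024,
    "physical SATA capacity disk": 32,
}
effective = min(queue_depths.values())
print(effective)   # 32 -- the raised guest-side queue buys nothing past this point
```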
I've been trying to get an answer to this question for months now. We just deployed an all-flash vSAN cluster, and my thinking during the planning and design phase was that we would use the NVMe controller - seems logical with all flash. However, when I started asking about the NVMe/PVSCSI adapters, both Dell and VMware provided very vague answers and basically said "well, you can use it if you want, but you probably won't see any benefit." I eventually posted the question on Spiceworks (which is almost identical to the question you asked), and the consensus there was that I could use PVSCSI but should do so at my own risk.
My plan now is to run PVSCSI on my SQL servers only.
"the consensus there was that I could use PVSCSI but should do so at my own risk"
Bullpoop. There is no risk whatsoever using PVSCSI. You might not see a performance benefit compared to other controllers but that's not "risky".
In the end, if you really want to go down the rabbit-hole, create a VM with several controllers, attach a VMDK to each one and benchmark the bejeezes out of them. Then YOU will know what controller etc. offers the best performance in YOUR environment.
And do a proper benchmark. ATTO etc. are not proper benchmark tools. Use something that isn't so lame as to work inside RAM buffers, and that can do a proper mix of IO types and IO sizes that your workloads typically do.
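To illustrate the point about RAM buffers, here is a toy sketch in Python that forces every IO to stable storage with fsync, so you measure the disk path rather than the page cache. It is only a sketch (the file name and sizes are made up); for real testing use a purpose-built tool like fio or DiskSpd that can also mix IO types and sizes:

```python
import os
import random
import time

# Toy sync random-write micro-benchmark -- illustrative only.
PATH = "bench.tmp"            # hypothetical scratch file on the datastore under test
BLOCK = 4096                  # 4 KiB IO size
COUNT = 200                   # number of IOs to issue
FILESIZE = 64 * 1024 * 1024   # 64 MiB working set

fd = os.open(PATH, os.O_RDWR | os.O_CREAT)
os.ftruncate(fd, FILESIZE)
buf = os.urandom(BLOCK)       # incompressible data

start = time.time()
for _ in range(COUNT):
    # Write one block at a random aligned-ish offset, then force it to disk
    # so the page cache cannot absorb the IO.
    os.pwrite(fd, buf, random.randrange(0, FILESIZE - BLOCK))
    os.fsync(fd)
elapsed = time.time() - start

os.close(fd)
os.remove(PATH)
print(f"{COUNT / elapsed:.0f} sync 4K random-write IOPS")
```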