I run a 4-node VSAN 6 cluster. Each node has an HCL-listed LSI SAS controller in RAID-0 mode (queue depth of 975), to which I currently have attached 3x 10k RPM enterprise SAS drives and 1x Samsung 850 Pro 512GB SATA SSD. The nodes are identical.
These SSDs are there because of budget constraints. I over-provisioned them from 512GB down to 400GB, and they appear to the controllers and BIOS as 400GB devices.
The performance is horrible. Overall, VMs feel sluggish.
When I do a simple file copy of a DVD ISO file (say 3GB) from one VM to another, I get extremely high disk latencies of 100ms to 300ms or even higher on both VMs. When a rebuild happens, because I put a node into maintenance mode with "full data migration" or I re-apply a storage policy going from, say, "stripe-width 2" to "stripe-width 3", the whole cluster basically grinds to a halt. VMs become extremely slow, and on Linux VMs I even get kernel messages about SCSI timeouts. None of the VMs work; they freeze and basically die because "the disk is extremely unresponsive".
I'm not alone in this. I've read many similar stories on the net, and it often boils down to "consumer-grade SATA SSDs", which these Samsungs are after all.
I assume I have the same issue.
My question is this: I have the option to replace the Samsung 850 Pros with either a Sandisk SAS SSD, which is on the HCL, **or** Intel DC S3700 SATA SSDs, which are also currently on the HCL for VSAN 6.
What I do not understand is: "If the SATA SSDs I currently use form a bottleneck because of the SATA interface's queue depth of 32, how can **any** SATA SSD be on the HCL at all?"
I doubt that these Intel DC S3700s are much faster than the Samsung 850 Pros. Both are SATA drives, both have a per-device queue depth of 32, and the SATA interface as a whole has a queue depth of 32 per port. So how can SATA drives work at all in a VSAN system? It just does not make any sense.
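To make the arithmetic behind my confusion concrete, here is a rough back-of-the-envelope sketch using Little's Law (sustained throughput = outstanding I/Os / per-I/O latency). The function name `max_iops` and the latency values are assumptions I picked purely for illustration; they are not measurements of the Samsung or Intel drives.

```python
def max_iops(queue_depth: int, latency_s: float) -> float:
    """Little's Law: throughput ceiling = outstanding I/Os / per-I/O latency."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return queue_depth / latency_s


SATA_NCQ_DEPTH = 32  # SATA NCQ limit per device

# Hypothetical per-I/O latencies (illustrative only, not measured values).
for label, latency_s in [("0.1 ms", 0.0001), ("1 ms", 0.001), ("100 ms", 0.1)]:
    ceiling = max_iops(SATA_NCQ_DEPTH, latency_s)
    print(f"QD {SATA_NCQ_DEPTH} at {label} per I/O -> about {ceiling:,.0f} IOPS")
```

The sketch only shows how the ceiling at a fixed depth of 32 moves with per-I/O latency. It says nothing about how either drive actually behaves under VSAN load.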
Everybody is beating us to death about "queue depth, queue depth, queue depth": Cormac Hogan, Duncan Epping, everybody who is an authority on VSAN. Controllers with too shallow a queue depth were thrown off the HCL, etc. etc. etc.
When I look at the Intel S3700 SSDs, the HCL says:
| Mandatory features present: | Drive Performance, Drive Reliability, Queue Depth, SATA SMART Attributes, Surprise Power Removal Protection, Trim / Unmap, Write Cache, Write Failure Notification |
It lists "Queue Depth" as one of the requirements that this SSD meets. But SATA, by definition, has a maximum queue depth of 32, which is supposedly way too small. So how can it meet "Queue Depth"? It's SATA!
The Intel DC S3700s would cost me about half as much as the Sandisk SAS SSDs, and sure, if the Intels will solve my performance issues, then I'll get those. But if I'm in just as much trouble afterwards because of the SATA interface's queue depth of 32, then I'd better invest in the SAS SSDs (but that would deplete my budget for other things, so I'd rather not).
Can anyone shed some light on this?
(edit: I changed the title to better reflect my findings)