VMware Cloud Community
srodenburg
Expert

Buy HCL-listed SATA or SAS SSDs? Why consumer-grade SATA SSDs make no sense.

Hello,

I run a four-node VSAN 6 cluster with an HCL-listed LSI SAS controller in RAID-0 mode (queue depth of 975), to which I currently have attached 3x 10k RPM enterprise SAS drives and 1x Samsung 850 Pro 512 GB SATA SSD. The nodes are identical.

These SSDs are there because of budget constraints. I over-provisioned them from 512 GB to 400 GB, so they appear to the controllers and BIOS as 400 GB devices.
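A minimal sketch of the arithmetic behind that clamp (the hdparm/Host Protected Area command printed at the end is just one common way of limiting the visible capacity on a Linux box, not necessarily the tool used here; vendor utilities achieve the same thing, and /dev/sdX is a placeholder):

```python
# Sketch: the LBA count needed to make a 512 GB SSD present itself as 400 GB.
# Assumes 512-byte logical sectors and decimal gigabytes, as SSD vendors count them.
# The hdparm/Host Protected Area command printed below is only an example of how
# such a clamp can be applied on Linux; /dev/sdX is a placeholder.

TARGET_GB = 400
SECTOR_SIZE = 512  # bytes per logical sector

sectors = TARGET_GB * 1_000_000_000 // SECTOR_SIZE
print(f"Visible sectors for {TARGET_GB} GB: {sectors}")
print(f"Example clamp: hdparm -N p{sectors} /dev/sdX")
```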

The performance is horrible. Overall, VMs feel sluggish.

When doing a simple file copy of a DVD ISO file (say 3 GB) from one VM to another, I get extremely high disk latencies of 100 ms to 300 ms or even higher on both VMs. When a rebuild happens because I put a node in maintenance mode with "full data migration", or I re-apply a storage policy from, say, "stripe width 2" to "stripe width 3", the whole cluster basically grinds to a halt. VMs become so slow that on Linux VMs I even get kernel messages about SCSI timeouts. None of the VMs work; they freeze and basically die because the disk is extremely unresponsive.

I'm not alone in this. I've read many similar stories on the net, and it often boils down to consumer-grade SATA SSDs, which is exactly what these Samsungs are.

I assume I have the same issue.

My question is this: I have the option to replace the Samsung 850 Pros with either a SanDisk SAS SSD which is on the HCL, **or** Intel DC S3700 SATA SSDs, which are also currently on the HCL for VSAN 6.

What I do not understand is this: if the SATA SSDs I currently use form a bottleneck because of the SATA interface's queue depth of 32, how can **any** SATA SSD be on the HCL at all?

I doubt that these Intel DC S3700s are much faster than the Samsung 850 Pros: both are SATA drives, each has a per-device queue depth of 32, and the SATA interface itself has a queue depth of 32 per port. So how can SATA drives work at all in a VSAN system? It just does not make any sense to me.

Everybody keeps beating us over the head with "queue depth, queue depth, queue depth": Cormac Hogan, Duncan Epping, everybody who is an authority on VSAN. Controllers with too shallow a queue depth were even thrown off the HCL, and so on.

When I look at the Intel S3700 SSDs, the HCL says:

Mandatory features present: Drive Performance, Drive Reliability, Queue Depth, SATA SMART Attributes, Surprise Power Removal Protection, Trim / Unmap, Write Cache, Write Failure Notification

It says "queue-depth" as one of the requirements that this SDD meets. But SATA, by definition, has a max. queue-depth of 32 which is way too small. So how can it meet "queue-depth" ? It's SATA !

The Intel DC S3700s would cost me about half the money of the SanDisk SAS SSDs, and sure, if the Intels solve my performance issues, then I'll get those. But if I'm in just as much trouble afterwards because of the SATA interface's queue depth of 32, then I'd better invest in the SAS SSDs (though that would deplete my budget for other things, so I'd rather not).

Can anyone shed some light on this?

(edit: I changed the title to better reflect my findings)


14 Replies
zdickinson
Expert

Good morning, I don't see the 512 GB Samsung SATA SSD drives on the HCL for v6 or v6 u1.

But in general, I find the HCL frustrating. Just because something is on the HCL does not mean that it's certified for all use cases. Now they have moved to vSAN ready nodes, which is OK... but that doesn't really address use case either.

My preferred HCL is the PDF: https://partnerweb.vmware.com/programs/vsan/Virtual%20SAN%20Ready%20Nodes.pdf  It at least has low, medium, and high performance tiers. Thank you, Zach.

srodenburg
Expert

Hello Zach,

"I don't see the 512 GB Samsung SATA SSD drives on the HCL for v6 or v6 u1."

I never said that. I only said they are in the current setup for budget reasons 😉

The SanDisk SAS SSD and the Intel DC S3700 SATA SSD are both on the HCL.

But reading through several other threads about performance problems with VSAN setups, you see the same basic message over and over again: "Stay the hell away from SATA SSDs, as they buckle under (medium) load very quickly."

So I'll take that message to heart and go for the SAS SSDs, because even with my own basic reasoning I cannot, for the life of me, grasp why they allow SATA drives on the HCL at all. When somebody builds a low-budget lab setup and uses SATA, that's fine, but if you want any decent performance it just seems like a bad idea.

I can see it in esxtop when looking at the drives themselves. I do some basic file copying between VMs and whoosh, there goes the latency. The command queues on all four nodes for their respective SSDs are as long as the queue outside an Apple Store when a new iPhone is released...

I can also see that destaging from the write cache (the SSD) to the 10k SAS HDDs is not a problem at all. I cannot saturate the buggers (I guess partly because the SATA SSD cannot deliver the data fast enough to make the HDDs break a sweat).

People focus entirely on "how many megabytes per second of throughput an SSD has" and "how many IOPS", which is all fine and dandy, but if an SSD's data has to go through a keyhole (whereas with SAS the door is simply open), then the SATA interface is exactly that: a keyhole.

But maybe I'm totally missing something (it wouldn't be the first time). Maybe Duncan or Cormac can chime in and share their views? Maybe a blog post on this subject could keep others from making the same mistake many of us, myself included, are making.

srodenburg
Expert

Zach wrote: "My preferred HCL is the pdf:  https://partnerweb.vmware.com/programs/vsan/Virtual%20SAN%20Ready%20Nodes.pdf"

I went through that document. It strikes me that Fujitsu and Supermicro **ALWAYS** use SATA SSDs (often the Intel S3700), regardless of the configuration (low, medium, high, VDI). HP always uses SAS SSDs, Dell uses SATA only in the "Low" spec, and Cisco uses SAS as well.

That makes you wonder: how can the likes of Fujitsu and Supermicro get away with using the Intel DC S3700 SATA SSDs in all their setups? They never use SAS SSDs.

Sigh, the information I find keeps contradicting itself.

elerium
Hot Shot

What network cards are you using? If possible, update the firmware and use the latest corresponding driver/VIB from the regular VMware HCL for your NICs.

There have been issues with some Intel NICs causing high storage latency with VSAN.

elerium
Hot Shot

I can't comment on the performance of the Samsung 850 Pro, but not having power-loss protection on the drive can result in data loss if your hosts suddenly lose power. This alone makes it too risky for production workloads, and it is likely the reason the 850 Pro isn't on the HCL.

srodenburg
Expert

I did some more snooping around. If you use VMware's VSAN sizing tool on the web and build yourself a hefty mid-range VSAN solution, the results screen advises SATA SSDs, the Intel S3700 to be precise.

The only reason I can think of why these S3700 SSDs can work at all is that they can empty their queues, although capped at 32, much more quickly than consumer-grade SSDs can.

This only partially explains it to me, because if we set aside "enterprise features" like protection against sudden power loss and endurance, an SSD like the Samsung 850 Pro is an utter speed king in every benchmark it appears in. I over-provisioned them from 512 GB to 400 GB, so there are plenty of spare cells to let the garbage collector and wear-leveling logic do their thing. I still feel they should work, or at least that the S3700 should not be much better in this respect (again, setting aside the other "enterprise features", which are irrelevant to the latency issue in my opinion).

depping
Leadership

Sure, the SAS interface is by nature capable of a deeper queue depth, and on top of that it typically has higher bandwidth, etc. However, as you also state, most "enterprise" devices are so extremely fast that the "shallow" queue depth of 32 is usually not the problem for these components. The device drains its I/Os so fast that it is very uncommon for the queue depth to be the bottleneck. When it comes to "enterprise readiness", what matters more is endurance, consistent performance and, indeed, protection against sudden power loss. Most consumer-grade drives would simply lose data; with an S3700 (although it uses the SATA interface) this does not happen. "Enterprise readiness" is not about the interface, but about reliability/endurance, performance and predictability.
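To put rough numbers on "drains the IOs so fast", here is a back-of-the-envelope sketch (the service times are assumptions for the sake of the example, not measurements of any particular drive):

```python
# Rough illustration: how long does it take to drain a full 32-deep queue,
# given an assumed per-I/O service time? Service times are illustrative only.

QUEUE_DEPTH = 32

def drain_time_ms(service_time_us: float) -> float:
    """Time to empty a full queue if each I/O takes service_time_us to complete."""
    return QUEUE_DEPTH * service_time_us / 1000.0

# An enterprise SATA SSD with steady ~100 us writes empties a full queue quickly.
print(f"steady 100 us/IO : {drain_time_ms(100):.1f} ms to drain QD32")

# A consumer drive that stalls to ~10 ms per I/O during garbage collection
# turns the same 32-deep queue into roughly a third of a second of latency.
print(f"GC stall 10 ms/IO: {drain_time_ms(10_000):.0f} ms to drain QD32")
```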

srodenburg
Expert

Update: I bit the bullet, got a second mortgage, told my kid he can't go to college, and bought the SAS SSDs (SanDisk 200 GB 6G SAS). In each of the four nodes I did a full data migration, replaced the Samsung 850 Pro SATA SSD with the SAS SSD, and holy macaroni, all of my problems are gone.

The entire system feels much more responsive, and doing things like cloning a VM (which used to grind the entire cluster to a halt) just keeps on working like nothing is happening.

Before, when I did a simple file copy of a DVD ISO from one VM to another, I'd get maybe 20 MB/s and the entire cluster would feel sluggish until the copy job finished. Now I get speeds of 80 to 90 MB/s and everything stays snappy and responsive.

What a difference!

I'm a very happy human.

By the way, my network is a 1 Gbps network where each node has two NICs in a LACP LAG, and I can clearly see a node talk to one node over one NIC and to another node over the other NIC, as LACP nicely splits the traffic across the two cards.

The system feels so fast now, and rebuilds and re-applications of storage policies hardly have an impact and, in general, complete much faster than before.

And really, the only thing I did was swap the Samsung 850 Pro SATA SSDs for these SAS SSDs.

Lesson learned: either fork out the cash for SAS SSDs (which are on the HCL, of course), which is the best option, or get VSAN-certified enterprise-grade SATA SSDs, but STAY AWAY from consumer-grade SATA SSDs because they just don't have what it takes. They cannot empty their queues fast enough and clog everything up as soon as there is any real load on the system.

Just don't do it. Forget consumer-grade SATA. It's like throwing money down the toilet. I've been there and learned it the hard way. Let me be the stubborn donkey so you don't have to 😉

JohnNicholsonVM
Enthusiast

To add to what Duncan is saying: deeper queue depths for a single SSD add more latency (multiply the service time by the queue depth you are operating at). 32 is generally good enough, as going beyond that your latency starts creeping up (typical non-NVMe drives have service times of 20-100 microseconds). This is why NVMe drives have 64K queues, so you can get great response times while running 300K IOPS. This is also why data locality is not really an important factor (and why VSAN trades it for consistent performance).

https://www.vmware.com/files/pdf/products/vsan/VMware-Virtual-SAN-Data-Locality.pdf
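A quick numeric sketch of that "service time times queue depth" point (the 100 microsecond service time is an assumption for illustration, and the model deliberately ignores a drive's internal parallelism):

```python
# Illustration of "latency ~= service time * queue depth" for a device modelled
# as completing one I/O at a time. The 100 us service time is an assumption, and
# real SSDs have internal parallelism, so treat these figures as rough bounds.

SERVICE_TIME_US = 100  # assumed per-I/O service time

for qd in (1, 8, 32, 128):
    queued_latency_ms = qd * SERVICE_TIME_US / 1000.0
    print(f"QD {qd:>3}: ~{queued_latency_ms:5.1f} ms latency for the last I/O in the queue")

# Throughput of this simplified device is capped at 1 / service time regardless
# of queue depth, so piling on depth beyond ~32 mostly just adds latency.
print(f"IOPS ceiling: ~{1_000_000 // SERVICE_TIME_US:,} IOPS")
```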

Now, on the subject of why you shouldn't use cheap consumer SATA drives: I've written about this a bit (although I do need to update it).

SynchroNet | How to pick the right VMware VSAN flash device

There are four major reasons why you shouldn’t use cheap consumer grade flash devices that are not on the HCL.

Data Loss

Power loss protection is a big deal. In our old VDI lab we used inexpensive consumer-grade Samsung 840 drives for testing out new versions of Horizon View. Our old lab facility also regularly lost power. It did not take long for us to start having issues, and upon investigation we discovered that missing power loss protection was the root cause. When a flash device accepts a write, it is first put into a volatile DRAM buffer (similar to the cache on a RAID controller), where it is blended with other writes and gradually destaged to flash pages. This is done to reduce write amplification and improve random write performance. Enterprise-grade drives (such as the Micron M500DC, Intel S3700, etc.) have large capacitors to protect this volatile DRAM buffer against server power loss. Some mid-range business drives (like the Micron M500 series) have limited power loss protection that protects the lower pages, but not this DRAM buffer. On power loss, any block that had been ACK'd to the host operating system but not yet persisted will be lost unless full end-to-end power loss protection is a feature of the drive. In the case of our lab, this resulted in silent VMFS corruption that was unrepairable, and the solution was to wipe the drive and start over. Full power loss protection is a requirement on the VSAN HCL for both caching and capacity flash drives, and this requirement exists to protect the integrity of your data. While VSAN mirrors data, in our experience power loss events in data centers often take down every single server in a cabinet at the same time. Our lab is not alone in noticing this phenomenon.
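A toy sketch of that ACK-before-persist window (purely illustrative; not a model of any specific drive's firmware):

```python
# Toy model of why a missing power-loss capacitor matters: the drive ACKs writes
# from a volatile DRAM buffer before they reach flash. Illustrative only.

class ToyDrive:
    def __init__(self, has_plp_capacitor: bool):
        self.has_plp_capacitor = has_plp_capacitor
        self.dram_buffer = []   # ACK'd but not yet persisted
        self.flash = []         # persisted data

    def write(self, block):
        self.dram_buffer.append(block)
        return "ACK"            # host considers the write durable from here on

    def destage(self):
        self.flash.extend(self.dram_buffer)
        self.dram_buffer.clear()

    def power_loss(self):
        if self.has_plp_capacitor:
            self.destage()            # capacitors hold power long enough to flush
        else:
            self.dram_buffer.clear()  # ACK'd writes silently vanish

for plp in (False, True):
    d = ToyDrive(has_plp_capacitor=plp)
    for block in range(8):
        d.write(block)
    d.power_loss()
    print(f"PLP={plp}: host ACK'd 8 writes, {len(d.flash)} survived the power cut")
```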

Endurance

The quality of the flash, combined with internal over-provisioning, controller tricks, whether it is used in an SLC/MLC/TLC configuration, and the process node size, all factor into the endurance of a drive. Low-end consumer drives may only survive ~100 TB of data written, while high-end devices can survive tens of PBs of writes before they are expected to fail. A flash device used for write cache is going to take a lot of punishment, so when deciding which drives to certify for the cache tier, VMware has taken the conservative position of requiring that the drive can be fully written at least 10 times per day over the course of 5 years, with 2.5 PB of 4 KB writes or 3.5 PB of 8 KB writes worth of endurance. This requirement is a reason that a lot of cheap drives at less than one dollar per GB are not on the HCL for caching. Given the fairly even rate of writes to all drives in the cluster, using drives that are not on the HCL for endurance reasons risks having to keep a box of spares, and another trash can handy in the data center to quickly throw the burnt-out flash devices into.
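For a sense of scale, here is the arithmetic behind a 10 drive-writes-per-day requirement (the 200 GB and 400 GB capacities are just example values):

```python
# Quick endurance arithmetic: total bytes written for a given drive-writes-per-day
# (DWPD) rating. The capacities are just illustrative examples.

def lifetime_writes_tb(capacity_gb: float, dwpd: float, years: float = 5) -> float:
    return capacity_gb * dwpd * 365 * years / 1000  # result in TB

for capacity_gb in (200, 400):
    tb = lifetime_writes_tb(capacity_gb, dwpd=10)
    print(f"{capacity_gb} GB cache device at 10 DWPD over 5 years: ~{tb:,.0f} TB "
          f"(~{tb/1000:.1f} PB) written")
# Compare that with a consumer drive rated for roughly 100-150 TB total.
```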

Support

There are major considerations around vendor support of flash devices. Making sure you have a drive supported by VMware means that the drive has been tested and that acceptably safe performance is the expected behavior. For PCIe, DIMM and NVMe drives, additional driver complexity means you want to make sure your server vendor, VMware, and your flash vendor all support you using these newer, faster interfaces. Certified devices mean that issues can be escalated between vendors, and quicker driver and firmware updates should be expected. If drives are provided through your OEM, replacements are only hours away in case of a fault. Conversely, consumer flash drives often have slow or non-existent fixes for reliability or performance problems. One popular consumer flash drive is currently six months into a nasty performance bug that produces seconds of disk latency in some cases.

IO consistency

Consumer-grade flash drives are often a bit like drag-racing cars: both are designed to go really fast for short periods of time. What gets glossed over in a lot of reviews is that they also share the same faults. Even if a drive can hit 100K IOPS on a good day, there are also times where, if you push it too hard for too long, it may suddenly take hundreds of milliseconds to respond. They may also require quiet time between short bursts of I/O activity to garbage-collect, empty buffers and regain their performance. In an enterprise VSAN environment, consistent cache performance is king for good outcomes. Good drives can handle the steady-state abuse that a large virtual desktop or server environment will bring. Cheap drives may deliver 5,000 IOPS one moment and 50 IOPS the next, with latency spiking sharply as they fluctuate wildly under load. Sizing, and good end-user outcomes, are impossible when flash drives jump from being fast to being slower than a traditional disk.
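A small sketch of why a handful of stalls hurts far more than the headline IOPS number suggests (all latency figures are made up for illustration):

```python
# Illustration: a drive that is "fast on average" but stalls occasionally has a
# terrible tail latency. All latency numbers are made up for this example.
import statistics

# 990 "good" I/Os at 0.2 ms, plus 10 garbage-collection stalls of 200 ms each
latencies_ms = [0.2] * 990 + [200.0] * 10

latencies_ms.sort()
p50 = latencies_ms[len(latencies_ms) // 2]
p99 = latencies_ms[int(len(latencies_ms) * 0.99)]

print(f"mean  : {statistics.mean(latencies_ms):6.2f} ms")
print(f"median: {p50:6.2f} ms")
print(f"p99   : {p99:6.2f} ms  <- what your VMs actually feel during a stall")
```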

JohnNicholsonVM
Enthusiast

I will point out that Intel's S3700s can deliver a LOT of IOPS, but one word of caution about mixing them with SAS drives on a SAS expander within the same PHY group: mixing SATA and SAS on expanders tends to do weird things (and I think Dell's server configuration tool will refuse that configuration). If you have an entire disk group on a dedicated HBA, or directly connected to a controller, this shouldn't be as big an issue.

srodenburg
Expert

Thanks for your additions, John. Highly appreciated.

danpritts3
Contributor

Thanks very much for this thread.  It is very informative.

For anyone else who ends up at this page while they're trying to find individual certified devices on the HCL, there's a link below the "choose a certified ready node" bit that shows you individual certified components.

JohnNicholsonVM
Enthusiast

Scroll down on the page VMware Compatibility Guide: vsan

Find "Build your own from certified components"

Here is the current 6.0U1 certified SSD list.

danpritts3
Contributor

Thanks John - found it and changed my answer, but not before you replied. 
