VMware Cloud Community
7007VM7007
Enthusiast

Can't create all flash VSAN with PCIe SSD

Hi All

I'm trying to create my first test VSAN in my lab at home. It's a single-node VSAN. I know it's not supported, but this is just to get me started before I get some more servers.

My hardware is as follows:

Supermicro X10SL7-F

32GB RAM

Xeon CPU E3-1230 v3 @ 3.30GHz

One Samsung 950 Pro 256GB PCIe SSD NVMe

Two Samsung 840 Pro 128GB SATA SSDs

As a test I was able to create a VSAN datastore using one of the Samsung 840 128GB drives as the cache tier and the other Samsung 840 128GB was used for capacity. This worked great and I was able to place VMs on the vsanDatastore and see dedupe/compression in action!

I have since deleted the above configuration and am now trying to use my Samsung 950 Pro 256GB PCIe SSD NVMe for the cache tier and to then use the two Samsung 840 Pro 128GB SATA SSDs for capacity. When I enable VSAN on the cluster and complete the setup I get the following errors as soon as I try to access the vsanDatastore:

[Attached screenshot: vsan error.jpg]

The capacity of the vsanDatastore shows as 0.00 B and I can't place any VMs on this datastore.

Can someone assist with getting this to work? I have tried recreating the VSAN setup a few times but it hasn't helped. I know the PCIe SSD works as I can set it up as a single disk (non VSAN) datastore and place VMs on it with no problem.

How can I troubleshoot this? Thanks.

31 Replies
elerium
Hot Shot

VSAN, when set up correctly, is very fast, even in a hybrid configuration. My hybrid setup can do 900+ MB/s read/write for file copies, and all-flash gives even faster results. I'm using much more expensive HCL hardware though. As for hardware, the RAID controller or HBA in use and its driver are of critical importance (both for stability and performance). I have seen performance on my cluster plummet to speeds slower than a laptop because of bad HBA drivers/firmware. The VSAN recommendation is to disable read/write caches on any controllers (so BBU/caches are actually not desirable), so the best thing to use is an HBA, preferably one on the VSAN HCL. I would check the HBA driver in use in addition to the HBA firmware (try /usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval -a) and align both as closely as possible with the VSAN HCL.

The next most important factor for performance is the SSD cache layer. The Samsung NVMe you're using should work fine for lab use (see "VSAN 6.2 (vSphere 6.0 Update 2) homelab on 6th Gen Intel NUC" on virtuallyGhetto). The Samsung disks you're using for capacity are also fine and commonly used for VSAN lab purposes. Based on the slow write speeds that you report for your capacity drives, I would look closely at the RAID/HBA driver in use by ESXi and the firmware for your RAID/HBA.

As for troubleshooting, start local (tests on a single VM). Make sure benchmarks like CrystalDiskMark or IOMeter look good on a single VM before evaluating speeds on something like storage vMotion (datastore migration) or copy speeds between VMs, as these introduce other variables that you must test for. For VSAN performance troubleshooting, you'll need to be able to monitor latency/congestion (either through VSAN Observer http://blogs.vmware.com/vsphere/files/2014/08/Monitoring-with-VSAN-Observer-v1.2.pdf or the 6.2 VSAN performance monitor). Without being able to see this, you'll be troubleshooting blind, guessing without being able to pinpoint which layer is causing slowdowns.
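
As a rough illustration of the "start local" advice, here is a minimal Python sketch that times a sequential write in chunks from inside a single VM and reports the per-chunk throughput. It is not a replacement for CrystalDiskMark or IOMeter; the function names, the 64 MB size and the 4 MB chunk size are assumptions chosen for the example, and the file is written to a temporary directory rather than to any particular datastore.

```python
import os
import tempfile
import time


def timed_sequential_write(path, total_mb=64, chunk_mb=4):
    """Write total_mb of data in chunk_mb pieces and return MB/s for each chunk."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)  # random data, so it does not compress
    rates = []
    with open(path, "wb") as f:
        for _ in range(total_mb // chunk_mb):
            start = time.perf_counter()
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # push the data to the device, not just the page cache
            elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
            rates.append(chunk_mb / elapsed)
    return rates


def summarize(rates):
    """Return (min, mean, max) throughput in MB/s."""
    return min(rates), sum(rates) / len(rates), max(rates)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rates = timed_sequential_write(os.path.join(tmp, "probe.bin"))
    low, mean, high = summarize(rates)
    print(f"min {low:.0f} MB/s, mean {mean:.0f} MB/s, max {high:.0f} MB/s")
```

A large gap between the minimum and maximum values is the kind of inconsistency worth chasing down before looking at copies between VMs.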

There are strange things that happen with Windows and file copies between VMs which end up being network related (even when everything occurs on the same host). You will find endless posts about these on Google for every version of Windows and ESXi. It's quite common, and the culprit can be almost anything under the sun (NIC, NIC settings, ESXi NIC driver, offload settings, OS, patches, etc.). These are difficult to diagnose, which is why I recommended using iperf to test the communication speed between your VMs. Until you have validated that your virtual NIC is not the bottleneck, you cannot rule out that some weird networking issue, rather than storage, is what's slowing you down.
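
To show what an iperf-style check measures, here is a minimal Python sketch that pushes data over a TCP connection and reports the throughput. It is only an illustration under stated assumptions: the function names are made up for the example, and as written both the sender and the receiver run on the same machine over the loopback address. To test between two VMs, the receiving and sending halves would have to run on separate machines, which is exactly what iperf already does properly.

```python
import socket
import threading
import time


def _sink(server_sock, result):
    """Accept one connection and count the bytes received until it closes."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            total += len(data)
    result["bytes"] = total


def measure_tcp_throughput(host="127.0.0.1", total_mb=32):
    """Send total_mb over TCP to a local sink; return (bytes_received, MB/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    receiver = threading.Thread(target=_sink, args=(server, result))
    receiver.start()

    payload = b"\0" * (1024 * 1024)  # 1 MB of zeros per send
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        for _ in range(total_mb):
            client.sendall(payload)
    receiver.join()  # wait until the sink has drained everything
    elapsed = max(time.perf_counter() - start, 1e-9)
    server.close()
    return result["bytes"], total_mb / elapsed


if __name__ == "__main__":
    received, rate = measure_tcp_throughput()
    print(f"received {received} bytes at roughly {rate:.0f} MB/s")
```

If a network test between the VMs tops out near the speed seen in the file copy, the storage layer is probably not the limiting factor.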

Best of luck!

7007VM7007
Enthusiast

Hi elerium

Thanks for the great reply!

I'm not for one second doubting the amazing speed you can get with a VSAN setup :)

I did notice that my onboard LSI-2308 controller is not on the HCL. My IBM ServeRAID SAS M5015 (aka 9260-8i) is also not on the HCL. I'm guessing this could be part of my painful performance issues? With VSAN I tried with and without the RAID card and it didn't make a difference. I don't think the RAID card is on the latest firmware, but I did use the latest driver; in any case I have since removed that card from my server. The LSI-2308 is running the latest firmware (v20). So although my LSI-2308 is not on the HCL, it has a queue depth that is deep enough for VSAN and is on the latest firmware. I'm running v19 of the driver, but v20 is available on the VMware website. My LSI-2308 is running in IT mode, so there is no RAID/RAID 0/cache/battery/etc. to get in the way of VSAN, which I thought would be ideal? So the only thing I can see to do here is to update the driver from v19 to v20? The ESXi CLI shows v19 with a date of 2016-04-09.
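
For keeping track of whether an installed driver lags behind the version listed on the HCL, a small comparison helper can be sketched in Python. The function names and the version strings below are hypothetical; real ESXi driver versions often carry extra suffixes that this simple parser does not handle, so treat it as an illustration of the comparison only.

```python
def parse_version(text):
    """Turn a dotted version string such as '19.00.00.00' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("vV").split("."))


def driver_status(installed, listed):
    """Compare an installed driver version with the version on a compatibility list."""
    have, want = parse_version(installed), parse_version(listed)
    # Pad the shorter tuple with zeros so that '20' and '20.0' compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    if have == want:
        return "match"
    return "older" if have < want else "newer"


if __name__ == "__main__":
    # Hypothetical version strings, standing in for the v19 and v20 drivers.
    print(driver_status("19.00.00.00", "20.00.00.00"))
```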

I have run some benchmarks using ATTO Disk Benchmark from within the VMs and the results were really good: speeds you'd expect from a SATA SSD, 300-400MB/s write and 400-500MB/s read. So all good with this test! I tested this on the VSAN datastore (when I still had it) and in a couple of VMs running on a single-SSD datastore. Performance is really good in this benchmark.

What I find odd is that when I ran ESXi 5.5 on this very same server about 18 months ago, I was able to copy a large ISO file between two VMs (each running on its own SSD datastore) and I was easily getting 380-400MB/s, which makes the poor speeds I am experiencing now even odder. I am now running ESXi 6.0 Update 2. When I watch the file copy dialog box in Windows I don't get a consistent speed; it's very choppy. It'll start off "high" (like 80MB/s) and then bounce around a bit until it falls off a cliff to 20MB/s (or lower!).

I've always read that you need a RAID card with cache/battery to get good write performance but can I ask one thing about this:

Does this apply to PCIe NVMe SSD drives? Do these types of drives need a cache/battery for optimal performance?

I'd like to try and get my current setup working before purchasing more hardware!

Thanks for all the tips and helpful info.

srodenburg
Expert

7007VM7007, read this:

Buy HCL-Listed SATA or SAS SSD's ?  Why consumer grade SATA SSD's make no sense.

Read it all the way down, incl. Duncan Epping's reply.  You might now understand why performance might not meet your expectations.

7007VM7007
Enthusiast

Thank you for this! I think me using consumer drives is the cause of my poor performance!

The one thing I didn't quite grasp in your post is: can I use ENTERPRISE SATA drives in VSAN and still get good performance? Or do I have to sell a kidney and buy SAS-only drives??

Ok, so I was thinking of the following drives then to help me with my poor performance:

Cache tier drive:

Intel DC P3700 Add-in Card PCIe (I have a PCIe 3.0 x4 slot available)

OR:

Intel DC P3700 Series 400GB 2.5" (and use a U.2 to M.2 adapter)

Capacity tier:


Samsung SM863 480GB Enterprise Class SATA SSD

I would be using the onboard LSI-2308 on my motherboard to connect the capacity tier drives. This controller has 8 SAS ports and, from what I understand, is not on the HCL but has a queue depth of 975. Would the combination of the above drives (SATA and PCIe) and my LSI-2308 controller give me good performance in VSAN?

I just want to make sure using SATA is going to be OK (even if they are enterprise drives) and that my onboard LSI-2308 controller is up to the (VSAN) task. Also, I won't be using any kind of RAID card with write-back cache/BBU, and from what I have read this shouldn't be an issue as VSAN prefers just an HBA (no RAID).

This has really been a learning curve for me so thank you all!

I thought I could start off by purchasing one of the Samsung SM863 capacity drives and using my current Samsung 950 Pro M.2 PCIe NVMe SSD for the cache tier before going all in with the (expensive) Intel DC P3700 for the cache tier?

Thanks again!

elerium
Hot Shot

Quoting 7007VM7007: "I've always read that you need a RAID card with cache/battery to get good write performance but can I ask one thing about this: Does this apply to PCIe NVMe SSD drives? Do these types of drives need a cache/battery for optimal performance?"

NVMe devices communicate directly on the PCIe bus of the motherboard; they don't need (and cannot use) RAID cards at all.

elerium
Hot Shot

I think I misread your question. Many NVMe SSDs do use some sort of DRAM cache for performance. Enterprise-class NVMe SSDs will often employ capacitors for power-loss protection, so that the DRAM contents can be flushed to the flash cells in the event of power loss.

7007VM7007
Enthusiast

Thanks for the explanation.

Am I correct in saying that I can use ENTERPRISE SATA drives (that are on the HCL) and be OK performance-wise? I've been thinking about getting the Samsung SM863 480GB Enterprise Class SATA SSDs for the capacity tier and possibly even the cache tier. (I may add an NVMe drive later on for the cache tier.)

If the performance is still poor then I would look at getting a supported HBA (IBM M1015).

Am I on the right track?

elerium
Hot Shot

Yes, enterprise SATA SSDs on the HCL will work well for capacity. They should be okay for cache too, although NVMe is a better choice for cache. The only NVMe drives on the HCL are the Intel P3600 & P3700 series, and they aren't cheap.

Linjo
Leadership

I think NVMe is a waste of money if you have anything less than a high-performance 10GbE network with low-latency, non-blocking switches to carry the traffic over the wire.

When new, faster disks come out (3D XPoint etc.), vSAN will probably need 25 or 40GbE networking to make use of them with the current architecture.
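
To put rough numbers on the network side of this argument, the raw line rate of an Ethernet link can be converted to MB/s with a one-line calculation. This is a back-of-the-envelope sketch in Python: the function name is made up for the example, the figures are theoretical maxima using decimal megabytes, and real throughput will be lower once protocol overhead is taken into account.

```python
def line_rate_mb_per_s(gigabits_per_second):
    """Raw line rate in MB/s (decimal megabytes), before any protocol overhead."""
    return gigabits_per_second * 1000 / 8


if __name__ == "__main__":
    for speed in (1, 10, 25, 40):
        print(f"{speed:>2} GbE is about {line_rate_mb_per_s(speed):.0f} MB/s raw")
```

A single 10GbE link works out to 1,250 MB/s at best, so a cache device that can sustain more than that cannot be fully used for traffic that has to cross the network.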

// Linjo

7007VM7007
Enthusiast

Appreciate the comments!

I think I will order a single Samsung SM863 480GB Enterprise SATA SSD and use this in the capacity tier and then use my existing NVMe drive for the cache tier. Once I have VSAN set up with these two drives I will do another test to see if the Enterprise SATA drive makes a difference.

If the performance is still terrible then I will purchase a second Samsung SM863 480GB Enterprise SATA SSD and use this in the cache tier (instead of the NVMe drive) and test once again.

If the performance is *still* terrible then I will look at getting an HBA (IBM M1015) card and connecting the Enterprise SATA SSD drives to this card.

After doing the above I am out of ideas!

7007VM7007
Enthusiast

So I was about to purchase the Samsung SM863 480GB Enterprise Class SATA SSD, but there is a VERY slight difference between the model number on the VMware HCL website and the one on the website where I want to purchase the drive.

On the VMware HCL website:

Samsung SATA SSD SM863 Series

it lists the model number as: MZ7KM480HAHP-00005

But on the website I want to purchase this drive from it lists the model number as: MZ7KM480E

Does this matter for VSAN?

7007VM7007
Enthusiast

My Samsung SM863 480GB Enterprise SATA SSD arrived today. So far the results have been impressive!! I set up a new datastore with this drive only and write speeds have been really good. Depending on the test they have been 320MB/s to 420MB/s!

I briefly set up VSAN using this drive as the capacity tier and my consumer Samsung 950 Pro PCIe NVMe SSD as the cache, and the results weren't that great... I was getting about 200MB/s, so using a consumer drive for the cache tier isn't a good idea.

I'll be ordering another SM863 drive and using it as the cache tier device while using my existing SM863 as the capacity tier and then doing some more tests. Currently I have my SM863 connected to my onboard Intel SATA port (disk queue depth is 32) and it's been OK (so far in my limited testing).

When I have two VMs running on the SM863 datastore I can copy a 5GB ISO between the two VMs in about 22 seconds, or 230MB/s. HUGE improvement over the 20MB/s or so I was getting on the consumer drives. I've also found that the speeds on the enterprise drive are very consistent (i.e. on the consumer drive the speeds went up and down like a yo-yo).
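
The 230MB/s figure can be checked with a quick calculation. Here is a minimal sketch in Python: the function name is made up for the example, and since the post does not say whether "5GB" means decimal or binary gigabytes, both are computed.

```python
def throughput_mb_per_s(size_gb, seconds, binary=False):
    """Average throughput in MB/s for copying size_gb in the given number of seconds."""
    factor = 1024 if binary else 1000  # MB per GB, binary or decimal
    return size_gb * factor / seconds


if __name__ == "__main__":
    decimal_rate = throughput_mb_per_s(5, 22)
    binary_rate = throughput_mb_per_s(5, 22, binary=True)
    print(f"decimal: {decimal_rate:.0f} MB/s, binary: {binary_rate:.0f} MB/s")
    print(f"improvement over 20 MB/s: about {decimal_rate / 20:.0f}x")
```

Either way the result lands close to 230MB/s, roughly eleven times the 20MB/s seen on the consumer drives.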

More to follow after some more testing with VSAN! Thanks for all the help, the forum has helped me in a huge way to overcome a problem I have been battling with.