iNik26
Enthusiast

vSAN Lab

Hello,

I'm configuring a new home lab based on 3x Supermicro Superserver E300-9D-4CN8TP with the following config:

Intel Xeon D-2123IT

2x SFP+ 10GbE LAN

128 GB RAM

1x SATADOM 64GB (boot)

1x M.2 Samsung 970 Pro NVMe 500GB (cache)

1x M.2 WD 2 TB SATA SSD WDS200T2B0B (capacity)

vCenter Server Appliance 6.7.0.21000

ESXi -> VMware ESXi, 6.7.0, 11675023

I've configured vSAN (no manual adjustments or modifications, just enabled vSAN and claimed the disks) and ran some performance tests with HCIBench 1.6.8.7 (easy run). I get results like these:

1-vdb-8vmdk-100ws-4k-70rdpct-100randompct-4threads-1551300003-res.txt

Datastore: Compute-vsanDatastore

=============================

Version: vdbench50407

Run Def: RD=run1; I/O rate: Uncontrolled MAX; elapsed=3600 warmup=1800; For loops: None

VMs = 6

IOPS = 59242.30 IO/s

THROUGHPUT = 231.41 MB/s

LATENCY = 3.2033 ms

R_LATENCY = 3.6867 ms

W_LATENCY = 2.0755 ms

95%tile_LAT = 10.1144 ms

=============================

Resource Usage:

CPU USAGE = 93.05%

RAM USAGE = 20.19%

VSAN PCPU USAGE = 45.3539%

=============================

 

2-vdb-8vmdk-100ws-4k-100rdpct-100randompct-4threads-1551309960-res.txt

Datastore: Compute-vsanDatastore

=============================

Version: vdbench50407

Run Def: RD=run1; I/O rate: Uncontrolled MAX; elapsed=3600 warmup=1800; For loops: None

VMs = 6

IOPS = 73090.60 IO/s

THROUGHPUT = 285.51 MB/s

LATENCY = 2.6077 ms

R_LATENCY = 2.6077 ms

W_LATENCY = 0.0000 ms

95%tile_LAT = 6.0918 ms

=============================

Resource Usage:

CPU USAGE = 97.47%

RAM USAGE = 20.87%

VSAN PCPU USAGE = 44.7564%

=============================

3-vdb-8vmdk-100ws-256k-0rdpct-0randompct-1threads-1551319607-res.txt

Datastore: Compute-vsanDatastore

=============================

Version: vdbench50407

Run Def: RD=run1; I/O rate: Uncontrolled MAX; elapsed=3600 warmup=1800; For loops: None

VMs = 6

IOPS = 7800.10 IO/s

THROUGHPUT = 1950.00 MB/s

LATENCY = 6.6960 ms

R_LATENCY = 0.0000 ms

W_LATENCY = 6.6960 ms

95%tile_LAT = 23.3727 ms

=============================

Resource Usage:

CPU USAGE = 82.94%

RAM USAGE = 22.2%

VSAN PCPU USAGE = 40.9864%

=============================
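As a rough sanity check on the three runs (a sketch; the VM/VMDK/thread counts are taken from the result file names, not from any HCIBench feature): throughput should equal IOPS × block size, and by Little's Law the number of outstanding I/Os should be about IOPS × mean latency, which should roughly match what the benchmark keeps in flight.

```python
# Sanity-check the HCIBench runs: throughput = IOPS x block size,
# and (Little's Law) IOPS x mean latency ~= outstanding I/Os,
# i.e. VMs x VMDKs per VM x threads per VMDK.

runs = [
    # (iops, block KiB, mean latency ms, threads per VMDK)
    (59242.30, 4,   3.2033, 4),  # run 1: 4K, 70% read, random
    (73090.60, 4,   2.6077, 4),  # run 2: 4K, 100% read, random
    (7800.10,  256, 6.6960, 1),  # run 3: 256K, 100% write, sequential
]
VMS, VMDKS = 6, 8

for iops, blk_kib, lat_ms, threads in runs:
    mbps = iops * blk_kib / 1024          # MB/s as reported (MiB-based)
    outstanding = iops * lat_ms / 1000    # Little's Law: L = lambda * W
    qd = VMS * VMDKS * threads            # I/Os the benchmark keeps in flight
    print(f"{mbps:8.1f} MB/s, ~{outstanding:5.0f} outstanding (target {qd})")
```

Run 1, for example, works out to roughly 190 outstanding I/Os against the 192 (6 × 8 × 4) the benchmark issues, so the reported IOPS, latency, and throughput are at least self-consistent.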

I'm wondering whether those results are "good" or not. According to the spec sheet, the Samsung 970 Pro should be capable of:

  • RANDOM WRITE (4KB, QD32): up to 500,000 IOPS
  • RANDOM READ (4KB, QD32): up to 370,000 IOPS (512 GB) / up to 500,000 IOPS (1,024 GB)
  • RANDOM READ (4KB, QD1): up to 15,000 IOPS
  • RANDOM WRITE (4KB, QD1): up to 55,000 IOPS

( source: Samsung SSD 970 PRO | Samsung V-NAND Consumer SSD | Samsung Semiconductor Global Website​ )

Also, looking at some blogs (for example https://www.virtualizationhowto.com/2017/01/samsung-960-evo-m-2-1tb-nvme-in-vmware-home-lab/ ), it seems that even with the 960 much higher results can be achieved (>200,000 IOPS).

I've started to check the basic configuration (BIOS, firmware, etc.), and everything seems correctly configured per vendor best practices. I've also tested the network connections, and they perform quite well (iperf: [ 5] 0.00-10.11 sec 11.5 GBytes 9.75 Gbits/sec receiver).

Any ideas for those results?

Thank you all for any input!


6 Replies
iNik26
Enthusiast

Hello,

I've tested the NVMe drive by installing Windows Server and using tools like ATTO Disk Benchmark, CrystalDiskMark, and Samsung Magician. The results are quite different:

[Screenshots: ATTO Disk Benchmark, CrystalDiskMark, and Samsung Magician results, 2019-03-03]

So, is it possible that something isn't working correctly in ESXi/vSAN?

Thank you again for any input.

sk84
Expert

You have a hybrid setup. That means you have a fast cache device and a slower capacity device, and at some point the data accepted by the fast SSD or NVMe drive has to be destaged to the slower SATA drive. That, of course, is slower, because the write rate of the SATA disk is lower. So the specs of the NVMe disk are largely irrelevant; it has to wait for the slower disk.

In practice, it is therefore recommended to have multiple disk groups and multiple capacity disks in a hybrid setup, because then the read/write load is distributed across different capacity disks. In addition, vSAN adds a certain management overhead, among other things through checksum calculation and other mechanisms. A single drive in a vSAN setup will therefore never show the theoretical performance of a lab test. But thanks to its scaling possibilities, a vSAN cluster can offer more performance than a single disk.
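As a toy illustration of the destaging argument (the bandwidth figures below are assumptions chosen for the example, not measurements of these drives): sustained write throughput is capped by the capacity tier, no matter how fast the cache device is, and adding capacity disks raises the cap.

```python
# Toy model of a vSAN disk group: writes land on the cache device but
# must be destaged to the capacity tier, so sustained write bandwidth
# is bounded by the slower side (all numbers are illustrative).

def sustained_write_mbps(cache_mbps, capacity_mbps, capacity_disks):
    # Bursts run at cache speed; steady state runs at destage speed,
    # which scales with the number of capacity disks in the group.
    return min(cache_mbps, capacity_mbps * capacity_disks)

nvme_cache = 2300  # assumed sequential write of an NVMe cache device, MB/s
sata_cap   = 530   # assumed sequential write of a SATA SSD, MB/s

print(sustained_write_mbps(nvme_cache, sata_cap, 1))  # 530  - one capacity disk
print(sustained_write_mbps(nvme_cache, sata_cap, 4))  # 2120 - four capacity disks
```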

And I also see that the CPUs in your servers are very busy. Maybe that is a bottleneck as well.

--- Regards, Sebastian VCP6.5-DCV // VCP7-CMA // vSAN 2017 Specialist Please mark this answer as 'helpful' or 'correct' if you think your question has been answered correctly.
iNik26
Enthusiast

Hello Sebastian,

thank you for your answer. If I understand correctly, the bottleneck could be the M.2 SSD WDS200T2B0B. Right?

sk84
Expert

Ah, okay. I didn't see that the capacity tier is an SSD device; I only read "SATA". So it's not a hybrid setup but an all-flash configuration. But even that doesn't make the test more meaningful.

Because you write:

I've started to check basic conf (like bios, firmware etc ) and all seems correctly configured as per vendor best practices.

But neither the Samsung disk nor the WD disk seems to be on the VMware vSAN HCL; at least I can't find them there. And you don't give any information about the storage controller. So it's interesting that vSAN works with this configuration at all, but nobody can judge the performance of your setup exactly, because it is not supported, tested, or optimized by VMware.

Basically, I can only say that I have seen better IOPS and latency values for all-flash configurations with a similar setup, but not with exactly this hardware, as it is not supported by VMware.

--- Regards, Sebastian VCP6.5-DCV // VCP7-CMA // vSAN 2017 Specialist Please mark this answer as 'helpful' or 'correct' if you think your question has been answered correctly.
iNik26
Enthusiast

Yes, unfortunately I know they aren't on the HCL.

The system is a Supermicro E300-9D-4CN8TP (Supermicro | Products | SuperServers | Mini 1U | E300-9D-4CN8TP )

The "bigger" brother (E300-9D-8CN8TP), with an 8-core CPU, is on the HCL.

On some blogs I've seen that Samsung NVMe disks are used with good results. For example, with a Samsung 960 EVO:

https://www.virtualizationhowto.com/2017/01/samsung-960-evo-m-2-1tb-nvme-in-vmware-home-lab/

So I'm wondering if I can get better results/performance from my lab.

Thanks, kind regards

sk84
Expert

On some blogs I've seen that Samsung NVMe disks are used with good results. For example, with a Samsung 960 EVO:

Yeah, but these tests aren't meaningful either, and they're certainly not comparable. So many parameters play a role in such tests that you can only compare runs that take place under exactly the same conditions and with the same parameters. You haven't even used the same test framework. And from the blog you linked I can't even tell whether he uses vSAN at all or tested on a local datastore.

As I mentioned before, you can't compare single-disk performance under lab conditions with a storage virtualization solution.

To give a simple example of what I mean:

You roll a tire down a very steep hill and it reaches a speed of 130 mph at the bottom. Now you mount that tire on a car and wonder why you can't drive faster than 80 mph. Exactly the same principle applies to your setup. First, the measured speed of the individual tire depends on your test scenario (how steep the hill is, how long it is, etc.), and many other components besides the tire play a role once it's mounted on a car. Therefore you can never compare these two tests.

So, to get back to your original question about the "worse" performance in your setup:

  • Are there vSAN setups that offer more performance? - Yes.
  • Are there vSAN setups that perform worse? - Yes.
  • Is the performance normal for your test constellation? - I don't know, because I don't have any comparable hardware.
  • Can the performance be optimized in your test setup, or are there factors slowing it down? - Probably. But that's a lot of trial and error in an unsupported setup, and nobody knows whether you'll end up with 10% more performance or 200%.
--- Regards, Sebastian VCP6.5-DCV // VCP7-CMA // vSAN 2017 Specialist Please mark this answer as 'helpful' or 'correct' if you think your question has been answered correctly.