Contributor

All-flash vSAN performance expectations?

We just deployed an all-flash vSAN cluster consisting of 4 Dell R640 ready nodes.  Each node contains:

2 Intel Xeon Gold 6246 @ 3.30 GHz
382 GB RAM
1 Intel Optane P4800x for cache
4 NVMe PM1725B for capacity
1 disk group per node

The vSAN traffic is running over a 25 GbE core.  Deduplication and compression are disabled, as is encryption.  We're using 6.7 U3, and all firmware and drivers are up to date.  The storage policy is RAID-1, FTT=1.

I've deployed HCIBench and am currently running test workloads with it.  The datastore is empty except for the HCIBench VMs.  The Easy Run workload of 4K/70% Read/100% Random produced the following results:

I/O per Second: 189042.27 IO/S
Throughput: 738.00 MB/s
Read Latency: 1.48 ms
Write Latency: 1.15 ms
95th Percentile Read Latency: 3.00 ms
95th Percentile Write Latency: 2.00 ms

What should I be shooting for with regard to HCIBench results to verify all is well before I begin moving my production workload into vSAN?  I'm currently running the other three Easy Run workloads and can post those results if needed.

13 Replies
Expert

I don't have HCIBench numbers for you, but my lab has 12G dual-ported SAS SSDs (the highest vSAN HCL performance category, "F") and high-end enterprise SATA SSDs as capacity devices. Everything I do just flies. Super zippy. Cloning a 100 GB VM -> BAM! Done. Working with servers and doing heavy stuff, it goes like a bat out of hell. Your flash hardware is even faster, so you can only expect goodness. The flash devices and the CPUs in my lab are fast enough to consistently max out the 10 Gbit links between nodes when I really hammer it.

And I use "compression only" in vSAN 7 U1; the difference between "no compression" and "with compression" is measurable, but as a human I don't feel it. My fat SQL queries are only fractionally slower with compression turned on, almost within the error margin. With deduplication + compression active I noticed a loss in snappiness and responsiveness, but with compression only, almost nothing; you could fool me with a placebo.

Honestly, don't obsess over benchmark numbers too much. If it goes like a rocket, it's fast, and vSAN all-flash with proper hardware like yours goes like a rocket. Trust me.

What can ruin the party, though, is using poor switches for vSAN traffic. I've seen people connect fat servers to cheap switches with small per-port buffers (which saturate quickly) and weak packet-forwarding performance in general, and then all that super-fast flash storage is slowed down by relatively slow inter-node traffic. Under stress this escalates quickly, as the switches just can't cope.
It makes a difference for latency whether the vSAN vmkernel ports of two nodes have a 0.6 ms RTT or a 0.2 ms RTT between them. Rule of thumb: switches that "think too much" or are simply not very fast tend to introduce latency not befitting the super-fast NVMe flash devices inside the nodes.
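
To put rough numbers on that rule of thumb, here is a minimal Python sketch. It assumes a placeholder device service time of 0.10 ms for a 4K I/O, which is an assumption for illustration, not a measured vSAN figure:

# Rough illustration only: a remote I/O's latency is roughly the device
# service time plus the inter-node round trip. The 0.10 ms service time
# below is an assumed placeholder, not a measured vSAN figure.
device_service_ms = 0.10
for rtt_ms in (0.2, 0.6):
    total_ms = device_service_ms + rtt_ms
    network_share = rtt_ms / total_ms * 100
    print(f"RTT {rtt_ms:.1f} ms -> ~{total_ms:.2f} ms per remote 4K I/O "
          f"({network_share:.0f}% spent on the network)")

With fast flash, most of the per-I/O time ends up being the network hop, which is why switch latency shows up so clearly.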

Enthusiast

From my experience, with HCIBench you can expect around these results per node with 2 disk groups per node (on 100% read, 100% random, 4K):

NVMe cache: 150-170K IOPS

SAS cache: 110-130K IOPS

SATA cache: 60-70K IOPS

I consider only the cache tier because the default test fits entirely in cache, and that's where you'll eventually find the bottlenecks.

With 1 disk group per node, just divide those numbers by 2. With Optane, I think 100% read will be right around the NVMe figures listed (Optane shines on writes and low latency; on reads it's not much better than standard NVMe).

So for your configuration with 1 disk group per node, I'd expect about 350-400K IOPS on 100% read, 100% random, 4K.
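
To make the scaling arithmetic behind that ballpark explicit, a minimal sketch; the per-disk-group figure is a rough assumption drawn from the anecdotal ranges above, not an official number:

# Back-of-the-envelope cluster estimate for 4K 100% random read.
# per_dg_read_iops is an assumed figure based on the anecdotal ranges above.
per_dg_read_iops = 85_000      # assumed per NVMe-cache disk group
disk_groups_per_node = 1
nodes = 4
cluster_iops = per_dg_read_iops * disk_groups_per_node * nodes
print(f"Estimated cluster 4K read: ~{cluster_iops:,} IOPS")   # ~340,000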

What are your results?

Contributor

I posted the results for the 4K/70% Read/100% Random workload in my original post above.  The 256K/0% Read/0% Random workload, however, has me a little concerned:

Number of VMs: 8
I/O per Second: 11854.95 IO/S
Throughput: 2963.00 MB/s
Read Latency: 0.00 ms
Write Latency: 6.16 ms
95th Percentile Read Latency: 0.00 ms
95th Percentile Write Latency: 12.00 ms

Although the more research I do, the more it looks like that's just due to the larger block size?
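
That reasoning holds up arithmetically: throughput is simply IOPS times block size, so at 256K the cluster becomes bandwidth-bound and the IOPS figure naturally drops. A quick sanity check of the numbers above (a minimal sketch in Python):

# Sanity check: throughput = IOPS * block size.
iops = 11854.95
block_kib = 256
throughput_mib_s = iops * block_kib / 1024
print(f"{iops:.0f} IOPS x {block_kib} KiB = ~{throughput_mib_s:.0f} MB/s")
# ~2964 MB/s, which matches the reported 2963 MB/s throughput.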

Here are the results for 4K/100% Read/100% Random:

Number of VMs: 8
I/O per Second: 330801.05 IO/S
Throughput: 1292.00 MB/s
Read Latency: 0.82 ms
Write Latency: 0.00 ms
95th Percentile Read Latency: 1.00 ms
95th Percentile Write Latency: 0.00 ms

Contributor

So I posted a response here a couple of days ago, but it's gone now; not sure what happened.  I'll post it again with additional info.

My 4K 100% Read 100% Random results are:

Number of VMs: 8
I/O per Second: 330801.05 IO/S
Throughput: 1292.00 MB/s
Read Latency: 0.82 ms
Write Latency: 0.00 ms
95th Percentile Read Latency: 1.00 ms
95th Percentile Write Latency: 0.00 ms

I'm good with these, and it's what I would expect.  The write side, however, is much lower than I was anticipating.  Here are my 4K 100% Write 100% Random results:

Number of VMs: 8
I/O per Second: 104066.28 IO/S
Throughput: 406.00 MB/s
Read Latency: 0.00 ms
Write Latency: 2.63 ms
95th Percentile Read Latency: 0.00 ms
95th Percentile Write Latency: 8.00 ms

VMware initially said this is to be expected due to vSAN's redundancy.  It doesn't get better than Optane for the cache tier, so I'm confused by this.  We've pushed back on VMware for further verification.

Do these write results look optimal?

Enthusiast

Have you tried with more VMs? I'd suggest at least 4 per host, so 16 in total, with 4 cores each (you have plenty of cores) and 8 VMDKs each (the default number).

In the Proactive Tests, do you get 10 Gbps on the network test?

One thing you can do to test whether the network is the bottleneck is this:

- Create a VM with one disk using an FTT=0 policy
- Check the placement of the VMDK (VM -> Monitor -> vSAN disk placement)
- Test by vMotioning the VM: first with the VM and its disk on the same host (best performance expected), then with the VM on every other host, to test the network performance
- For the test itself, CrystalDiskMark is enough. On sequential reads you should get full local performance on the same host, and performance capped by the 25 Gbps link on the other hosts (see the sketch below). Can you share those results? Both read and write.
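
As a rough idea of the ceiling to expect when the VM and its disk are on different hosts, a minimal sketch; the overhead factor is an assumption, not a measured value:

# Rough ceiling for remote sequential I/O over a single 25 Gbps vSAN link.
link_gbps = 25
raw_gb_s = link_gbps / 8            # ~3.1 GB/s line rate
assumed_overhead = 0.10             # assumed protocol/TCP overhead
usable_gb_s = raw_gb_s * (1 - assumed_overhead)
print(f"{link_gbps} Gbps link: ~{raw_gb_s:.2f} GB/s raw, ~{usable_gb_s:.2f} GB/s usable")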

Contributor

The number and size of VMs I'm running is what HCIBench recommends based on my configuration.  The Proactive Test shows the full 10 Gbps, but I think that only tests the VM network, not actual vSAN traffic.  Our vSAN traffic has dedicated 25 GbE links.

I created a new policy with FTT=0, but I'm not following how to vMotion the actual disk as there is only one datastore.

Expert

"but I'm not following how to vMotion the actual disk as there is only one datastore."

He's not talking about Storage vMotion. He means find out where the single disk component lives (with FTT=0 there is only one data component, as it's not mirrored) and do a normal vMotion of the VM to that host. That way, the physical disk is on the same host the VM is running on, so the network is out of the equation.

Contributor

Gotcha.  I've vMotioned the VM to the same host where its hard disk resides.  The VM Home and VM Swap components are still on other hosts.

Now how am I supposed to test this, with CrystalDiskMark?

Contributor

CrystalDiskMark results.

VM on same host as disk:

[Read]
SEQ 1MiB (Q= 8, T= 1): 2220.505 MB/s [ 2117.6 IOPS] < 3775.22 us>
SEQ 128KiB (Q= 32, T= 1): 2430.383 MB/s [ 18542.4 IOPS] < 1724.83 us>
RND 4KiB (Q= 32, T=16): 472.854 MB/s [ 115442.9 IOPS] < 4430.71 us>
RND 4KiB (Q= 1, T= 1): 59.269 MB/s [ 14470.0 IOPS] < 68.94 us>

[Write]
SEQ 1MiB (Q= 8, T= 1): 2158.187 MB/s [ 2058.2 IOPS] < 3878.06 us>
SEQ 128KiB (Q= 32, T= 1): 1902.626 MB/s [ 14515.9 IOPS] < 2200.66 us>
RND 4KiB (Q= 32, T=16): 378.292 MB/s [ 92356.4 IOPS] < 5474.68 us>
RND 4KiB (Q= 1, T= 1): 11.953 MB/s [ 2918.2 IOPS] < 342.32 us>

VM on different host than disk:

[Read]
SEQ 1MiB (Q= 8, T= 1): 1934.667 MB/s [ 1845.0 IOPS] < 4332.42 us>
SEQ 128KiB (Q= 32, T= 1): 1849.289 MB/s [ 14109.0 IOPS] < 2266.81 us>
RND 4KiB (Q= 32, T=16): 418.366 MB/s [ 102140.1 IOPS] < 5007.07 us>
RND 4KiB (Q= 1, T= 1): 36.948 MB/s [ 9020.5 IOPS] < 110.67 us>

[Write]
SEQ 1MiB (Q= 8, T= 1): 2148.930 MB/s [ 2049.4 IOPS] < 3894.09 us>
SEQ 128KiB (Q= 32, T= 1): 2083.816 MB/s [ 15898.3 IOPS] < 2009.56 us>
RND 4KiB (Q= 32, T=16): 336.430 MB/s [ 82136.2 IOPS] < 6192.49 us>
RND 4KiB (Q= 1, T= 1): 10.387 MB/s [ 2535.9 IOPS] < 393.97 us>

 

Enthusiast

Not bad. For example, in a cluster I'm working on right now with Intel P4610 NVMe as cache, I get around 2200 MB/s read and 1200 MB/s write on the same host, and 1900/1100 to other hosts.

Considering that 25 Gbps tops out at about 3 GB/s and some overhead is expected, I think both my results and yours are consistent with the installed hardware (you have much stronger writes) and that networking is not a bottleneck (you also have only 1 disk group per host).

Anyway, as you can see, you're achieving ~100K 4K random-write IOPS from a single host, so from a cluster perspective I'd expect around 350K IOPS with FTT=0 and obviously a little less than 180K with FTT=1 (add the checksum penalty and you'll get around 150K IOPS, I think).

With your HCIBench test you get only ~100K (with FTT=1?), so something is not working as expected.
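
Spelling out the arithmetic behind that expectation (a minimal sketch; the efficiency and checksum factors are rough assumptions on my part, not vSAN internals):

# Scale the single-host CrystalDiskMark 4K random-write result to the cluster.
single_host_write_iops = 92_000    # ~RND 4KiB Q32 T16 measured locally above
nodes = 4
scaling_efficiency = 0.95          # assumed loss from cluster coordination
ftt0 = single_host_write_iops * nodes * scaling_efficiency
ftt1 = ftt0 / 2                    # RAID-1 FTT=1 writes every block twice
ftt1_checksum = ftt1 * 0.85        # assumed checksum penalty
print(f"FTT=0: ~{ftt0:,.0f}  FTT=1: ~{ftt1:,.0f}  FTT=1+checksum: ~{ftt1_checksum:,.0f}")

That comes out around 350K, 175K, and 149K, in line with the figures above, which is why the ~100K HCIBench result looks low.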

Have you noticed congestion or network packet drops during the test? (You need to run vSAN Observer to analyze packet drops properly; the charts in vCenter are unreliable and can show no packet drops even when they are happening.)

Enthusiast

Ah, one more question: what switches are you using?

Contributor

We're running a pair of Dell S5212F-ON switches dedicated to vSAN traffic.

Enthusiast

And what about congestion or packet drops on the vSAN NICs?

You should check them with vSAN Observer:

http://www.vmwarearena.com/how-to-use-vsan-observer/

 
