All flash vSAN performance expectations? - Page 3

BB9193 · 11-24-2020

We just deployed an all-flash vSAN cluster of 4 Dell R640 Ready Nodes. Each node has:

2 Intel Xeon Gold 6246 @ 3.30 GHz
382 GB RAM
1 Intel Optane P4800x for cache
4 NVMe PM1725B for capacity
1 disk group per node

The vSAN traffic is running over a 25 GbE core. Dedup and compression are disabled, as is encryption. We're using 6.7 U3. All firmware and drivers are up to date. The storage policy is RAID 1, FTT=1.

I've deployed HCIBench and am currently running test workloads with it. The datastore is empty except for the HCIBench VMs. The Easy Run workload of 4K/70% Read/100% Random produced the following results:

I/O per Second: 189042.27 IO/S
Throughput: 738.00 MB/s
Read Latency: 1.48 ms
Write Latency: 1.15 ms
95th Percentile Read Latency: 3.00 ms
95th Percentile Write Latency: 2.00 ms
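
As a quick sanity check, the reported IOPS and throughput can be cross-checked against the 4K block size. This is only a sketch; it assumes HCIBench reports MB/s in binary units (1 MB = 1024 * 1024 bytes), which is not stated in the results themselves.

```python
# Cross-check of the Easy Run numbers: throughput = IOPS * block size.
# Assumption: "MB/s" is binary (1 MB = 1024 * 1024 bytes).
iops = 189042.27              # I/O per second, from the result above
block_size_bytes = 4 * 1024   # 4K block size of the workload

throughput_mib_s = iops * block_size_bytes / (1024 * 1024)
print(f"{throughput_mib_s:.2f} MB/s")
```

The result lands within about half a MB/s of the reported 738.00 MB/s, so the two figures are consistent with each other.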

What should I be shooting for with regard to HCIBench results to be able to verify all is well and I can begin moving my production workload into vSAN? I'm currently testing the other 3 Easy Run workloads and can post any of those results if needed.

kastlr · 10-25-2021

Hi,

Even a "low-end array" has specialized storage controllers with some amount of DRAM acting as cache.
vSAN uses SSDs as "cache devices", so there is a big difference by design.
That's why you need to choose a design that covers your needs.

Which SPBM policy are you using for that special use case?
And what kind of capacity SSDs do you use: NVMe, SAS or SATA?

Even if a vmdk is built out of multiple components (limited to 255 GB per component), this doesn't automatically mean that the data is striped.

So if you run a file copy job on your vmdk with the default SPBM setting (Stripe=1), your sequential reads will usually end up on a single disk.
Windows Explorer doesn't issue multiple read I/Os in parallel, therefore you won't see bottlenecks on the device side.

Instead, the I/O path will look like this:

The VM issues a large read I/O
vSAN checks whether the data is on the cache disk or not (cache hit or miss)
vSAN has to split the I/O into smaller chunks
vSAN sends those smaller read I/O requests to one or more disks (with Stripe set to 1, typically to a single disk)
The disk processes the read I/Os and returns the data
vSAN combines the smaller chunks
vSAN returns the data requested by the VM
The VM sends the next large read I/O

Windows Explorer file copy read operations behave like a sequential read with only a single outstanding I/O (OIO) at a time, which is not the best workload for vSAN.
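
To illustrate why a serialized read stream caps throughput, here is a minimal sketch based on Little's Law (IOPS = outstanding I/Os / latency). The latency and I/O size are made-up values, not measurements from this cluster, and the function name is hypothetical. It also assumes latency stays constant as the number of outstanding I/Os grows, which real devices won't do.

```python
def throughput_mb_s(outstanding_ios: int, latency_ms: float, io_size_kb: int) -> float:
    """Little's Law: IOPS = outstanding I/Os / latency; throughput = IOPS * I/O size."""
    iops = outstanding_ios / (latency_ms / 1000.0)
    return iops * io_size_kb / 1024.0  # KB/s -> MB/s (binary units)

# Hypothetical inputs: 1 ms per I/O round trip, 64 KB per I/O.
single = throughput_mb_s(outstanding_ios=1, latency_ms=1.0, io_size_kb=64)
parallel = throughput_mb_s(outstanding_ios=16, latency_ms=1.0, io_size_kb=64)

print(f"1 outstanding I/O:   {single:.1f} MB/s")
print(f"16 outstanding I/Os: {parallel:.1f} MB/s")
```

With one I/O in flight, throughput is bounded by latency alone, no matter how fast the disks or the network are. That is why a benchmark with many outstanding I/Os and a single file copy can show very different numbers on the same cluster.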

Backup operators don't care about latency; all they're asking for is high throughput.

The vSAN graphs aggregate data (as many other monitoring tools do), which can lead to the scenario you described.
But that's normal, and as long as your users don't raise concerns about performance, it's fine.

It's like on the German autobahn: the average speed is 120 km/h, but at 8am it could be less than 40 km/h, while at 3am you might drive as fast as your car can go.

😉

Do you have an actual performance problem or are you only concerned about the numbers reported by the vSAN graphs?

Hope this helps a bit.
Greetings from Germany. (CEST)

Neverland5 · 12-13-2022

I've been following an issue like this with a customer for a while, and I'm boiling it down to a few things:

RAID stripe width
RAID level
Drive throughput
Network bandwidth

My best guess is a stripe width of 1 with RAID 1, which leaves one copy on a local disk group and the other copy across a 10 Gbit link (to another server or to the other side of a stretched cluster). You're only going to see 22 Gbit/s at the max. That's something like 3 GB/s, or roughly 90,000 IOPS at 32 KB.

And that's if you ever even see the full 3 GB/s of reads, since drives usually aren't as fast as their rated throughput.

I'd increase the RAID stripe width.

Side note: I have heard that link teaming doesn't aggregate bandwidth for vSAN either. You only get one of the two links; even with teaming, vSAN will only ever use one link.
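
For anyone who wants to redo the back-of-the-envelope math, here is a rough sketch of the conversion. It reads the I/O size as 32 KB, uses decimal units, and ignores protocol overhead; the helper names are just for illustration.

```python
def gbit_to_gbyte_per_s(gbit_per_s: float) -> float:
    """Convert a line rate in Gbit/s to GB/s (8 bits per byte, no overhead)."""
    return gbit_per_s / 8.0

def iops_at(gbyte_per_s: float, io_size_kb: int) -> float:
    """IOPS needed to fill the given bandwidth (decimal: 1 GB = 1,000,000 KB)."""
    return gbyte_per_s * 1_000_000 / io_size_kb

link_gb_s = gbit_to_gbyte_per_s(22)   # the 22 Gbit/s ceiling from the post
iops = iops_at(link_gb_s, 32)         # assumed 32 KB I/O size

print(f"{link_gb_s:.2f} GB/s, about {iops:,.0f} IOPS at 32 KB")
```

22 Gbit/s works out to 2.75 GB/s, or roughly 86,000 IOPS at 32 KB, which is in the same ballpark as the rounded 3 GB/s and 90,000 IOPS figures.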