VMware Cloud Community
BB9193
Enthusiast

All flash vSAN performance expectations?

We just deployed an all-flash vSAN cluster of four Dell R640 Ready Nodes. Each node consists of:

  • 2x Intel Xeon Gold 6246 @ 3.30 GHz
  • 382 GB RAM
  • 1x Intel Optane P4800X for cache
  • 4x NVMe PM1725B for capacity
  • 1 disk group per node

The vSAN traffic runs over a 25 GbE core. Deduplication and compression are disabled, as is encryption. We're on vSAN 6.7 U3, with all firmware and drivers up to date. The storage policy is RAID-1, FTT=1.

I've deployed HCIBench and am currently running test workloads with it.  The datastore is empty except for the HCIBench VMs.  The Easy Run workload of 4K/70% Read/100% Random produced the following results:

I/O per Second: 189042.27 IO/S
Throughput: 738.00 MB/s
Read Latency: 1.48 ms
Write Latency: 1.15 ms
95th Percentile Read Latency: 3.00 ms
95th Percentile Write Latency: 2.00 ms
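
As a quick sanity check on those numbers, the reported throughput should roughly equal IOPS times block size. A minimal Python sketch (the 4 KiB block size is assumed from the workload name, not pulled from the tool):

    # Sanity check: reported throughput should be roughly IOPS x block size.
    # 189042.27 IO/s is the Easy Run result above; 4 KiB is assumed from the
    # "4K/70% Read/100% Random" workload name.
    iops = 189042.27
    block_size = 4 * 1024                          # bytes per IO
    throughput_mib = iops * block_size / 1024**2
    print(f"{throughput_mib:.1f} MiB/s")           # ~738 MiB/s

That lines up with the 738 MB/s HCIBench reports, so IOPS and throughput are at least consistent with each other.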

What should I be shooting for in the HCIBench results to verify that all is well before I start moving my production workload into vSAN?  I'm currently running the other three Easy Run workloads and can post those results if needed.

kastlr
Expert

Hi,

even a "low end array" has specialized storage controllers with (more or less) DRAM acting as cache.
vSAN uses SSDs as "cache devices", so there's by design a big difference.
That's why it's required to choose a design which will cover your needs.

Which SPBM policy are you using for those specific use cases?
And what kind of capacity SSDs do you use: NVMe, SAS, or SATA?

Even if a vmdk is built out of multiple components (limited to 255 GB per component), that doesn't automatically mean the data is striped.

So if you run a file copy job on your vmdk with the default SPBM setting (stripe width = 1), your sequential reads will usually land on a single disk.
Windows Explorer doesn't send multiple outstanding read IOs, therefore you won't see bottlenecks on the device side.
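
Just to illustrate that point, a toy Python sketch (not vSAN's real object layout logic, only the idea that with stripe width 1 any given offset maps to a single component, i.e. a single capacity disk):

    # Toy illustration, not vSAN's actual layout code.
    # With stripe width 1 a big vmdk is only chopped into <=255 GB components,
    # and any single offset still lives on exactly one component/disk.
    COMPONENT_LIMIT_GB = 255

    def locate(vmdk_size_gb, offset_gb):
        components = -(-vmdk_size_gb // COMPONENT_LIMIT_GB)    # ceiling division
        return components, int(offset_gb // COMPONENT_LIMIT_GB)

    total, comp = locate(1000, 600)
    print(f"1000 GB vmdk -> {total} components, offset 600 GB sits in component {comp}")

So more components don't automatically mean more parallelism for a single sequential stream.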

Instead, it will look like this:

  • large Read IO raised by VM
  • vSAN checks if data is on cache disk or not (Cache Hit or Miss)
  • vSAN has to split it into smaller chunks
  • vSAN sends those smaller Read IO requests to one or multiple disks (with stripe width 1, typically to a single disk)
  • the disk processes the Read IO and returns the data
  • vSAN combines the smaller chunks
  • vSAN returns the data requested by the VM
  • VM sends the next large Read IO

Windows Explorer file copy reads behave like a sequential read stream with almost no outstanding IOs (OIOs), which is not the best workload for vSAN.
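
You can estimate the ceiling of such a copy with Little's law: throughput is roughly outstanding IOs × IO size / latency. A rough Python sketch with assumed example numbers (512 KiB reads at 1.5 ms, not your measured values):

    # Little's law sketch: throughput cap = outstanding IOs * IO size / latency.
    # 512 KiB and 1.5 ms are assumed example values, not measurements.
    def max_throughput_mib(outstanding_ios, io_size_kib, latency_ms):
        ios_per_sec = outstanding_ios / (latency_ms / 1000.0)
        return ios_per_sec * io_size_kib / 1024.0

    print(f"OIO=1: {max_throughput_mib(1, 512, 1.5):.0f} MiB/s")   # ~333 MiB/s
    print(f"OIO=8: {max_throughput_mib(8, 512, 1.5):.0f} MiB/s")   # ~2667 MiB/s

With a single IO in flight the copy is latency bound, no matter how fast the disks are.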

Backup operators don't care about latency; all they ask for is high throughput.

The vSAN performance graphs aggregate data (as do many other monitoring tools), which can lead to the scenario you described.
But that's normal, and as long as your users don't raise concerns about performance, it's fine.

It's like the German autobahn: the average speed may be 120 km/h, but at 8am it could be less than 40 km/h, while at 3am you might drive as fast as your car can go.
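
To make the averaging point concrete, a tiny Python example with made-up latency samples:

    # Made-up latency samples (ms): mostly fast, with a few "rush hour" spikes.
    samples = sorted([1.2] * 50 + [1.5] * 40 + [15.0] * 10)
    average = sum(samples) / len(samples)
    p95 = samples[int(0.95 * len(samples)) - 1]
    print(f"average = {average:.2f} ms, 95th percentile = {p95:.1f} ms")
    # -> average ~2.7 ms looks harmless, while 1 in 10 IOs takes 15 ms

The average looks fine even though some IOs are clearly slow.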

😉

Do you have an actual performance problem or are you only concerned about the numbers reported by the vSAN graphs?


Hope this helps a bit.
Greetings from Germany. (CEST)
Neverland5
Contributor

I've been following an issue like this with a customer for a while, and I'm boiling it down to a few things:

  • RAID stripe width
  • RAID level
  • Drive throughput
  • Network bandwidth

My best guess is a stripe width of 1 and RAID-1, leaving one copy on a local disk group and the other copy across a 10 Gbit link (to another server or the other side of a stretched cluster). You're only going to see about 22 Gbit/s at the max. That's something like 3 GB/s, i.e. roughly 90,000 IOPS at 32 KB.

And that's assuming you'd ever even see the full 3 GB/s of reads, since drives usually aren't as fast as the full link throughput.
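
Rough back-of-the-envelope in Python (the 22 Gbit/s and 32 KB figures are just my example numbers from above; adjust for your own link speed and IO size):

    # Back-of-the-envelope: link speed -> usable throughput -> IOPS at a given IO size.
    # 22 Gbit/s and 32 KiB are the example figures from above, not measured values.
    link_gbit = 22
    io_size_kib = 32
    gbytes_per_s = link_gbit / 8                        # ~2.75 GB/s, call it ~3 GB/s
    iops = gbytes_per_s * 1e9 / (io_size_kib * 1024)
    print(f"~{gbytes_per_s:.2f} GB/s -> ~{iops:,.0f} IOPS at {io_size_kib} KiB")
    # ~84,000 IOPS; rounding up to 3 GB/s gives the ~90,000 figure above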

 

I'd increase the RAID stripe width.

Side note: I have heard that link teaming doesn't aggregate bandwidth either; you only get one of the two links. You can't team vSAN traffic; it will only ever use one link at a time.
