Solved: Re: Perform graph of vSAN available IOPS and Capacity

dee0606 · 10-17-2022

Hi, I have a three-node cluster in my environment as part of a vSAN evaluation for container deployment (Cluster API).

Let's say my vSAN node provides 1000 IOPS and 250 TB of storage space. Three VMs (container workers) each consume 100 IOPS, so 3 × 100 = 300 IOPS, leaving a total of 700 IOPS available for provisioning further VMs. Where can I see the available IOPS?

For storage space usage, a nice UI shows how much is reserved for the system, how much is used for overhead, and how much capacity is available to provision VMs.

I'm looking for a similar UI for IOPS.

depping · 10-17-2022

This does not exist. The key reasons are:

- An IO is not just an IO. Some IOs are 4KB, others are 512KB, so you cannot treat them the same.
- vSAN/vCenter doesn't know how many IOs each device can handle, or what the sum across all devices is.

Unfortunately, IOPS cannot be compared to capacity provisioning.
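To make the first point concrete, here is a minimal Python sketch with made-up numbers (the worker names and IO sizes are hypothetical, not taken from the thread). The three workers from the question each do 100 IOPS and look identical in an IOPS-only ledger, but their bandwidth demand differs by up to 128× once IO size is taken into account.

```python
# Hypothetical workers in an IOPS-only ledger: name -> (IOPS, IO size in KB).
# The numbers are made up for illustration; they are not vSAN measurements.
WORKERS = {
    "worker-a": (100, 4),
    "worker-b": (100, 64),
    "worker-c": (100, 512),
}


def bandwidth_kb_s(iops: int, io_size_kb: int) -> int:
    """Bandwidth a workload actually moves: IOPS multiplied by IO size."""
    return iops * io_size_kb


total_iops = sum(iops for iops, _ in WORKERS.values())
print(f"IOPS ledger total: {total_iops} IOPS")  # 3 x 100 = 300

for name, (iops, size_kb) in WORKERS.items():
    print(f"{name}: {iops} IOPS x {size_kb} KB = {bandwidth_kb_s(iops, size_kb)} KB/s")
```

The question's arithmetic (1000 minus 300) treats all three workers alike, yet worker-c moves 128 times as much data per second as worker-a.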

dee0606 · 11-14-2022

Thanks @depping, that answers it. But how can enterprise storage vendors claim a system provides 3M IOPS when vSAN doesn't have that data to share?

Even in the vSAN Sizer I didn't see the available IOPS captured. Is there any manual way to calculate/derive the IOPS as a number?

Tibmeister · 02-01-2023

When a vendor says they can provide x number of IOPS, that's based on a combination of theoretical calculations and benchmarks with extrapolation.

A single device, SSD or HDD, can provide an estimated maximum IOPS. Then you have the HBA/storage adapter, which aggregates multiple devices and can combine the IOPS of all of them. Then add in RAID, which changes the available IOPS calculation depending on the RAID level.

Add in the network bandwidth that HCI and similar architectures use to move storage IO over the network, and you now have a bottleneck that reduces the theoretical IOPS. On top of all that, there is the IO size, which also affects how many IOPS can be achieved given the bandwidth of everything between the OS issuing the IO call and the disk or disks servicing it. One IO could be 4k, 8k, 64k, 128k, or even 1M in size.

So take a drive that can sustain 177MB/s of transfer, or about 181,000 KB per second: at 4k per IO, that is about 45k IOPS. For a good SSD on 6Gb/s SAS, that's about 50% of the capability of that single disk. At 128k per IO you are looking at about 1.4k IOPS for the same disk. Huge difference. At 1M, well, you're looking at under 200 IOPS.
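The per-disk numbers above follow from a single division: IOPS ≈ sustained bandwidth ÷ IO size. Here is a minimal Python sketch, assuming the 177MB/s figure from the post and binary units (1 MB = 1024 KB). `max_iops` is a hypothetical helper, not part of any VMware tool, and it treats bandwidth as the only limit.

```python
# Reproduces the arithmetic above: a drive that sustains a fixed bandwidth can
# complete at most (bandwidth / IO size) IOs per second. Assumption: bandwidth is
# the only limit; real drives also have a per-IO ceiling that caps small-IO rates.
DRIVE_MB_S = 177                 # sustained transfer rate from the example
DRIVE_KB_S = DRIVE_MB_S * 1024   # 181,248 KB/s, the "about 181,000 KB per second"


def max_iops(throughput_kb_s: int, io_size_kb: int) -> int:
    """Upper bound on IOPS when every IO is io_size_kb and bandwidth is the limit."""
    return throughput_kb_s // io_size_kb


for size_kb in (4, 8, 64, 128, 1024):
    print(f"{size_kb:>5} KB IOs -> ~{max_iops(DRIVE_KB_S, size_kb):,} IOPS")
```

This prints about 45,312 IOPS at 4 KB, 1,416 at 128 KB and 177 at 1 MB, matching the rough figures in the post.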

Now, let's say you have a RAID6 set of 10 really high-end SSDs. Going off a back-of-the-envelope calculation with the specs of my drives and HBA, you could expect between 66k and 600k IOPS from that RAID set. This will be different for every drive and HBA vendor; only testing will show. That works out to between 260MB/s and 2.4GB/s of data transfer at a 4k block size. At a 128k block size, it can be anywhere from 4GB/s to 40GB/s. So if the HBA can actually drive 40GB/s, your network is now the bottleneck: a 40G network link carries only about 5GB/s, and even then you will only ever get roughly 90% to 95% of that on a good day.
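The RAID and network effects described above can be put into a rough model. Here is a minimal Python sketch, assuming the widely used rule-of-thumb RAID write penalties (6 back-end IOs per logical write for RAID 6), decimal units, and the post's roughly 90% link efficiency. The 60k-IOPS-per-drive and 70/30 read/write figures are hypothetical inputs, not the author's drive specs, and this is a generic model, not how vSAN or any HBA vendor computes performance.

```python
# Rule-of-thumb write penalties: each logical write costs this many back-end IOs
# (RAID 6: read data, P, Q; write data, P, Q). Generic model, not vendor-specific.
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def raid_effective_iops(drive_iops: float, drives: int,
                        read_fraction: float, level: str) -> float:
    """Front-end IOPS a RAID set can absorb for a given read/write mix."""
    raw_backend_iops = drive_iops * drives
    write_fraction = 1.0 - read_fraction
    return raw_backend_iops / (read_fraction + write_fraction * WRITE_PENALTY[level])


def bandwidth_gb_s(iops: float, block_kb: float) -> float:
    """IOPS x block size in decimal units (1 GB = 1,000,000 KB)."""
    return iops * block_kb / 1_000_000


def network_ceiling_gb_s(link_gbit_s: float, efficiency: float = 0.9) -> float:
    """Usable GB/s on a link: bits -> bytes, then the post's ~90-95% real-world share."""
    return link_gbit_s / 8 * efficiency


# Hypothetical inputs: 10 SSDs at 60k IOPS each, 70% reads, RAID 6.
front_end = raid_effective_iops(60_000, 10, 0.7, "raid6")
print(f"RAID 6 front-end IOPS (70/30 mix): ~{front_end:,.0f}")

# The post's 4 KB range: 66k-600k IOPS -> roughly 0.26-2.4 GB/s.
print(f"4 KB range: {bandwidth_gb_s(66_000, 4):.3f}-{bandwidth_gb_s(600_000, 4):.1f} GB/s")

# If the storage side could push 40 GB/s, a 40 Gb/s link caps delivery far lower.
delivered = min(40.0, network_ceiling_gb_s(40))
print(f"Deliverable over 40 Gb/s at 90%: {delivered:.2f} GB/s")
```

With these inputs the write penalty cuts 600k raw back-end IOPS to about 240k front-end IOPS, and the network, not the disks, sets the final ceiling of about 4.5 GB/s.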

When a vendor says they can provide x number of IOPS, read the fine print. Some credible vendors will provide their test sets and results, but even then, those are under ideal conditions and your mileage may vary.

The question you need to ask is: how much storage bandwidth does my workload need? Then architect for that. Making purchasing decisions based solely on theoretical IOPS will get you in trouble 9 times out of 10.
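Architecting for bandwidth starts from measured demand. Here is a minimal Python sketch of one way to turn observed samples into a design target, assuming you already collect per-interval bandwidth samples from the workload. The sample values, the 95th-percentile choice and the 30% headroom are all hypothetical placeholders.

```python
import math

# Hypothetical per-interval bandwidth samples (MB/s) gathered from the real
# workload by whatever monitoring you already run. Values are made up.
SAMPLES_MB_S = [120, 135, 150, 160, 140, 155, 300, 170, 165, 180,
                175, 190, 185, 210, 200, 220, 240, 260, 280, 195]


def percentile_nearest_rank(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples <= it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def required_bandwidth(samples: list[float], pct: int = 95,
                       headroom: float = 0.3) -> float:
    """Design target: the pct-th percentile of observed demand plus growth headroom."""
    return percentile_nearest_rank(samples, pct) * (1 + headroom)


p95 = percentile_nearest_rank(SAMPLES_MB_S, 95)
target = required_bandwidth(SAMPLES_MB_S)
print(f"p95 demand: {p95} MB/s, design target: {target:.0f} MB/s")
```

Sizing to a high percentile rather than the single highest sample ignores one-off spikes (the 300 MB/s sample here) without undersizing for sustained load; the target can then be compared against what the storage actually delivered in your own tests.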

As for why vSAN can't determine how many IOPS you can drive, the answer is "It Depends". Different workloads have different IO sizes, and different drive vendors have different specs. Actually running a test or benchmark is disruptive, because you have to starve some workload in order to measure absolute performance. Would you want to starve your workload of storage bandwidth just to test what could be? This can be done at the beginning, before workloads are placed on the storage, but again, IOPS is not a great measure of performance on its own; it is a good indicator of performance only when taken together with latency and bandwidth.
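IOPS, latency and bandwidth are tied together, which is why IOPS alone says little. Here is a minimal Python sketch using two standard identities: bandwidth = IOPS × IO size, and Little's law (average IOs in flight = IOPS × average latency). The `describe` helper and the two benchmark results are hypothetical.

```python
# Bandwidth = IOPS x IO size; by Little's law, the average number of IOs in
# flight (queue depth) = IOPS x average latency.
def describe(iops: float, io_size_kb: float, latency_ms: float) -> dict[str, float]:
    return {
        "iops": iops,
        "bandwidth_mb_s": iops * io_size_kb / 1024,
        "outstanding_ios": iops * latency_ms / 1000,
    }


# Two hypothetical benchmark results with the same headline IOPS figure.
fast = describe(50_000, 4, 0.5)
slow = describe(50_000, 4, 10.0)
for label, result in (("fast", fast), ("slow", slow)):
    print(label, result)
```

Both runs report 50,000 IOPS at about 195 MB/s, but the second needs 20 times as many IOs queued to get there, and each IO waits 10 ms instead of 0.5 ms, which an application would notice.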

I worked at a place a few years back that purchased a multi-million-dollar SAN array. The vendor claimed it could do 7 million IOPS and had all these graphs and such, but once it was installed, using iostat, we could never get 50% of that under the absolute best circumstances. I tried 100% sequential writes at a 1k block size, etc., and there was always a bottleneck in the HBAs or the interconnects. Last I heard, there were still some legal discussions ongoing. Some business manager saw the number and the "promise" of that number and thought it was golden. In reality, going by performance data gathered from the actual workloads, the business needed not even a quarter of that system's capability, but someone somewhere read a whitepaper and an ad and decided the money should be spent.
