I've got a 2-node ROBO AFA vSAN cluster at home consisting of the following in each of the two nodes:
Cache Disks - 400GB Hitachi HUSSL400 SAS SSDs
Capacity Disks - 2 x 800GB Intel S3500 SSDs
Dual 10Gb Uplinks
I ran an HCIBench test last night, which returned the results below. I'm concerned about the throughput numbers, as they seem very low for a cluster connected via 10Gb:
Datastore: vsanDatastore
VMs = 6
IOPS = 12008.79 IO/s
THROUGHPUT = 46.91 MB/s
LATENCY = 33.9552 ms
R_LATENCY = 9.2342 ms
W_LATENCY = 91.6067 ms
=============================
Datastore: vsanDatastore
95th Percentile Latency = 94.426
IOPS associated with 95th Percentile Latency = 4843.0
=============================
Resource Usage:
CPU USAGE = 37.98%
RAM USAGE = 70.34%
VSAN PCPU USAGE = 8.6397%
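One sanity check worth doing on those numbers: dividing throughput by IOPS gives the average I/O size the test was issuing, which largely explains the "low" throughput. A quick Python sketch using the figures from the summary above (assuming HCIBench is reporting MiB/s, which is its usual convention):

```python
# Figures copied from the HCIBench summary above
iops = 12008.79           # IO/s
throughput_mib_s = 46.91  # MB/s in the report, assumed to be MiB/s

# Average I/O size implied by the two numbers
avg_io_kib = throughput_mib_s * 1024 / iops
print(f"Average I/O size: {avg_io_kib:.1f} KiB")
```

That works out to almost exactly 4 KiB, i.e. a small-block random workload. At 4K blocks, ~47 MB/s is simply what ~12K IOPS adds up to; the 10Gb links are nowhere near the bottleneck, and you'd need a large-block sequential profile to actually stress the network.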
That looks like a lot of CPU and RAM utilized; what else is running inside this cluster?
You should also share a detailed hardware configuration.
Same goes for the tests you've been running.
VMs are as follows:
DC1 - Windows Server 2k12 Domain Controller (2 vCPU, 4GB RAM)
SPE-BACKUP - Windows 10 VM for Veeam and other backups (4 vCPU, 8GB RAM)
SPE-UBUNTUSVR01 - Main Plex Transcoding box (16 vCPU, 16GB RAM)
SPE-UBUNTUSVR02 - Runs a slew of other Docker containers (8 vCPU, 8GB RAM)
UnRAID01 - Bulk Media Storage Array (4 vCPU, 8GB RAM)
VCSA6 - vCenter (2 vCPU, 8GB RAM)
UnRAID02 - Mirror of UnRAID01 (2 vCPU, 4GB RAM)
vWitness - Self Explanatory (2 vCPU, 8GB RAM)
Hardware Specs are as follows:
ESXi01:
ESXi02:
ESXi03:
ESXi04:
ESXi05:
You may be asking why I didn't just build a 4-node vSAN cluster instead of a 2-node ROBO cluster. The reason is that the storage controllers in ESXi03 and ESXi04 (the same LSI2116 as in ESXi01 and ESXi02) are passed through to their respective UnRAID VMs. I have LSI2008 controllers in those hosts as well, but they are not supported by vSAN, and I had tons of problems when I tried to use them (disks showing up as missing or throwing errors, disk groups dropping out, etc.).
As for tests, I simply ran the standard HCIBench "Easy Run" test.
12K IOPS for a 2-node is not too shabby. As for throughput, please verify that your NIC drivers are at the latest version according to the VCG. How is your virtual switch configured? LACP? LBT?
My Intel X552 10Gb SFP+ NICs are on the current driver (4.4.1). My vDS is using LBT, as LACP gave me a lot of problems when I tried to configure it on my Cisco SG350XG-24F.
This will also depend on the parameters used in HCIBench. If you want to run a standard benchmark test, you can run the following:
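For reference, HCIBench's Easy Run drives a small-block, mostly-random workload through vdbench (or fio) under the hood. A vdbench parameter file along the lines of the sketch below approximates that kind of profile; the specific values (device path, thread count, run time) are illustrative placeholders, not HCIBench's actual defaults:

```
* Illustrative vdbench profile: 4K blocks, 70% read, 100% random
* The lun path, threads, and elapsed time here are example values only
sd=sd1,lun=/dev/sda,openflags=o_direct
wd=wd1,sd=sd1,xfersize=4k,rdpct=70,seekpct=100
rd=rd1,wd=wd1,iorate=max,elapsed=3600,interval=30,threads=4
```

If the goal is to check whether the 10Gb links are the limit, a variant with a large transfer size (e.g. xfersize=256k, seekpct=0) would be the more telling run.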
Looks like you are a little resource-bound on RAM. You may or may not need to tweak these parameters.