I've got a 2-node ROBO AFA vSAN cluster at home consisting of the following in each of the two nodes:
Cache Disks - 400GB Hitachi HUSSL400 SAS SSDs
Capacity Disks - 2 x 800GB Intel S3500 SSDs
Dual 10Gb Uplinks
I ran an HCIBench test last night, which returned the results below. I'm concerned about the throughput numbers, as they seem very low for a cluster connected via 10Gb:
Datastore: vsanDatastore
VMs = 6
IOPS = 12008.79 IO/s
THROUGHPUT = 46.91 MB/s
LATENCY = 33.9552 ms
R_LATENCY = 9.2342 ms
W_LATENCY = 91.6067 ms
=============================
Datastore: vsanDatastore
95th Percentile Latency = 94.426
IOPS associated with 95th Percentile Latency = 4843.0
=============================
Resource Usage:
CPU USAGE = 37.98%
RAM USAGE = 70.34%
VSAN PCPU USAGE = 8.6397%
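One sanity check worth doing on those numbers: dividing throughput by IOPS gives the average I/O size the test was issuing, which largely explains the "low" throughput. A quick Python sketch using the figures from the summary above (assuming HCIBench is reporting MiB/s, which is its usual convention):

```python
# Figures copied from the HCIBench summary above
iops = 12008.79           # IO/s
throughput_mib_s = 46.91  # MB/s in the report, assumed to be MiB/s

# Average I/O size implied by the two numbers
avg_io_kib = throughput_mib_s * 1024 / iops
print(f"Average I/O size: {avg_io_kib:.1f} KiB")
```

That works out to almost exactly 4 KiB, i.e. a small-block random workload. At 4K blocks, ~47 MB/s is simply what ~12K IOPS adds up to; the 10Gb links are nowhere near the bottleneck, and you'd need a large-block sequential profile to actually stress the network.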
That looks like a lot of CPU and RAM utilized; what else is running inside this cluster?
You should also share a detailed hardware configuration.
Same goes for the tests you've been running.
VMs are as follows:
DC1 - Windows Server 2k12 Domain Controller (2 vCPU, 4GB RAM)
SPE-BACKUP - Windows 10 VM for Veeam and other backups (4 vCPU, 8GB RAM)
SPE-UBUNTUSVR01 - Main Plex Transcoding box (16 vCPU, 16GB RAM)
SPE-UBUNTUSVR02 - Runs a slew of other Docker containers (8 vCPU, 8GB RAM)
UnRAID01 - Bulk Media Storage Array (4 vCPU, 8GB RAM)
VCSA6 - vCenter (2 vCPU, 8GB RAM)
UnRAID02 - Mirror of UnRAID01 (2 vCPU, 4GB RAM)
vWitness - Self Explanatory (2 vCPU, 8GB RAM)
Hardware Specs are as follows:
ESXi01:
ESXi02:
ESXi03:
ESXi04:
ESXi05:
You may be asking why I didn't just build a 4-node vSAN cluster instead of a 2-node ROBO cluster. The reason is that the storage controllers in ESXi03 and ESXi04 (the same LSI2116 as in ESXi01 and ESXi02) are passed through to their respective UnRAID VMs. I have LSI2008 controllers in those hosts as well, but they are not supported by vSAN, and I had tons of problems when I tried to use them (disks showing up as missing or throwing errors, disk groups dropping out, etc.).
As for tests, I simply ran the standard HCIBench "Easy Run" test.
12K IOPS for a 2-node is not too shabby. As for throughput, please verify that your NIC drivers are at the latest version according to the VCG. How is your virtual switch configured? LACP? LBT?
My Intel X552 10Gb SFP+ NICs are on the current driver (4.4.1). My vDS is using LBT, as LACP gave me a lot of problems when I tried to configure it on my Cisco SG350XG-24F.
This will also depend on the parameters used in HCIBench. If you want to run a standard benchmark test, you can run the following:
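For reference, HCIBench's Easy Run drives a small-block, mostly-random workload through vdbench (or fio) under the hood. A vdbench parameter file along the lines of the sketch below approximates that kind of profile; the specific values (device path, thread count, run time) are illustrative placeholders, not HCIBench's actual defaults:

```
* Illustrative vdbench profile: 4K blocks, 70% read, 100% random
* The lun path, threads, and elapsed time here are example values only
sd=sd1,lun=/dev/sda,openflags=o_direct
wd=wd1,sd=sd1,xfersize=4k,rdpct=70,seekpct=100
rd=rd1,wd=wd1,iorate=max,elapsed=3600,interval=30,threads=4
```

If the goal is to check whether the 10Gb links are the limit, a variant with a large transfer size (e.g. xfersize=256k, seekpct=0) would be the more telling run.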
Looks like you are a little resource-bound on RAM. You may or may not need to tweak these parameters.