GunterO
Contributor

Stretched cluster, all-flash, write performance issues. v6.7 U1

Hello,

I'm setting up a new 2-node stretched cluster (+ a witness appliance on a third, lower-performance host with a 1Gbit NIC). All hosts run ESXi 6.7 Update 1, and the witness and vCenter appliances are on the same version.

Both hosts are identical SuperMicro machines with 256GB RAM and 8x 1TB SSDs (one disk group: 7 capacity + 1 cache). Both have 30Gbit NICs and a 10Gbit fiber interconnect between the sites, which are ~100km apart. The round-trip latency of this interconnect is around 5ms (I know, this is the limit).
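
For what it's worth, I measured that round-trip latency from the ESXi shell with vmkping (assuming vmk1 is the vmkernel port tagged for vSAN traffic; adjust the interface and IP to your own setup):

    # Ping the remote host's vSAN vmkernel IP over the vSAN vmkernel port
    vmkping -I vmk1 10.0.0.112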

Most health checks are green, except a strange one, considering both the hardware and software configs are identical:

[Screenshot Warning001.jpg: vSAN health check warning]

I don't understand why 10.0.0.110 gets a warning while 10.0.0.112 is green. Both versions are identical.

And the recommended drivers are different...

This is another warning, related to the one above. Same strange thing: the installed drivers on both hosts are identical, and so is the hardware (SAS controllers).
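
In case it helps to compare, these are the standard esxcli calls I used to check what is actually installed on each host (the "sas3" pattern simply matches my LSI controller bits):

    # List installed VIBs related to the SAS controller
    esxcli software vib list | grep -i sas3
    # Show which driver each storage adapter is bound to
    esxcli storage core adapter list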

[Screenshot Warning002.jpg: controller driver warning]

So, my question: I'm experiencing write performance issues. I get a write throughput of 65MB/s to the vSAN datastore (e.g. when restoring a backup), and a read throughput of 550MB/s, which is good.
Is the low write performance caused by the long distance between the two sites? Or by the firmware/driver warnings for the controller? When both hosts weren't part of the stretched cluster yet, they had good write performance.
What is the best practice in this setup to fix the controller warnings? Can I update both hosts individually, or does this need to be done via the cluster? (I'm fairly new to this, and the update procedure for hosts that are part of a cluster is quite confusing to me.)
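
For reference, this is the per-host sequence I think applies, pieced together from the docs (a sketch, not tested yet; corrections welcome): one host at a time, enter maintenance mode with "Ensure accessibility" so the other site keeps serving the data, patch, reboot if needed, then exit maintenance mode and let the resync finish before touching the next host.

    # Enter maintenance mode without evacuating all vSAN data
    esxcli system maintenanceMode set -e true -m ensureObjectAccessibility
    # ... apply the driver/firmware update here, reboot if required ...
    # Exit maintenance mode afterwards
    esxcli system maintenanceMode set -e false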

In case the network distance is causing the low write performance, would jumbo frames (MTU 9000) help? Or is there anything else I could do?
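
If I try jumbo frames, I understand the MTU has to be raised end to end (vmkernel port, vSwitch, and every physical switch in the path). Something like this on each host, assuming a standard vSwitch named vSwitch1 and vmk1 as the vSAN port:

    # Raise the MTU on the vSwitch and the vSAN vmkernel interface
    esxcli network vswitch standard set -v vSwitch1 -m 9000
    esxcli network ip interface set -i vmk1 -m 9000
    # Verify 9000-byte frames pass unfragmented (8972 = 9000 minus 28 header bytes)
    vmkping -I vmk1 -d -s 8972 10.0.0.112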

However, when I deploy 10 test VMs and run write performance tests inside them, I get a backend throughput of ~500MB/s on the vSAN. Why does it reach 500MB/s with 10 VMs, while I can't get anything above 65MB/s with one VM, or with a direct write when restoring straight to the vSAN datastore?

[Screenshot picturemessage_5yktrl3x.irt.png: vSAN backend throughput graph]

Thanks for your help!

GunterO
Contributor

I found out that one host, 10.0.0.110, is missing the vmware-esx-sas3flash.vib file. Strange, because I was sure they were identical, and the hardware vendor who installed ESXi assured me they were.
I can't put the host in maintenance mode right now; I'll do it asap and install the file/update.
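
Once I can take the host down, I plan to install it roughly like this (a standard esxcli VIB install; the /tmp path is just where I'll copy the file to):

    # Install the missing VIB (an absolute path is required)
    esxcli software vib install -v /tmp/vmware-esx-sas3flash.vib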

Could this cause the performance issues?

On the driver download page, I have the choice between "sas3flash_vmware_esx50_rel" and "sas3flash_vmware_nds_rel".

I assume I can use the NDS version for ESXi 6.7U1?

Thanks

depping
Leadership

What you could do, by the way, is run iperf and test the maximum throughput of the link that way. It is installed on ESXi 6.7 U1 by default under /usr/lib/vmware/vsan/bin/iperf3.
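
Roughly like this (a sketch; on some builds you need to copy the binary before it will run, and you may have to open the ESXi firewall temporarily, so re-enable it when done):

    # On the first host (server side), bound to its vSAN vmkernel IP
    cp /usr/lib/vmware/vsan/bin/iperf3 /tmp/iperf3.copy
    esxcli network firewall set --enabled false
    /tmp/iperf3.copy -s -B 10.0.0.110

    # On the second host (client side)
    /tmp/iperf3.copy -c 10.0.0.110

    # On both hosts when finished
    esxcli network firewall set --enabled true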

Anyway:

1. Latency is definitely a big factor in write performance: every write goes across the wire and needs to be acknowledged before it completes (rough arithmetic below).

2. The NDS driver should be supported; make sure all hosts use the same driver before testing.

3. When doing a single-VM test you could also be hitting the limits of a single device/host/disk group, as data by default isn't striped.

4. When doing a 10-VM test the numbers will be much higher, as the writes are spread across components and the results aggregated.
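
To put a rough number on point 1 (back-of-the-envelope, assuming a single outstanding 256KB write at a time, which is a simplification of what a restore job does):

    256KB per write / 0.005s round trip ≈ 50MB/s per stream

That is in the same ballpark as the 65MB/s you are seeing. With 10 VMs the writes overlap, the acknowledgements are pipelined across many objects, and the aggregate climbs toward the device and link limits, hence the ~500MB/s.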
