VMware Cloud Community
neojav123
Contributor

vSAN poor write performance

Hi guys. First post here. Nice to meet you all.

We deployed a 4-node vSAN cluster with 2 disk groups per host (8 in total), each with 1 cache-tier disk and 7 capacity-tier disks. We are using 480GB SATA disks (Dell EMC R740 servers).

Read speeds are decent, about 600 MB/s, which is fine for me, but write speeds start at 600 MB/s and after about 2 GB of transfer drop to 100 MB/s and stay there. The exercise I am doing is copying and pasting a single 30 GB file in Windows Explorer (Windows Server 2019).

My storage policy is a simple RAID 5, with data stored across the 4 hosts (per the physical disk placement view). I've tried the "turn off checksum" trick, but had no luck. I have also tried every other combination of settings, still with no luck.

I migrated a VM to a local datastore and tried the same test, and write speeds were fine, higher than 500 MB/s. Also, when migrating the VM from host to host inside vSAN, data is read and written at high speed, so I know the physical disks are capable of good write speeds (I can see this in the vSphere resource monitor).

The only way I've achieved not-so-bad write speeds is with a RAID 1 storage policy using 8 stripes per object. I know this is not common practice, since the data ends up spread across all 4 hosts instead of 2 as it normally would be. This way I can increase my write speed from 100 MB/s to 300 MB/s. Still, this is not acceptable, and I don't know what else to try.
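A back-of-envelope sketch may help explain the RAID 5 vs RAID 1 gap. Under erasure coding, a small overwrite classically takes a read-modify-write path (read old data and old parity, then write new data and new parity), while mirroring just issues two writes. The counts below are illustrative assumptions based on the classic RAID read-modify-write pattern, not measured vSAN behavior, which also depends on write size and caching:

```python
# Back-of-envelope backend I/O counts per small guest write.
# These follow the textbook RAID read-modify-write path and are
# illustrative only; real vSAN behavior varies with I/O size and caching.

def backend_ios_per_guest_write(policy: str) -> dict:
    if policy == "raid1":
        # Mirroring: one write to each of the two data copies.
        return {"reads": 0, "writes": 2}
    if policy == "raid5":
        # Read-modify-write: read old data + old parity,
        # then write new data + new parity.
        return {"reads": 2, "writes": 2}
    raise ValueError(f"unknown policy: {policy}")

for policy in ("raid1", "raid5"):
    ios = backend_ios_per_guest_write(policy)
    total = ios["reads"] + ios["writes"]
    print(f"{policy}: {ios['reads']} reads + "
          f"{ios['writes']} writes = {total} backend I/Os per guest write")
```

So even before network effects, each small RAID 5 write can cost twice the backend I/O of a RAID 1 write, which is consistent with writes being the side that suffers.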

1) My biggest question is this: there is a VM named "A" hosted on server "1" with a RAID 1 policy. Checking disk placement, its data is distributed to server "2" and server "3". Why isn't a copy of the data stored locally on server "1"? It just doesn't make sense to me. It should be much faster to read/write locally (while keeping redundancy) than to do it on other hosts.

Here I attach an IO benchmark I ran, in case it helps.

Thank you in advance.

3 Replies
depping
Leadership

Writing locally wouldn't necessarily be faster, as the write always needs to be mirrored elsewhere (RAID-1). So even if you write locally, you would still have to wait for the write to be acknowledged by the remote copy. Copying a file within a Windows VM isn't really a decent performance test. To test the capabilities of vSAN I would recommend using something like HCIBench: https://flings.vmware.com/hcibench
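If a full HCIBench run isn't practical, even a simple fio job inside the guest is more controlled than a Windows Explorer copy. A hypothetical job file approximating the 30 GB sequential write test might look like this (the filename and paths are placeholders; `ioengine=libaio` assumes a Linux guest, a Windows guest would use `windowsaio`):

```ini
; sketch of a fio job roughly matching the 30 GB copy test
[seq-write]
rw=write           ; sequential writes
bs=1M              ; large block size, like a file copy
size=30g           ; total data written
direct=1           ; bypass the guest page cache
ioengine=libaio    ; Linux; use windowsaio on Windows guests
iodepth=8
numjobs=1
filename=/mnt/test/fio.dat   ; placeholder path on the vSAN-backed disk
```

Varying `bs` and `iodepth` also shows whether the drop is specific to large sequential writes or affects small random I/O too.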

Tibmeister
Expert

With vSAN, the network connectivity will be the biggest bottleneck, even more so when running a policy that uses erasure coding. 10Gb networking is the absolute minimum, and even then, with some large writes I see my stack slow down as if some cache flush is occurring. I don't really have any large writes, nothing more than a few hundred GB at most, so it hasn't been bad, but in hindsight I wish I'd gone with 40Gb for the added headroom, plus a separate switch stack so the packet switching would be dedicated.

I have two-node clusters that are directly connected over 10Gb for vSAN and run RAID-1, and these are blistering fast, even for large writes. It's only on my cluster that has more than 2 nodes and uses erasure coding that I notice the slowdown.

IRIX201110141
Champion

Please tell us: is your 480GB SATA disk the buffer (cache) disk or a capacity disk?

I ask because I configure Dell vSAN Ready Nodes on a daily basis, and I can't remember a 480GB RI SATA SSD being a certified buffer disk for an all-flash vSAN. I know there is a single 960GB RI SATA one.


Regards,
Joerg
