Hi,
I'm somewhat unhappy with the back-end performance so far and was wondering if there is any way to make it faster. The setup:
3 nodes, Dell R730xd with 2x E5-2630 v4 @ 2.20 GHz and 256 GB RAM.
Mellanox ConnectX-4 100G NICs on a Mellanox 100G switch.
Dell PERC H730 Mini RAID controller; each node has 12x 4TB NL-SAS drives and 4x 480GB Intel DC SSDs.
The vSAN datastore has around 130 TB of space.
Now that we have moved the first 1-2 VMs to the new cluster/datastore, the back end barely reaches 100-200 MB/s.
Congestion is showing, and I see 40 on the I/O ("E/A") wait graph.
Doesn't that mean it's not writing data away fast enough?
Compared to other tests with Windows/Linux machines, transfers between the SSDs, and even a "normal" RAID on those Dells, I know these servers can go much faster.
How do I find the bottleneck, or how can I find out what bandwidth my vSAN can achieve?
Hi,
It does not sound like you should be experiencing congestion in this environment. I would suggest opening a case with VMware support to have the performance analysed to see where the bottleneck is coming from. Unusual or unexpected hardware issues can sometimes be the root cause of these kinds of problems. It certainly seems like IOs are being artificially throttled in your case.
You can use vSAN Observer to dig down through the layers and find which host/disk-group/disk is causing the bottleneck, but it would be too hard to explain in a post like this.
Try this document for some guidance:
https://blogs.vmware.com/vsphere/files/2014/08/Monitoring-with-VSAN-Observer-v1.2.pdf
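For reference, Observer is started from RVC (which ships with vCenter). A minimal sketch of the typical session, where the vCenter address, credentials and cluster path are placeholders for your environment and exact flags can vary by vSphere version:

```shell
# Connect RVC to vCenter (RVC is bundled with the vCenter Server Appliance)
rvc administrator@vsphere.local@vcenter.example.com

# Inside RVC: start the Observer web UI against the vSAN cluster
# ("MyCluster" is a placeholder for your cluster's inventory name)
vsan.observer ~/computers/MyCluster --run-webserver --force

# Then browse to https://<vcenter>:8010 to view the live graphs
```

This needs a live vCenter, so treat it as a CLI fragment rather than something to copy verbatim; the document linked above walks through the same steps in detail.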
Hello time,
Can you tell us the Model (Part No. is best) of the disks (and driver/firmware)?
Larger disks are going to be slower here, so I wouldn't expect blazing performance from disk groups made up of 4TB drives (especially if they are HDDs).
Four disk groups on a single H730p Mini could also be pushing it to its limits.
Testing or comparing the performance of any single VM is not going to give a representative picture.
vSAN Observer is easy to set up for live monitoring or offline bundle generation, and the graphs it creates are fairly self-explanatory (green = fine, red = generally bad):
https://kb.vmware.com/kb/2064240
Other options for looking at performance here are HCIBench and the built-in proactive performance tests that can be run from vSAN:
kb.vmware.com/kb/2147074
Bob
Like TheBobkin mentioned, I would highly suggest using HCIBench. It's an OVA that is quick to deploy and use. We just released v1.6.2, and it helps you pinpoint issues in order to reduce latency and increase performance, among other things. Run the "Easy Run" test, then check the logs to pinpoint the issue.
If you do not want to deploy HCIBench, you can use vSAN Observer (from RVC) to see what is happening in your environment.
Often what people consider poor performance turns out to be under-utilization of the resources (1 or 2 drives busy while the others sit dormant). Again, HCIBench will tell you if this is the case.
Thanks guys, I just got into that benchmark VM and will try to look at these things soon.
We do indeed have large disks: each disk group uses 3x 4TB HDDs and 1 SSD.
Display Name: Local TOSHIBA Disk (naa.5000039748109971)
Vendor: TOSHIBA Model: MG04SCA40EN Revis: DS06
SSDs are Intel DCS3610 480GB
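As a sanity check on the ~130 TB datastore figure quoted earlier: with 3 hosts, 4 disk groups per host and 3x 4TB capacity drives per group, the raw capacity works out like this (a back-of-envelope shell calculation; vSphere reports in binary units, which is why the number shown is smaller than the decimal total):

```shell
# 3 hosts x 4 disk groups x 3 capacity HDDs x 4 TB each
raw_tb=$((3 * 4 * 3 * 4))                                 # 144 TB (decimal)

# Convert decimal TB to binary TiB, the unit vSphere actually displays
raw_tib=$(awk "BEGIN { printf \"%.1f\", $raw_tb * 1e12 / 1024^4 }")
echo "${raw_tb} TB raw = ${raw_tib} TiB as reported"
# → 144 TB raw = 131.0 TiB as reported
```

That raw total is before vSAN's FTT overhead, so usable space for VM data will be roughly half of it with the default mirroring policy.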
The stress test gives me around 10,000 IOPS and 81 MB/s, with 300 congestion and 140 I/O ("E/A") wait.
FYI: it's the Dell PERC H730 Mini, the one with 2GB cache, but running in HBA mode.
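One back-of-envelope cross-check on those two numbers: throughput divided by IOPS gives the average I/O size the benchmark was issuing, which puts the seemingly low MB/s figure in context (the 81 MB/s and 10,000 IOPS are the figures from the test above):

```shell
# Average I/O size = throughput / IOPS
# 81 MB/s = 81,000 KB/s, issued at 10,000 IOPS
io_kb=$(awk "BEGIN { printf \"%.1f\", 81 * 1000 / 10000 }")
echo "average I/O size: ${io_kb} KB"
# → average I/O size: 8.1 KB
```

A small-block random workload like this is IOPS-bound, so 81 MB/s at ~8 KB per I/O is not directly comparable to the large sequential copies used in the earlier Windows/Linux transfer tests.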