VMware Cloud Community
vbabic
Enthusiast

vSAN All Flash performance troubleshooting

Hi,

We have 3 vSAN All-Flash clusters and the performance is not what we expected, so I want to know how to troubleshoot whether we have a configuration issue or we simply expected too much (I have ruled out hardware issues, since we see the same behaviour on all 3 clusters).

All hosts are HPE ProLiant DL380 Gen9. Two clusters are 4 nodes with 2 disk groups of 7 capacity disks each, all SATA SSDs (those clusters are for test/dev VMs, so we went with cheaper disks).

The third cluster is 8 nodes, also with 2 disk groups of 7 capacity disks each. Capacity disks are SATA, cache is NVMe.

The vSAN network uses dedicated 10G interfaces. Everything is on the HCL and the health checks are green.

The two test clusters are on the latest 6.0 U3 build; the third cluster is on 6.0 U2 and will soon be updated to the same build.

The two most noticeable performance indicators are:

     - latency often goes over 100 ms

     - when moving VMs from the FC SAN to vSAN, or cloning VMs on vSAN, we get around 100 MBps of throughput

On the small clusters we use RAID5, on the large one RAID6, and dedup is enabled. I know those settings affect performance, but 100 MBps still looks low, and the latency is definitely too high, isn't it?

Did we just expect too much, or is there something we can look at?
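
One thing I can do on my side is watch the destination host with esxtop while a clone or Storage vMotion is running; as far as I understand it, the disk device view should show where the latency accrues (the keystrokes below are just a sketch):

    # on the destination host, while a clone / Storage vMotion is running
    esxtop
    # press 'u' for the disk device view, then 's' and enter 2 for a 2-second refresh
    #   DAVG/cmd - latency reported by the device (controller + SSD)
    #   KAVG/cmd - time spent queued inside the VMkernel
    #   GAVG/cmd - total latency the guest sees (roughly DAVG + KAVG)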

Kind regards, Vjeran

6 Replies
Wonlliv
Enthusiast

Hi Vjeran,

I guess it would be a good start to test the environment using HCIBench

Or, if you are at least on vSAN 6.1, start a proactive storage stress test under "Monitor > Virtual SAN > Proactive Test".
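
HCIBench drives vdbench inside its worker VMs, so if you want a feel for the kind of profile it runs, a minimal parameter file would look roughly like this (disk path, block size, mix, and run time are only illustrative, not the HCIBench defaults):

    # vdbench parameter file - values are illustrative only
    sd=sd1,lun=/dev/sdb,openflags=o_direct            # raw data disk inside the worker VM
    wd=wd1,sd=sd1,xfersize=4k,rdpct=70,seekpct=100    # 70/30 random read/write with 4 KB blocks
    rd=run1,wd=wd1,iorate=max,elapsed=3600,interval=30,threads=4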

Best regards,

Chris

www.hyper-converged.com
roman79
Enthusiast

Hi vbabic,

Just to clarify, what RAID controller model, firmware, and settings are you using for the SATA SSDs?

Can you also share the exact build numbers of the ESXi 6.0 U3 hosts (you can use this link to identify them quickly: https://esxi-patches.v-front.de/)?
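
If it is quicker, the build number and the hpsa driver version can also be pulled from an ESXi shell, roughly like this (the controller firmware itself still has to come from the HPE tools or iLO):

    vmware -vl                                # ESXi version and build number
    esxcli software vib list | grep -i hpsa   # version of the hpsa driver VIB
    esxcli storage core adapter list          # confirms which driver the H240 is bound to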

Thanks,

vbabic
Enthusiast

Hi,

I ran all the proactive tests, plus tests with IOAnalyzer appliances (20 of them, to get some parallelism), before putting any VMs on the datastore.

On the small clusters I got around 330 MB/s of streaming writes with both tests, and on the larger cluster around 850 MB/s; I guess that is the test most relevant to migration and cloning operations. 330 MB/s also seemed low, but it is still better than 100 MB/s.

Regarding latency, on most tests it stayed under 10 ms (which also doesn't sound great for an all-flash system); it was much higher on tests with large block sizes, especially writes, of course.
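
For the next round of tests I can also watch the cluster from RVC while a run is going; as far as I can tell, something like this should give a live per-disk latency picture (the inventory paths are just examples for our setup):

    # on the vCenter server: rvc administrator@vsphere.local@localhost
    vsan.observer /localhost/DC/computers/Cluster --run-webserver --force   # live graphs on https://vcenter:8010
    vsan.vm_perf_stats /localhost/DC/vms/TestVM                             # IOPS, throughput and latency for one VM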

Kind regards, Vjeran

vbabic
Enthusiast

Hi,

The RAID controllers are HPE H240, firmware 5.04. Which settings are you referring to?

6.0 U3 hosts are build 6921384.

Kind regards, Vjeran

roman79
Enthusiast

Hi vbabic,

From the information provided, I see that newer firmware is available for the H240 - https://support.hpe.com/hpsc/swd/public/detail?sp4ts.oid=7553524&swItemId=MTX_45415600c27941abb996b7...

VMware already has firmware version 6.06 certified for vSphere and vSAN (see the VMware Compatibility Guide - I/O Device Search), so I would first update the device firmware to version 6.06 and the driver to hpsa version 6.0.0.128-1OEM (Download VMware vSphere).
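
For the driver part, the usual route is to copy the offline bundle to the host and install it from the ESXi shell, along these lines (the datastore path and file name are just examples; put the host in maintenance mode first and reboot afterwards):

    esxcli software vib install -d /vmfs/volumes/datastore1/hpsa-6.0.0.128-1OEM-offline_bundle.zip
    reboot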

Regarding the controller settings, I assume it is in pass-through mode - https://www.vladan.fr/vmware-vsan-pass-through-vs-raid0-controller/ . I would also recommend checking whether Write Cache and Read Ahead are disabled, and turning them off if they are not - SSD RAID » ADMIN Magazine.
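
If the HPE ssacli/hpssacli VIB is installed on the hosts, the current controller and cache settings can be checked from the ESXi shell, something like this (the install path, slot number, and modify syntax may differ between versions, so verify against the HPE documentation first):

    /opt/smartstorageadmin/ssacli/bin/ssacli ctrl all show config detail    # HBA mode, cache and drive settings
    /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 show detail        # controller detail for slot 0
    # if drive write cache shows as enabled, this should turn it off:
    /opt/smartstorageadmin/ssacli/bin/ssacli ctrl slot=0 modify drivewritecache=disable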

If none of this resolves the issue, I would probably raise a ticket with VMware and HPE to see what they say.

Hope this information helps.

ac4gc
VMware Employee

Hello,

Also, please check the current QoS configuration on your physical switches and how large a receive queue the switch allocates to vSAN traffic.
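
On the host side, a quick way to see whether the vSAN uplinks are dropping or pausing traffic is to look at the NIC counters (the vmnic name is just an example):

    esxcli network nic get -n vmnic2         # driver, speed and pause (flow control) settings
    esxcli network nic stats get -n vmnic2   # look for receive errors and drops on the vSAN uplink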
