Jauneorange972
Enthusiast

vSAN FULL SSD - Perc H730 - Really poor write performance


Hello,

We have configured a six-node all-flash vSAN 6.0U2 cluster.

Each node has the configuration below:

ESXi 6.0U2, with the latest patch.

Dell R630

PERC H730 mini / Firmware : 25.5.0.0018 /  Driver: lsi-mr3 version 6.904.43.00-1OEM / Passthrough / cache disabled

Intel 10G 4P X710 SFP+ / Driver : i40e version 1.4.28 / Firmware : 5.04

1 x 400 GB Intel® SSD DC S3710 Series and 3 x 800 GB Intel® SSD DC S3510 Series.

We see poor performance when running write-only tests.

All vSAN health checks are green.

Here is an example:

Storage performance test, 100% write + WB: around 6,000 IOPS per ESXi host (600 IOPS for each of the 10 vmdks per host), so roughly 36,000 IOPS in write mode for the full cluster.

What do you think? Are these values normal for a vSAN cluster? I am a little disappointed.

I really don't know whether we made a mistake; we have applied the correct drivers and firmware for the network card and the RAID card.
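For completeness, the driver and firmware versions above can be confirmed on each host with standard esxcli commands (a sketch; the vmnic name is an assumption for these hosts):

```shell
# List the installed driver VIBs for the RAID controller and the NIC
esxcli software vib list | grep -E 'lsi-mr3|i40e'

# Show the driver and firmware the host actually loaded for a NIC
# (vmnic4 is assumed to be one of the X710 ports)
esxcli network nic get -n vmnic4

# List storage adapters and the driver module each one is bound to
esxcli storage core adapter list
```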

Best regards


Accepted Solutions
Jauneorange972
Enthusiast

Hi,

We have an explanation.

Using the RAID card in JBOD mode is not optimized at all; we got really poor performance, with or without vSAN.

We now use only a SAS HBA (the HBA330) and get really good performance with vSAN.

Best regards

14 Replies
elerium
Hot Shot

Those numbers do seem a little low for writes. I have a 3-node VSAN 6.0U2a Build 4600944 all-flash cluster, 1x 400GB Intel P3700, 5x 800GB Intel S3510, and for the same 100% write + WB test (running 30 min) I see 33,000 IOPS per host and ~100,000 IOPS for the cluster. I have slightly different hardware though: R630, H730 controller firmware 25.4.1.0004, driver 6.903.85.00-1OEM, and Intel X540 NICs.

Do you have dedupe, compression, or RAID-5/6 in use? These carry some performance penalties, especially with writes.

Not sure if it will make a difference, but per the VSAN HCL, the H730 with firmware 25.5.0.0018 isn't certified yet for 6.0U2, only for ESXi 6.5.

For ESXi 6.5 it is certified using driver lsi_mr3 version 6.910.18.00-1vmw.650.0.0.4564106 (VMware Compatibility Guide - I/O Device Search).

You may want to contact support to see what they recommend for H730 drivers on this firmware with 6.0U2. Your NIC firmware/drivers look good to me. The only other thing would be to update the firmware on the SSDs; I don't think it affects performance, but some firmware bugs were fixed on the S3510s earlier this year.

zdickinson
Expert

Good morning, vSAN performance can be very hard to measure accurately.  Poor Write Performance

Are you using IOmeter? HCIBench and/or multiple VMs running IOmeter can provide more accurate results. Thank you, Zach.

elerium
Hot Shot

He isn't using NVME drives, but on another note, the VSAN component HCL for Intel NVME drives has not been accurate for quite some time in regards to drivers.

The URL http://frankdenneman.nl/2016/04/19/maximizing-performance-with-the-correct-drivers-intel-pcie-nvme-d... is 100% accurate for VSAN 6.1, but in my experience, using intel-nvme 1.0e.1.1 as recommended on the HCL (and on older sites referencing VSAN 6.1 or earlier) will leave you with an unusable cluster full of latency problems if you're on VSAN 6.2. I haven't tested VSAN 6.5 yet, so I can't comment on that.


The drivers below do work okay with VSAN 6.2. VMware support seems to be aware of this too. Intel NVME & VSAN 6.2 is one of those very few cases where if you follow HCL for VSAN you'll run into problems.


nvme 1.2.0.27-4vmw.550.0.0.1331820 https://my.vmware.com/group/vmware/details?downloadGroup=DT-ESX55-VMWARE-NVME-12027-4&productId=353

intel-nvme 1.2.0.7 https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI60-INTEL-LOCALSTORAGE-INTEL-NVME-1207&...

Jauneorange972
Enthusiast

Hi all,

We found that the PERC H730 used in our R630s has really poor write performance in HBA mode. We tested this by running IOmeter directly against the caching SSD, booting the server straight into Windows Server.

There is a huge difference between RAID 0 mode and HBA mode; it seems the PERC H730 is not the best card to use for vSAN.

In RAID 0 mode we get the same performance as the SSD's spec sheet. In HBA mode, however, we get barely 20% of the write performance.

So we called our Dell contact directly and decided to use the Dell HBA330 (part of the Dell all-flash vSAN Ready Nodes); when testing it, we get the same performance as the SSD spec sheet, which looks good.

We replaced all the PERC H730 RAID cards with Dell HBA330s. We have six nodes in an all-flash stretched vSAN cluster, but we still have poor performance.

In the attachments, we tested write performance with the workload "100% write, optimized for the WB" for 10 minutes, with a storage policy of FTT=1, 4 stripes, and RAID 1 protection. We really don't understand.

All vSAN tests are green. What did we miss?!

Help please!
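One low-level check worth running on a host (a sketch; field names vary slightly by build) is the queue depth ESXi sees per device, since passthrough mode on some controllers advertises a much smaller queue than RAID 0 mode:

```shell
# Show each device and the maximum queue depth the controller exposes for it;
# a small value here throttles write bursts before they ever reach the SSD
esxcli storage core device list | grep -E 'Display Name|Device Max Queue Depth'
```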

Jauneorange972
Enthusiast

Hi,

We used two methods.

1- We tested performance using the test tool in the vSphere Web Client with the different workloads; we get barely 900 IOPS per node across 10 vmdks, so around 10,000 IOPS.

2- We tested with a vmdk located on the vSAN datastore (FTT=1, RAID 1, SP4) using IOmeter, and compared the results with the same test on a vmdk located on an NFS share backed by a NetApp 2552. vSAN is better for reads, but for writes vSAN is slow compared to the NFS share on the NetApp.

We are really missing something...

elerium
Hot Shot

One thing to keep in mind is that when the guest sees 900 IOPS, it's really 1,800 IOPS on the storage (because of FTT=1), so your 10,000 IOPS is really 20,000 IOPS of backend work. Run the test for 20 minutes and look at the performance graph in Monitor -> Performance -> VSAN Backend at the cluster or node level. I think that is more accurate for overall node/cluster performance.
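As a quick sketch of that arithmetic (numbers from this thread; RAID-1 with FTT=1 assumed):

```shell
#!/bin/sh
# Frontend IOPS observed at the guest, per node (from the thread)
frontend_iops=900
# RAID-1 with FTT=1 keeps ftt+1 = 2 mirror copies, so every guest write
# becomes two backend writes
ftt=1
mirrors=$((ftt + 1))
backend_iops=$((frontend_iops * mirrors))
echo "backend IOPS per node: $backend_iops"   # 1800
```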

The checksum feature (on by default) also seems to cap total write throughput on VSAN (around 250 MB/s max for a VM), so if you have a use case requiring more than 250 MB/s of continuous writes, disable checksum on the storage profile.

Another difference that may explain the gap between your cluster and mine is that my SSD cache devices are NVMe (handled by the PCIe bus, with no RAID controller and a much higher IOPS spec), so writes will be faster since they all go to the SSD cache first.

mschubi
Enthusiast

Hello Jauneorange972,

you wrote that the HBA is in passthrough mode.

For vSAN you need HBA mode. It's important.

Also, no disks with VMFS on the same controller.

best regards,

Mike

elerium
Hot Shot

They are the same thing, just different terminology.

HBA mode, passthrough mode, IT mode, and JBOD mode all mean the same thing.

mschubi
Enthusiast

Hello elerium,

for PERC controllers the terminology is not the same.

PERC passthrough mode means: the controller is in RAID mode, but the disks are configured as passthrough/direct disks without any RAID functionality (most RAID controllers call this JBOD).

PERC HBA mode means: you switch the PERC into HBA mode via iDRAC (or the CLI); this has nothing to do with the controller BIOS options!

The PERC controller then loses the RAID functionality completely. That is a totally different functionality and operating mode for this controller.

We have seen 15K frontend IOPS on nodes with only one disk group per node and small, cheap cache SSDs.

Much more if you use true write-intensive SSDs.
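For illustration only (a sketch assuming Dell's perccli tool is installed; double-check the exact syntax against Dell's documentation for your controller):

```shell
# Show whether controller 0 is currently running the RAID or HBA personality
perccli /c0 show personality

# Switch controller 0 to the HBA personality (a reboot is required afterwards,
# and any existing RAID virtual disks must be deleted first)
perccli /c0 set personality=HBA
```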

best regards,

Mike

GreatWhiteTec
VMware Employee

A few things here. The firmware version of the SSD drives is not stated. The latest on the VCG is G2010140; there are newer versions, but G2010140 is the latest supported. Based on experience, running older FW results in a low queue depth (~5 per drive). You are also using SATA SSDs, so... you get what you pay for.

More importantly, you should test against your workload profile. Are you really testing 100% writes?

How are you testing this? Proactive tests from vCenter? I HIGHLY recommend using HCIBench for this; you will see a huge difference, as you can specify warm-up time, storage initialization, etc.

Here is some info:

Use HCIBench Like a Pro - Part 1 - Virtual Blocks

Use HCIBench Like a Pro – Part 2 - Virtual Blocks

Introducing HCIBench 1.6 - Virtual Blocks

HCIBench
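If HCIBench can't be used, a rough stand-in for the 100% write profile can be driven from inside a guest with fio (a sketch; the file path, size, and runtimes here are arbitrary):

```shell
# Sustained 4K random-write test with a 60 s warm-up before measurement,
# roughly matching the "100% write" profile discussed in this thread
fio --name=vsan-writetest \
    --filename=/mnt/testvol/fio.dat --size=10G \
    --rw=randwrite --bs=4k --direct=1 \
    --ioengine=libaio --iodepth=32 --numjobs=4 \
    --ramp_time=60 --runtime=600 --time_based \
    --group_reporting
```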

elerium
Hot Shot

Thanks for that explanation, I didn't realize the difference on the Dell PERC.

0 Kudos
roman79
Enthusiast

Hi Jauneorange972​,

Any update on the PERC H730 Mini performance with the recent vSphere 6.0 U3+?

Regards,
