VMware Cloud Community
elerium
Hot Shot

VSAN 6.2 checksum enabled causing slow write speeds?

On my VSAN 6.2 test cluster I've noticed significantly slower sequential write speeds, down from 1000+MB/s to 250MB/s, enough that the built-in health stress test begins to fail from failed writes. After tracing it, it looks like disabling the new checksum feature restores my sequential write speeds. Has anyone else seen similar results from leaving the checksum feature on? CPU utilization on the hosts is only around 10% when I run the tests.

5xDell R730xd, disk groups are 2xP3700 1.6TB SSDs, 14x4TB WD RE4 SAS drives, 384GB RAM, E5-2650v3.

Attachments: vsan62-checksumdisabled.png, vsan62-checksumenabled.png

zdickinson
Expert

Good afternoon. Duncan writes about when you should enable checksums here: http://www.yellow-bricks.com/2016/03/24/vsan-6-2-checksumming/

Perhaps there are some clues there. Thank you, Zach.

elerium
Hot Shot

I've read about the checksumming feature on Duncan's and Cormac's blogs; it's enabled by default in 6.2, and policies need to be created to disable it.

I originally thought it would just cost some disk space to store the checksums and some CPU for the computation, but it looks like there are write performance penalties as well, which I wouldn't expect.
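
For anyone who would rather script the policy than click through the Web Client, here's a rough PowerCLI sketch (this assumes an existing Connect-VIServer session and that the vSAN capability is named VSAN.checksumDisabled; verify with Get-SpbmCapability on your build):

# Sketch: create a vSAN storage policy with software checksum disabled
# Assumes PowerCLI with the SPBM cmdlets and an active Connect-VIServer session
$cap  = Get-SpbmCapability -Name "VSAN.checksumDisabled"   # capability name assumed
$rule = New-SpbmRule -Capability $cap -Value $true          # disable checksum
$rset = New-SpbmRuleSet -AllOfRules $rule
New-SpbmStoragePolicy -Name "vSAN - No Checksum" -AnyOfRuleSets $rset

You would then assign the new policy only to the VMs/VMDKs you want to exempt.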

depping
Leadership

There is a performance penalty indeed. Keep in mind that the checksum hash is calculated at the client (where the VM runs) before the IO is sent across the wire. This happens for each 4KB block, and that is the penalty you are seeing. The tests I have seen showed a relatively low performance penalty, on the order of only 1 or 2%.
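
Just to illustrate where the cost scales from (purely a sketch, not vSAN code, and it uses a stand-in .NET hash instead of the CRC-32C that vSAN actually uses): each write is split into 4KB blocks and every block gets its own checksum on the host running the VM before anything goes over the wire, so a single 1MB sequential write means 256 extra checksum operations up front.

# Illustration only: one checksum per 4KB block of a 1MB write (stand-in hash, not vSAN's CRC-32C)
$blockSize = 4KB
$payload   = New-Object byte[] (1MB)
(New-Object System.Random).NextBytes($payload)
$hasher = [System.Security.Cryptography.MD5]::Create()
$blocks = 0
for ($offset = 0; $offset -lt $payload.Length; $offset += $blockSize) {
    $null = $hasher.ComputeHash($payload, $offset, $blockSize)   # one checksum per 4KB block
    $blocks++
}
Write-Host "Checksummed $blocks blocks for a 1MB write"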

depping
Leadership

Also, I have not seen similar results. Then again, I do not typically use CrystalDiskMark to test.

depping
Leadership

Just looking at your config, I notice you are on 6.2 with the R730; you know this is unsupported at the moment?

elerium
Hot Shot

Yes, I'm aware 6.2 isn't supported with the Dell R730s right now; we're waiting on RAID controller certification. This is just a test lab to look at all the 6.2 changes, and everything I care about is still happily running on 6.1 :)

I'm using NVMe SSDs, so these shouldn't involve the RAID controller (which I know isn't on the HCL yet), and the test data set should still easily fit within the SSD cache. I would think that all writes still go to the SSD layer first and are acked back to the client before being destaged to the magnetic disks. If checksum data is being written directly to the magnetic disks instead of going through the SSD cache first, then I could understand the magnitude of the write slowdowns.

depping
Leadership

The checksum is calculated before the data is even transferred over the network, on the host where the VM is running, so well before it hits the caching layer. Personally I have not seen this kind of performance difference yet, and I would recommend filing a support request.

elerium
Hot Shot

The latency/congestion problems were mostly resolved by using the nvme v1.2.0.27-4vmw.550.0.0.1331820 driver instead of the intel-nvme HCL driver 1.0e.1.1-1OEM.550.0.0.1391871, as described in this post: Intel DC P3700 Firmware.

I opened a case (16981161705), and the reply from engineering was:

"synthetic benchmarks which aggressively perform write IOs, will see degradation wrt 6.1 when checksum is enabled. The benchmark was maxing out what the system could do in 6.1, and the checksum feature causes additional IOs, hence the degradation. The more aggressive the benchmark, the more the lower performance wrt 6.1."

I also asked whether there are any known issues for Intel NVMe devices in VSAN 6.2, or any other known issues related to checksum and poor performance, and got this reply:

"per the PR, the only negative engineering mentioned would be related to using bench marking software with checksum enabled.

performance slowness would be expected until engineering releases a patch for this issue."

So basically this appears to be normal operation: 4K write I/O has a small penalty with checksum on, but with larger block sizes for sequential writes (in this case 128K), the maximum speed ends up around 200-250MB/s. In 6.1 on the same cluster it was ~800MB/s. I'm finding the performance is generally fine for most workloads, but if a resync is occurring, congestion builds much more rapidly than in previous versions and performance drops off quickly, where this wasn't the case before. I hope this can be improved in future releases.

elliott_w
Contributor

I thought I would bump this instead of creating a new thread.

We have a support ticket open (17588708310, over 2 weeks old) for what appears to be exactly this: with checksum enabled on 6.6 we get massive component congestion, latency, and sequential write speeds between 100 and 200MB/sec for large block sizes (on P3700 cache / Samsung SM863 capacity drives that each do >450MB/sec of sequential writes when tested as a single VMFS datastore).

It looks like it is most obvious when stripe=1; with Windows file copies in a VM (large-block sequential writes) we see write speeds jumping between 80 and 180MB/sec. Disable checksumming and we immediately get a consistent 300-400MB/sec.

If we set stripe=4 and repeat the test, it is much more consistent (but still not as good as with checksumming disabled). I assume that is because the striping splits the blocks up into smaller sequential writes?

Clearly large-block checksums are horribly broken in some way... I'm amazed there aren't more articles about this.

See the attached Iometer graphs for a 100GB sequential write between two VSAN disks (stripes=1) on a Windows VM. The same test against a single VMFS disk gives >400MB/sec...

Hoping for some input here from the experts :) This has now been open for over 2 weeks with VMware (platinum) support...

elerium
Hot Shot

What driver & firmware version are you on for the P3700s? I've tested intel-nvme 1.2.0.8 drivers and 8DV101H0 firmware and the things look good there in my environments (mix of 6.6.1 and 6.2U3 clusters). I don't have any Samsung SM836 drives, but I would check on VSAN HCL and look at updating to the latest certified driver/firmware for vSAN 6.6.

One thing that's clear to me about using checksum in vSAN is that performance isn't great for workloads with large sustained writes above ~250MB/s. For me, workloads that push more than 250MB/s of large sequential writes for sustained periods eventually outrun the cache layer (this happens around the 2-hour mark on a P3700 2TB cache). I then see congestion and latency increase rapidly, and cluster performance suffers after 10-12 hours of continuous writes at that rate. Generally this isn't an issue unless you really have that sort of workload; if you do, I'd recommend identifying the workload and applying a policy with no checksum.

If you have large sustained write workloads that come in bursts, you can somewhat buffer the negative effects by raising /LSOM/lsomLogCongestionLowLimitGB and /LSOM/lsomLogCongestionHighLimitGB on each host (no reboot needed; the effect is immediate) to higher values such as 48 for LowLimit and 72 for HighLimit (Modification of congestion-related vSAN advanced parameters (2149096) | VMware KB). This only delays the latency/congestion effects and shouldn't be relied on for a workload that pushes this rate nonstop; in that case I'd recommend disabling checksum for those VMs.
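
A rough PowerCLI way to push those two values to every host in a cluster (sketch only; I'm assuming the /LSOM/ options surface as advanced settings named LSOM.lsomLogCongestionLowLimitGB and LSOM.lsomLogCongestionHighLimitGB, and 'vsan-cluster' is a placeholder name; check KB 2149096 before changing anything):

# Sketch: raise the LSOM log congestion thresholds (48 GB low / 72 GB high) on all hosts
foreach ($vmhost in Get-Cluster 'vsan-cluster' | Get-VMHost) {
    Get-AdvancedSetting -Entity $vmhost -Name 'LSOM.lsomLogCongestionLowLimitGB' |
        Set-AdvancedSetting -Value 48 -Confirm:$false
    Get-AdvancedSetting -Entity $vmhost -Name 'LSOM.lsomLogCongestionHighLimitGB' |
        Set-AdvancedSetting -Value 72 -Confirm:$false
}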

Neoxq
Contributor

Hi Elliott,

We have the exact same problem / issue with our VSAN 6.6.1 All Flash.

Our ticket with VMware has now been open for almost 1.5 months and there is still no fix.

Disabling checksum isn't a good option for us: we use Citrix App Layering, and it creates images / temp VMs whose VMDKs are written directly to VSAN before they are attached to a VM.

So no VSAN policy is in place for those workloads, and we have a lot of problems with the slow performance.

It's not what we expected when we decided to go All Flash.

Our configuration is 5x HP ProLiant DL360 Gen8, 192GB memory, P420i RAID controller, each with 7x HPE Read Intensive 1.6TB SATA 6Gb/s SSDs for capacity and 1x HPE Mixed Use 960GB SATA 6Gb/s SSD for cache.

We tried several firmware versions for our P420i RAID controller.

We reinstalled ESXi with the HPE image and with the default VMware image.

We recreated the VSAN datastore and enabled/disabled deduplication.

We tried different striping, including 4 stripes like you.

We disabled HPE SSD Smart Path.

Today a VMware engineer recreated our VSAN datastore, but there is no solution yet.

We're still waiting.

With kind regards,

Mitchel

elerium
Hot Shot

You can try cloning the default policy (with checksum on) and assigning the clone to your existing workloads (this should not cause a resync), and then disabling checksumming on the default storage policy. Once that change is applied to the default storage policy, no checksum will be applied to newly created workloads by default.
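
For reference, a rough PowerCLI sketch of the clone-and-reassign part (assumes a connected session and that your default policy is named 'vSAN Default Storage Policy'; the last step, adding a checksum-disabled rule to the default policy while keeping its existing FTT/stripe rules, can be done in the Web Client or with Set-SpbmStoragePolicy):

# Sketch: clone the default policy and move existing VMs onto the clone
$default = Get-SpbmStoragePolicy -Name 'vSAN Default Storage Policy'   # name assumed
$clone   = New-SpbmStoragePolicy -Name 'vSAN Default - checksum on' -AnyOfRuleSets $default.AnyOfRuleSets
# Re-point VMs currently using the default policy at the clone (same rules, so no resync expected)
Get-SpbmEntityConfiguration -VM (Get-VM) |
    Where-Object { $_.StoragePolicy -and $_.StoragePolicy.Name -eq $default.Name } |
    Set-SpbmEntityConfiguration -StoragePolicy $clone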

I know it's not great, but at least it's something to get your environment into a better-performing state until VMware fixes or improves checksumming.

It's also worth making sure your SSDs are on the latest firmware/drivers certified on the vSAN HCL; I have seen large swings in performance/latency across different driver/firmware combos for various SSDs.

elliott_w
Contributor

Interesting to hear so many people with the same issue!

So far I've wasted 2 weeks running time-consuming batches of throughput tests and rebuilding parts of the infrastructure at VMware's request, and we only got an explanation because we managed to identify checksumming as the cause ourselves (VMware support never suggested disabling checksumming to test), which is exceptionally frustrating given this is apparently a known issue. It almost seems like VMware is trying to bury this at their end: only after we escalated multiple times did we get a response indicating that engineering was aware of the checksum problem, and there is no KB about it.

VMware have told us it is hardware agnostic, and I rebuilt the disk groups using the SM863s as cache and saw the same issue, so I suspect the NVMe drives are not to blame here...

VMWare has "recommended we do not disable checksumming", and given the latest data corruption bug in VSAN we are fairly keen to leave this enabled. I guess all we can really do is wait to see what the outcome is from VMWare's end, as we have had no luck with any combination of controller drivers and we are running the latest firmware for the SSD's...

Having said that, waiting for VMware to fix this has our technical team worried. VMware support has been nothing short of terrible (I'm used to Cisco TAC, which is consistently exceptional, but this is our first support case with VMware and they have been really, really bad).

We're not far off scrapping this whole project and bailing to Nutanix at this point :(

Reply
0 Kudos
Dielessen
Contributor

There are, without doubt, some major bugs in VSAN at the moment.

We are now 3 months further along and all we have done is collect logs :S

How far along is everybody on this? Is there any progress?

elliott_w
Contributor

We have escalated this as far as we can with VMware engineering.

They came back last week and told us it is "expected behavior" and there is nothing wrong with the product. There are no fixes; it is simply our deployment and use case (expecting even moderate sequential write performance) that is wrong.

Yep, that's right: an enormous drop in performance with checksumming is not a bug to VMware, it is expected behavior!

VSAN can't even keep up with the spinning-disk SAN it was purchased to replace, and all we have gotten is what appears to be a cover-up from support... VMware has lost our trust.

We are talking to an alternative HCI vendor now to see how quickly we can move across.

AartK
Contributor

Is your problem fixed now with vSAN 6.6, or did you move to another HCI supplier?

Neoxq
Contributor

Hi All,

Our performance issues are completely fixed after installing the recent hotfixes released on 19 December 2017:

VMware ESXi 6.5, Patch Release ESXi650-201712401-BG: Updates esx-base, esx-tboot, vsan, and vsanhealth VIBs (2151104)

I've installed the following patches:

(screenshot of installed patches attached: pastedImage_1.png)

Now we have checksum enabled on our standard storage policy.

With kind regards,

Mitchel

aqualityplacem
Contributor

Hi Mitchel

Are you able to post the sequential write performance results you see now that it's all working?

elerium
Hot Shot
(Accepted Solution)

The slow writes when using Intel NVMe disks appear to be related to the intel-nvme drivers, the checksum feature added in VSAN 6.2, and larger block sizes. There is finally an improvement after 2 years, via this KB article: Low I/O performance using intel-nvme drivers 1.3.2.8-1OEM and 1.3.2.4-1OEM with block sizes larger t...

Switching drivers from intel-nvme 1.2.0.8 to 1.2.1.15-1OEM (I'm on VSAN 6.6.1) greatly improved sequential write performance with the checksum feature enabled.

I'm now seeing 450-800MB/s sequential write performance where it was previously around 250MB/s with the older intel-nvme drivers. This is still lower than the 600-1100MB/s I get with checksum off, but it is a massive improvement.
