Hi folks,
I've got a bit of a Frankenstein Dell server, and I'm noticing that the storage is insanely slow. What makes it non-standard is that the original PERC H700 adapter has been removed, and in its place is an LSI 9217-8i controller with a pair of mirrored SSDs. The card was purchased as a 9207, but it was flashed with the compatible 9217 firmware to give it RAID support. There's also an LSI 9200-8e SAS controller in the system with nothing attached to it.
After noticing that VM cloning was very slow, I ran the following from an SSH session in my /vmfs/volumes/<datastore> directory:
# time dd if=/dev/zero of=testfile bs=1M count=100
100+0 records in
100+0 records out
real 0m 7.59s
user 0m 0.23s
sys 0m 0.00s
That is, creating a 100MB file on the SSD mirror takes almost 8 seconds, which works out to about 13MB/sec for a sequential write. Booted from a Linux live CD, the same command completes in a fraction of a second, and dd reports a transfer rate of over 700MB/sec.
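For reference, the throughput figure follows directly from the timing above (100MB written in 7.59s):

```shell
# 100 MB / 7.59 s ~= 13 MB/s sequential write
awk 'BEGIN { printf "%.1f MB/s\n", 100 / 7.59 }'
```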
Just to make sure my test is fair, I ran the same command on a similar server also running ESXi, but with spinning-platter disks attached to the H700:
# time dd if=/dev/zero of=testfile bs=1M count=100
100+0 records in
100+0 records out
real 0m 0.42s
user 0m 0.31s
sys 0m 0.00s
ESXi is at the latest available 5.1 patch level. The LSI driver has been updated as outlined here: http://adriank.org/how-to-update-mpt2sas-driver-on-esxi-5/
The LSI firmware has been flashed to match the updated driver version used by ESXi (19.0).
Why is the LSI SSD storage so slow? I wouldn't expect the RAID-enabling reflash to be the cause, since the same storage is amazingly fast when booted into Linux.
OK, I finally figured it out!
Apparently LSI HBAs in RAID mode aren't that great for ESXi out of the box. The HBA was originally purchased for ZFS use, so no RAID was needed. When you flash RAID firmware onto such a card, you're missing things a real RAID card provides, like a battery backup unit, and the write-back cache is disabled by default.
To fix my performance, I needed to enable the write-back cache. Now, some may say that doing this without a BBU is dangerous. However, all of the VMs on this system are ephemeral, so data loss in case of a power outage doesn't matter here.
It took a lot of tries to find an lsiutil build that works under ESXi, but here's what did it for me:
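A rough sketch of the interactive lsiutil session (the exact menu numbers and prompts vary between lsiutil versions, so treat this as an approximation rather than a verbatim transcript):

```
# ./lsiutil

Select a device:  1                        <- the 9217-8i port

Main menu, select an option:  21           <- RAID actions

RAID actions menu, select an option:  32   <- Change volume settings

Enable write caching:  [Yes or No, default is No] yes
```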
This setting is persistent, so you'll never need to change this again. And now the same test:
# time dd if=/dev/zero of=testfile bs=1M count=100
100+0 records in
100+0 records out
real 0m 0.75s
user 0m 0.24s
sys 0m 0.00s
Success! I'm guessing that not too many people run into this because they run "real" RAID cards?
It might be worth mentioning that the SSDs are SATA. This configuration works fine under FreeBSD and Linux, so it would be strange if that were related to the problem here. I'm mentioning it just in case, though.
Thanks for the tip.
I use vSphere 6.7.
In my case, I had configured RAID-6, and I didn't have to rebuild it: in the MegaRAID LSI WebGUI, I opened the active array and, under Write Policy, selected the Write Back cache option. After restarting the server, I noticed a big difference in boot time, transfers, and general VM use. Performance improved by 100%!
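For anyone who would rather script this than click through the WebGUI, the same write policy can be set with LSI/Broadcom's storcli utility. This sketch assumes controller 0 and virtual drive 0, so check your own IDs first:

```shell
# Show the current cache policy for virtual drive 0 on controller 0
storcli /c0/v0 show all

# Switch the write policy to write-back (wb); "awb" forces write-back
# even when the BBU is missing or discharged -- risky without one
storcli /c0/v0 set wrcache=wb
```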