VMware Cloud Community
wd123
Enthusiast

LSI 9217 SAS is insanely slow, even with SSDs on ESXi 5.1

Hi folks,

I've got a bit of a Frankenstein Dell server, and I'm noticing that its storage is insanely slow.  What makes it a non-standard Dell is that the original PERC H700 adapter has been removed, and in its place is an LSI 9217-8i controller with a pair of mirrored SSDs.  The card was purchased as a 9207, but it was cross-flashed with the compatible 9217 firmware to give it RAID support.  There's also an LSI 9200-8e SAS controller in the system with nothing attached to it.

After noticing VM cloning being very slow, I tried from SSH in my /vmfs/volumes/<datastore> directory:

# time dd if=/dev/zero of=testfile bs=1M count=100
100+0 records in
100+0 records out
real    0m 7.59s
user    0m 0.23s
sys     0m 0.00s

That is, creating a 100MB file on the SSD mirror takes almost 8 seconds!  That works out to about 13MB/sec for a sequential write.  From a Linux live CD on the same hardware, the same command finishes in a fraction of a second, and dd reports over 700MB/sec.
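For reference, the throughput figure above falls straight out of the dd timing (a quick sketch; the numbers are the ones reported in this post):

```shell
# Derive MB/s from dd's "real" time. The ESXi busybox shell has no
# floating-point arithmetic, so do the division in awk.
# 100 MB written in 7.59 s is the slow LSI case above.
awk -v mb=100 -v secs=7.59 'BEGIN { printf "%.1f MB/s\n", mb / secs }'
```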

Just to make sure that my test is fair, I did the same sort of test on a similar server running ESXi, but with spinning-platter disks hung off of the H700:

# time dd if=/dev/zero of=testfile bs=1M count=100
100+0 records in
100+0 records out
real    0m 0.42s
user    0m 0.31s
sys     0m 0.00s

ESXi is running the latest available 5.1 patch.  The LSI driver has been updated as outlined here:  http://adriank.org/how-to-update-mpt2sas-driver-on-esxi-5/

The LSI firmware has been flashed to match the updated version of the driver used by ESXi (19.0).

Why is the LSI SSD storage so slow?   I wouldn't expect the RAID-enabling flash to be the cause, since the same storage is amazingly fast when booted into Linux.

1 Solution

Accepted Solutions
wd123
Enthusiast

OK, I finally figured it out!

Apparently the LSI HBAs in RAID mode aren't a great match for ESXi.   The HBA was originally purchased for ZFS use, so no RAID was needed.   When you flash the RAID firmware onto one of these cards, you're still missing things a real RAID controller has, like a battery backup unit, and the write-back cache is disabled by default.

To fix my performance, I needed to enable the write-back cache.  Some may say that doing this without a BBU is dangerous.  However, all of the VMs on this system are ephemeral, so data loss in case of a power outage doesn't matter here.

It took a lot of tries to find an lsiutil build that works, but here's what did it for me:

  1. Burn SystemRescueCd (most other Linux live media should work too): https://www.system-rescue-cd.org/
  2. Get https://karlsbakk.net/LSIUtil%20Kit%201.63/Source/lsiutil.tar.gz
  3. Extract that tarball somewhere
  4. Run lsiutil
    1. Select your LSI adapter
    2. Select 21
    3. Select 32
    4. When it asks about the write cache, enable it (option 1, I believe)
    5. For all other questions, stick with the defaults
  5. Reboot
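If you need to repeat this on several hosts, the interactive session above can in principle be scripted by piping the menu answers into lsiutil. This is only a sketch mirroring the steps listed here: the blank lines accept defaults (the exact number of prompts may differ on your card), and the trailing 0s back out of the menus. Run lsiutil interactively first to confirm the prompts line up.

```shell
# Answer sequence for lsiutil: adapter 1, menu 21, then 32, enable
# write cache (1), accept defaults (blank lines), exit menus (0, 0).
# WARNING: untested sketch; verify the prompt order interactively first.
answers="1
21
32
1


0
0"
printf '%s\n' "$answers"
# then pipe it in:  printf '%s\n' "$answers" | ./lsiutil
```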


This setting is persistent, so you'll never need to change it again.  And now the same test:


# time dd if=/dev/zero of=testfile bs=1M count=100
100+0 records in
100+0 records out
real    0m 0.75s
user    0m 0.24s
sys     0m 0.00s
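To put numbers on the fix, quick arithmetic on the before/after dd timings from this thread shows roughly a tenfold improvement:

```shell
# 100 MB written in 7.59 s before enabling write-back cache, 0.75 s after.
awk -v before=7.59 -v after=0.75 'BEGIN {
    printf "%.1f MB/s -> %.1f MB/s (%.1fx)\n",
           100 / before, 100 / after, before / after
}'
```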


Success!  I'm guessing not too many people run into this because they run "real" RAID cards?


3 Replies
wd123
Enthusiast

It might be worth mentioning that the SSDs are SATA.   This configuration works fine under FreeBSD and Linux, so it would be strange if that were related to the problem here, but I'm mentioning it just in case.

technetosicoob
Contributor

Thanks for the tip.
I use vSphere 6.7.
In my case I had configured RAID-6, and I didn't have to redo the array: in the MegaRAID LSI WebGUI I opened the active RAID volume and under Write Policy enabled the Write Cache option. After restarting the server, I noticed a big difference in boot times, transfers, and general VM use. Performance improved 100%!
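For anyone who prefers the command line over the WebGUI, MegaRAID controllers can make the same write-policy change with Broadcom's StorCLI tool. A sketch only: the /c0/v0 indices are placeholders for your own controller and virtual drive, so confirm the numbers with a `show` first.

```shell
# Set the virtual drive's write policy to write-back (WB).
# /c0 = first controller, /v0 = first virtual drive; adjust to match
# your system (list them with: storcli64 show).
storcli64 /c0/v0 set wrcache=wb
# Verify the cache settings afterwards:
storcli64 /c0/v0 show all
```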
