VMware Cloud Community
spetty7326
Contributor

[Solved] iSCSI write performance to Openfiler 2.99 terrible, 2 MB/s

I've been struggling with a new Openfiler 2.99/ESXi 4.1 installation for several days and am in need of some further help or suggestions. My write performance with an iSCSI volume to an ESXi 4.1 guest is about 2 MB/s. Read performance is fine at about 75 MB/s. With write performance like that, the machines are just crawling along. I had a number of guest VMs installed locally on the internal (non-RAID) HD of the host ESXi machine and was getting good performance - read and write in the 70 MB/s range. When I migrated one over to the new OF setup, write performance became essentially non-existent!
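
As a quick sanity check on those figures, a single gigabit link tops out around 110-115 MB/s once protocol overhead is accounted for, so 75 MB/s reads are plausible and 2 MB/s writes are nowhere near a network ceiling. A minimal sketch of that arithmetic, assuming roughly 10% TCP/IP + iSCSI overhead:

# Quick arithmetic check on the gigabit link (not a measurement).
raw_mb_per_s = 1_000_000_000 / 8 / 1_000_000   # 1 Gbit/s on the wire = 125 MB/s
overhead = 0.10                                # assumed TCP/IP + iSCSI framing overhead
usable_mb_per_s = raw_mb_per_s * (1 - overhead)
print(f"usable ceiling ~{usable_mb_per_s:.0f} MB/s")   # ~112 MB/s
# 75 MB/s reads against a ~112 MB/s ceiling is reasonable; 2 MB/s writes are
# nowhere near a network limit, which points at the storage write path instead.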

The server is a new HP N40L, with (4) 2TB Seagate Barracuda ST2000DL003-9VT1 drives, 6 GB of RAM, and a 2-port BCM5709 gigabit NIC. OF 2.99 is installed to a USB drive, and as soon as it was installed I used the system update function to update all packages. OF is connected to the ESXi host through a Trend S80 switch that supports jumbo frames, although I'm not using them. There's nothing else on the switch.

All 4 drives are configured in a single volume group, RAID 5 with 5.73 TB of usable space. There is currently only 1 iSCSI volume, 1 TB in size, and a single LUN. I checked hdparm locally on the OF box and am getting buffered disk reads of 145 MB/s from the individual disks. Buffered reads from /dev/md0 are 320 MB/s.

I initially configured the vSwitch and vmnic for jumbo frames, 9000 MTU. During troubleshooting, though, I set them back to 1500 and changed OF to 1500 to rule out a jumbo frames issue or a switch issue. No improvement.

I turned off flow control for tx, rx, and autonegotiate. No change. I verified that all of the iSCSI parameters in the software initiator match those in OF. No change.

I'm really out of ideas at this point. Any suggestions for things to check/change would be greatly appreciated.

9 Replies
TomHowarth
Leadership

You do not say what type of disks you are using. Are they SAS, NL-SAS, or SATA, and what spindle speed are they?

Tom Howarth VCP / VCAP / vExpert
VMware Communities User Moderator
Blog: http://www.planetvm.net
Contributing author on VMware vSphere and Virtual Infrastructure Security: Securing ESX and the Virtual Environment
Contributing author on VCP VMware Certified Professional on VSphere 4 Study Guide: Exam VCP-410
spetty7326
Contributor

Sorry about that. The ST2000DL003 is a 2 TB, 5900 RPM, SATA III (6 Gbps) drive.

ASekora
Contributor

Hi Spetty,

Although you're seeing that type of performance with hdparm, hdparm isn't a helpful test at all: 1) it's only testing reads, and 2) it's doing so sequentially, not randomly. A better test would be bonnie or similar.

SATA disks are notoriously low performing when you start to throw random IO at them. The problem is further compounded by the fact that you're using RAID 5, which is almost always going to have far worse writes than reads. Why? Every small write has to read the old data and the old parity, then write the new data and the new parity, so each front-end write costs roughly four back-end I/Os. Reads, on the other hand, only need to touch a single disk, giving you up to 4x the IOPS/throughput potential across the array. See http://www.yellow-bricks.com/2009/12/23/iops/ for numbers and possibly a better explanation :)
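
To put rough numbers on that, here is a quick back-of-the-envelope sketch (the ~50 IOPS per 5900 RPM spindle figure and the workload mixes are assumptions, not measurements):

# Back-of-the-envelope RAID 5 IOPS estimate (illustrative assumptions, not a benchmark).
def raid5_effective_iops(disks, iops_per_disk, read_fraction):
    """Front-end IOPS a RAID 5 set can sustain for a given read/write mix."""
    raw = disks * iops_per_disk                # total back-end IOPS available
    write_fraction = 1.0 - read_fraction
    # Each read costs 1 back-end I/O; each small write costs 4 (read data,
    # read parity, write data, write parity).
    cost_per_frontend_io = read_fraction * 1 + write_fraction * 4
    return raw / cost_per_frontend_io

# Assuming ~50 IOPS per 5900 RPM SATA spindle:
print(raid5_effective_iops(disks=4, iops_per_disk=50, read_fraction=0.5))  # ~80 IOPS, 50/50 mix
print(raid5_effective_iops(disks=4, iops_per_disk=50, read_fraction=0.0))  # ~50 IOPS, pure writes
# 50 write IOPS at 4 KB each is only about 0.2 MB/s, which is why random
# small-block writes to an array like this crawl without write caching.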

What type of RAID controller are you using in the system (if any)? Does it have a battery backup? If it has a BBU, have you set the write cache to write-back instead of write-through? Likely what's going on is that no form of write caching is enabled, or some sync setting is negating it; without write caching, the performance (though a little lower than I would expect) is probably caused by the disks/RAID and not OF.

Regards,
Adam

ASekora
Contributor

Hi Again Spetty,

Actually, I just noticed something which makes me change my post slightly. You're using 5900 RPM drives.

Although SATA disks are notoriously low on random IO, the lower-RPM SATA drives (7200 being the norm) are especially so. Whereas a 7200 RPM drive has around 75 theoretical IOPS, a 5900 RPM drive is going to have in the ballpark of 50.
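
For what it's worth, those per-spindle figures come from adding average seek time to half a rotation of latency. A minimal sketch, assuming ballpark seek times rather than datasheet values:

# Rough theoretical IOPS per spindle: 1 / (average seek + half a rotation).
# Seek times below are ballpark assumptions for desktop SATA drives.
def spindle_iops(rpm, avg_seek_ms):
    rotational_latency_ms = (60_000.0 / rpm) / 2.0   # half a revolution, in ms
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)

print(round(spindle_iops(rpm=7200, avg_seek_ms=9.0)))    # ~76 IOPS
print(round(spindle_iops(rpm=5900, avg_seek_ms=14.0)))   # ~52 IOPS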

Taking that into consideration, performance without caching is right around where I'd expect. Even with caching I don't know if your performance will get significantly better. These drives weren't designed for performance.

Regards,

Adam

spetty7326
Contributor

I'm surprised to hear this, actually. It seems to be a pretty drastic performance impact to go from 75 MB/s write when the drives are operated independently, just as a local datastore, to 2 MB/s write when they are in RAID 5 in Openfiler via iSCSI. Would you typically expect a 98% performance reduction with these drives going to iSCSI in RAID 5? Would a 7200 RPM or 10k drive really make that much difference?

I'm just skeptical that there isn't something else going on in my configuration causing this.

ASekora
Contributor

I've looked around, and found a few posts that show 4x 5900RPM drives in a RAID 5 can get similar performance to this.

https://forums.openfiler.com/viewtopic.php?pid=26078 <-- This one uses the exact same drives.

http://en.akihabaranews.com/107429/review/review-synology-diskstation-ds411-bang-for-bucks-home-nas <- Your IO pattern would be similar to the first two entries.

http://www.storagereview.com/samsung_spinpoint_ecogreen_f3_review_hd203wi <-- See random IO performance at 4K block size (it's around 0.25-0.5 MB/s per drive)

Also, it very well could be a configuration issue that is keeping you from getting the best performance out of this system (I never said there wasn't/couldn't be one). I asked questions in my original post to narrow that possibility down, though you skipped over them.

Few more questions:

1) When the drives were used as a local datastore, what did the RAID configuration look like? How were the disks RAIDed and passed into ESX?

2) Now that the drives are mounted through OF, how does the RAID configuration look? (Are you RAIDing via OF or via a hardware RAID controller?)

3) What's your block size set to in OF?

Regards,
Adam S.

StuartB20111014
Enthusiast

Hi,

I had the very same issues. Unfortunately it's the MicroServer; it just isn't up to the task. Even with just 2 or 3 VMs, the CPU load was permanently in the 3s.

I swapped things round between my ML115, with its single-socket quad-core and 8 GB, and my MicroServer, which became the ESXi host. The performance difference was night and day.

Sorry, but the only real fix is better hardware, as the Neo chips don't seem up to it.

spetty7326
Contributor

After many hours of research, the problem is solved. It was a number of configuration and performance-tuning issues, some specifically related to the N40L.

For starters, the biggest change came from enabling write caching in the BIOS. It warned me when I did it that data could be lost if there was a power loss, but the N40L is on a robust UPS, so there was no concern. Just changing that parameter in the BIOS made a huge difference and returned all of the VMs to the usability they had when the drives were local.

The second biggest performance gain came from Openfiler 2.99 itself. I changed the LUN mapping from blockio to fileio. Here's the difference:

Blockio - Random, 4K, Queue Depth=1     Read: 0.486 MB/s    Write: 0.316 MB/s
Blockio - Random, 4K, Queue Depth=32    Read: 0.520 MB/s    Write: 0.317 MB/s
Fileio  - Random, 4K, Queue Depth=1     Read: 6.700 MB/s    Write: 4.507 MB/s
Fileio  - Random, 4K, Queue Depth=32    Read: 8.145 MB/s    Write: 6.193 MB/s

Obviously a huge difference. Here is the original performance with a single drive locally in the host:

DAS - Random, 4K, Queue Depth=1         Read: 0.475 MB/s    Write: 0.311 MB/s
DAS - Random, 4K, Queue Depth=32        Read: 0.504 MB/s    Write: 0.353 MB/s

So my performance with Openfiler in RAID 5 is quite a bit better than with DAS, which is what I would have expected.
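
As a rough cross-check, dividing those 4K throughput figures by the block size gives the IOPS each path is delivering. Simple arithmetic; the note about caching at the end is an interpretation rather than a separate measurement:

# Convert the 4K random-write figures above into IOPS (throughput / block size).
def mbps_to_iops(mb_per_s, block_kb=4):
    return mb_per_s * 1024 / block_kb

for name, mbps in [("blockio QD=32 write", 0.317),
                   ("fileio  QD=32 write", 6.193),
                   ("DAS     QD=32 write", 0.353)]:
    print(f"{name}: ~{mbps_to_iops(mbps):.0f} IOPS")
# blockio ~81 IOPS and DAS ~90 IOPS are about what uncached 5900 RPM spindles
# can deliver; fileio's ~1585 IOPS is far beyond the raw spindles, which
# suggests the fileio path is benefiting from the Linux page cache and the
# newly enabled write cache rather than raw disk speed.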

I also changed the vSwitch, vmnics, and Openfiler to 9000 MTU and tested it with ping to make sure it was truly passing 9000-byte packets. There was little, if any, performance improvement. I also changed the LUN mapping from write-thru to write-back, but again there was little change in the results that I could see.

In the end, the little HP MicroServer N40L is performing quite well with (4) 2TB 5900 RPM SATA drives. Even while the drives were syncing or I was throwing big operations at it, the N40L barely broke a sweat; I've never seen CPU usage go above 8%.

Thanks for everyone's help and suggestions.

ChrisReeves
Contributor

Wow, you helped out so much with this. I had almost the same setup and found the RAID 5 speed was ~2-5 MB/s.

I enabled write caching on the disks and changed the iSCSI LUN mapping to fileio like you said, and now I at least get over 20 MB/s!

Thanks again,
