Re: Expected I/O Throughput?

jgodau · 10-12-2010

Hello All,

does anyone have any "benchmark" figures for I/O throughput using local storage?

We're running a RHEL 5 VM on an HP ProLiant DL360 G6 with a pair of 300GB 6Gb/s SAS disks in a RAID 1 mirror on a Smart Array P410i storage controller, and we're getting what I consider low throughput, particularly on reads.

What kind of IO figures do people get with their machines? What is normal for this configuration?

Cheers

Jack...

akshunj · 10-12-2010

What are you using to baseline your storage performance? IOMeter?

PacketRacer · 10-12-2010

You are not specifying the speed of the drives. Are they 10K or 15K?

Sorry, I don't have benchmarks, but I can tell you that you won't get much out of two local drives. My rule of thumb goes like this: for a 15K RPM drive, assume 150 IOPS; for a 10K RPM drive, assume 100 IOPS. So with two 15K drives in a RAID 1 config, in theory you should get something close to 300 IOPS on reads and about 150 IOPS on writes. The throughput will depend a lot on the size of each I/O and on how the RAID controller handles the I/O requests. In general, I would not expect more than 40 MB/sec on sequential reads. With VMware's random read/write pattern, you'll be lucky to get anything close to 20 MB/sec.
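
The rule of thumb above can be turned into a quick back-of-the-envelope calculation. This is a minimal sketch assuming the per-drive figures from this post (150 and 100 IOPS), a RAID 1 write penalty of 2 (every write lands on both mirrors) and a fixed I/O size; `raid1_estimate` is a hypothetical helper, and a controller with write cache will deviate from it.

```python
# Rule-of-thumb IOPS per drive, taken from the post above (not vendor specs).
RULE_OF_THUMB_IOPS = {15_000: 150, 10_000: 100}


def raid1_estimate(rpm: int, drives: int = 2, io_size_kb: float = 8.0) -> dict:
    """Rough RAID 1 estimate (hypothetical helper, not a vendor formula).

    Reads can be served by any drive in the mirror; every write must land
    on both copies, so write IOPS are halved (a RAID 1 write penalty of 2).
    """
    per_drive = RULE_OF_THUMB_IOPS[rpm]
    read_iops = per_drive * drives
    write_iops = per_drive * drives / 2
    mb_per_io = io_size_kb / 1024  # 1 MB = 1024 KB here
    return {
        "read_iops": read_iops,
        "write_iops": write_iops,
        "read_mb_s": read_iops * mb_per_io,
        "write_mb_s": write_iops * mb_per_io,
    }


if __name__ == "__main__":
    # Same drives, two I/O sizes: throughput scales with the size of each I/O.
    for size_kb in (8, 64):
        print(size_kb, "KB:", raid1_estimate(15_000, io_size_kb=size_kb))
```

With two 15K drives, 300 read IOPS is only about 2.3 MB/s at 8 KB per I/O and 18.75 MB/s at 64 KB, which is why the I/O size matters as much as the IOPS figure.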

This is purely off the top of my head, from stuff I've seen over the past 3-4 years. Take it with a grain of salt, and keep in mind your mileage may vary.

For a more scientific way of calculating your performance, Google the model number of your drive and get the performance specs. Then read this article on how to calculate the theoretical performance: .

P.S. Also, when you read that the transfer rate of a drive is 120 MB/sec, don't assume that you will get this. Those numbers are calculated for an ideal situation where you start reading from one end of the drive to the other. You will never get anything close to that with random IO requests of varying sizes.
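
A common way to do that theoretical calculation (assumed here, since the article link above did not survive) is to treat each random I/O as costing an average seek, plus half a revolution of rotational latency, plus the time to transfer the data. The sketch below follows that model; the 4 ms seek and 120 MB/s sustained rate in the usage are placeholders, not figures for any particular drive, so substitute the values from your drive's spec sheet.

```python
def random_io_estimate(rpm, avg_seek_ms, io_size_kb, sustained_mb_s):
    """Estimate random IOPS and MB/s for a single spinning disk.

    Service time per I/O = average seek + average rotational latency
    (half a revolution) + time to transfer the data itself.
    """
    rotational_ms = 60_000 / rpm / 2
    transfer_ms = (io_size_kb / 1024) / sustained_mb_s * 1000
    service_ms = avg_seek_ms + rotational_ms + transfer_ms
    iops = 1000 / service_ms
    return iops, iops * io_size_kb / 1024


if __name__ == "__main__":
    # Placeholder figures (assumed, not from any datasheet): 10K RPM,
    # 4 ms average seek, 120 MB/s sustained sequential rate.
    for size_kb in (8, 64, 1024):
        iops, mb_s = random_io_estimate(10_000, 4.0, size_kb, 120.0)
        print(f"{size_kb:>5} KB random I/O: {iops:6.1f} IOPS, {mb_s:6.1f} MB/s")
```

Under these assumptions an 8 KB random read costs about 7 ms, giving roughly 140 IOPS and about 1 MB/s, far below the 120 MB/s headline rate; even 1 MB random reads only reach around half of it because the seek and rotation cost is paid on every request.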

jgodau · 10-12-2010

Hello All,

the drives we're running are:

300.0 GB 10K - HP EG0300FAWHV with the latest Firmware HDDE

We've updated all firmware (BIOS, iLO, etc.) that was not at the latest level. This brought a small performance gain, but unfortunately performance still does not match what we get with the same VM on older, lower-powered hardware.

We're using the Oracle tool Orion and hdparm to measure performance.

From within a RHEL 5 VM (hdparm v6.6) on the G6

hdparm -t /dev/sda1 produces an average of approximately 90.0 MB/s

hdparm -t --direct /dev/sda1 produces an average of approximately 88.8 MB/s

The Orion tool gives averages of IOPS 225 and MBPS 45.5

On the new ESX 3.5 update 5 server itself (hdparm v5.4)

hdparm -t /dev/sda2 produces an average of approximately 116.7 MB/s

This version of hdparm doesn't have the --direct option.

On the older, lower-powered machine (DL360 G5, Smart Array P400i, 3Gb/s SAS disks, also RAID 1) we get:

From within a RHEL 5 VM (hdparm v6.6)

hdparm -t /dev/sda1 produces an average of approximately 70.0 MB/s

hdparm -t --direct /dev/sda1 produces an average of approximately 74.7 MB/s

The Orion tool gives averages of IOPS 560 and MBPS 66.3

The Orion values matter most to us because we plan to run an Oracle DB on this server; the much lower IOPS we're seeing there is what concerns us in particular. Performance tests of an actual Oracle application on this new server are also significantly slower than on the old machine.

Thoughts? Anything else we can test?

Cheers

Jack...

akshunj · 10-13-2010

I have never used the Orion tool. If you download IOMeter and run the tests on the new and old servers, I might be able to comment. You can download IOMeter for free at www.iometer.org.

jgodau · 10-13-2010

Hello akshunj,

I've never used IOMeter - can you tell me what a good command to run would be?

Our major bottleneck seems to be with reading via an Oracle database.

Regards

Jack...

akshunj · 10-13-2010

Well, a good feature of IOMeter is its ability to simulate different scenarios, such as workload, payload, and connections per transaction.

I would suggest:

Start IOMeter

Under topology you will see "all managers."

Highlight your host and select your drive(s) in the "disk targets" tab.

Under the "Access Specifications" tab look for the Global Access Specification: "32K, 100%Read, 0% Random" and click "add" to assign this test to your host.

Once you have done this, click the green flag to kick off the test and watch the "Results Display" tab. Set the update result slider to 3 or 4 seconds. Let the test run for 5 minutes or so.

You can compare your results to what you see in the following thread; there is a lot of good info there. You may even see someone with the same hardware as yours:

http://communities.vmware.com/thread/73745

Also useful:

http://blogs.vmware.com/performance/2008/05/100000-io-opera.html

PacketRacer · 10-13-2010

Well, to be honest you are fighting a losing battle by using just two local spindles to run an Oracle database.

That said, I see your problem like this: on the old server you were getting 560 IOPS and 66.3 MB/sec throughput. On the new one you are getting 225 IOPS and 45.5 MB/sec throughput.

Running hdparm from inside the VM seems to indicate that your disks should perform better - for sequential reads anyway (hdparm -t is a sequential read test).
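
To see why a sequential test like hdparm -t can look fine while a database workload suffers, it helps to time the same file read sequentially in large blocks and randomly in small blocks. This is a minimal sketch with hypothetical helper names; it reads through the OS page cache (unlike `hdparm --direct`, which uses O_DIRECT), so a recently written or already cached file will be served from RAM. Use a file larger than the VM's memory, or drop caches first, for meaningful numbers.

```python
import os
import random
import time

MB = 1024 * 1024


def _timed_reads(path, offsets, block_size):
    """Read block_size bytes at each offset; return (bytes_read, MB/s)."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for off in offsets:
            f.seek(off)
            total += len(f.read(block_size))
        elapsed = time.perf_counter() - start
    rate = (total / MB) / elapsed if elapsed > 0 else float("inf")
    return total, rate


def sequential_read(path, block_size=MB):
    """Read the whole file front to back, like a sequential read test."""
    size = os.path.getsize(path)
    return _timed_reads(path, range(0, size, block_size), block_size)


def random_read(path, block_size=8 * 1024, count=1000, seed=0):
    """Read `count` block-aligned chunks at random offsets (OLTP-like)."""
    size = os.path.getsize(path)
    rng = random.Random(seed)
    slots = size // block_size
    offsets = [rng.randrange(slots) * block_size for _ in range(count)]
    return _timed_reads(path, offsets, block_size)
```

Comparing the MB/s from `sequential_read(path)` and `random_read(path)` on the same uncached file shows the gap between a streaming benchmark and small random reads.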

Here are some things to check:

1) Start with the hardware: How much cache does the old Smart Array controller have, 512 or 256 MB? How much cache does the new one have? Does the new controller have battery backup (the BBWC option)? The battery is incredibly important for write performance. Also boot into the BIOS and check the HW Prefetch and Adjacent Sector Prefetch settings, and make sure they are enabled (they're under Advanced Options). I would also recommend reading the ESX performance guide and checking the rest of the settings in that Advanced Options screen; for example, you should disable the Intel C-states if you want to squeeze every ounce of performance out of the box.

2) Once you've compared the BIOS settings then move on to the virtual machine settings. Compare memory and CPU allocation. Compare the VMFS version. Make sure the VMDK settings are the same (thin vs thick). Check the IO shares and IOPS limits.

3) Move on to the OS: check your mount options and file system parameters (see the tune2fs -l command). File system block size and journal options can make a difference. Available OS memory makes a huge difference here, too, because Linux will give all free memory to the buffer cache. You should also check kernel parameters; at the very least, compare the /etc/sysctl.conf files on the old and new servers. For more details, read this paper if you have time:

4) Finally, make sure you are giving Orion the correct parameters. Remember that it simulates a running Oracle database, so the parameters you give it make a huge difference.
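
Comparing settings between the old and new servers (steps 2 and 3) is easy to script. This is a minimal sketch assuming simple `key = value` files; /etc/sysctl.conf and the output of `sysctl -a` use that format, and .vmx files use `key = "value"`, so quotes are stripped. The helper names and the sample values in the test are hypothetical, not taken from either server.

```python
import sys


def parse_kv(text):
    """Parse 'key = value' lines, skipping blanks and # / ; comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def diff_settings(old_text, new_text):
    """Return {key: (old_value, new_value)} for every key that differs."""
    old, new = parse_kv(old_text), parse_kv(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_settings.py old_sysctl.conf new_sysctl.conf
    with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
        for key, (o, n) in diff_settings(a.read(), b.read()).items():
            print(f"{key}: old={o!r} new={n!r}")
```

A key that exists on only one server shows up with `None` on the other side, which catches settings that were added by hand on one machine.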

Hope that helps!

jgodau · 10-14-2010

Hi PacketRacer,

thanks for the information - it's a great starting point.

I understand that this is not an ideal setup for an Oracle database. It's not for production use but for testing performance when we make changes to our software. Even though the performance is not as good as it could be with other configurations, the server is isolated from the rest of our network, so only changes we make to our software will affect it. That's the plan, and it worked well on the old hardware. We bought new hardware so that we could scale up the tests, but running the old tests on the new hardware produced terrible results, so we need to work out why before we move on.

Note that it's READ I/O where the performance is particularly bad; writing (judging from our regular performance tests) seems to be similar to the old hardware.

1) Hardware: the old machine had 256 MB of battery-backed cache; the new machine had 256 MB without a battery (an ordering mistake). We've just upgraded this to 1 GB of cache with a battery. There was no significant change in the results with the cache set to 25% read / 75% write. I've just set it to 75% read / 25% write, also with no significant change. Reads are still slow; writes are the same as on the old hardware.

BIOS settings and those of the Smart Array Controller have been checked.

2) The VM we're using to test performance was cloned from the old hardware onto the new hardware, so its settings are identical on both servers: same memory and CPU allocation, no limits set. The server is not running out of resources either: no VM is taking more than its allocated share, there is spare CPU (less than 50% peak usage), and 2 GB of the 32 GB of memory is completely free.

3) Do you mean the OS inside the VM or the ESX installation itself? Inside the VM, all settings are the same (it's a clone), and both ESX hosts are a standard install of ESX 3.5 Update 5.

I'd love to read the paper, but there was no link?

4) orion command we're using: ./orion_linux_x86-64 -run simple -testname sdd -num_disks 1

We're running the same command on both machines, so it should give comparable output (even if it is not the ideal test command). Is there a better command I could use, one that runs read-only tests?

My big concern here is that new hardware that is supposed to have better performance (more memory, faster CPU, faster disks, more cache) is actually performing worse under an Oracle load.

Once we've nailed down what is causing this performance hit on the new machine, then we'll do some more tuning based on your suggestions to try and get even more performance out of it. But first we need to find out why read times on this new hardware are so much slower than on the old hardware.

Thanks for your input though. Any further help is appreciated.

Jack...

jgodau · 10-14-2010

Hello akshunj,

finally I've worked it out. It seems strange to have to install the program on two machines, but nonetheless it is running now.

IOMeter results

Metric                        Old server   New server
Total IOs per Second          2103.65      2946.65
Total MB per Second           65.74        92.08
Ave IO Response time (ms)     -            0.3384
Max IO Response time (ms)     57.0037      84.5312

These results agree with those from the hdparm tests I ran earlier.
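
The IOMeter numbers are internally consistent with the "32K, 100%Read, 0% Random" spec suggested earlier: MB/s is simply IOPS times 32 KB. The sketch below checks that, and also applies Little's law (outstanding I/Os = IOPS x average latency) under the assumption that the test ran with a single outstanding I/O; that setting is not stated in the thread, so treat it as an assumption.

```python
# Transfer size of the IOMeter access spec suggested earlier in the thread.
# Assumption: both runs used this spec unchanged.
TRANSFER_KB = 32

# IOMeter figures posted above: (Total IOs per Second, Total MB per Second).
RESULTS = {"old": (2103.65, 65.74), "new": (2946.65, 92.08)}


def mb_per_second(iops, transfer_kb=TRANSFER_KB):
    """Throughput implied by an IOPS figure at a fixed transfer size."""
    return iops * transfer_kb / 1024


def avg_latency_ms(iops, outstanding_ios=1):
    """Little's law: outstanding I/Os = IOPS x average latency."""
    return outstanding_ios / iops * 1000


for server, (iops, reported_mb) in RESULTS.items():
    print(f"{server}: computed {mb_per_second(iops):.2f} MB/s, "
          f"reported {reported_mb} MB/s, "
          f"implied latency {avg_latency_ms(iops):.4f} ms")
```

For the new server the implied latency (about 0.339 ms) is very close to the reported 0.3384 ms, which fits a one-at-a-time sequential stream. That kind of test behaves much like hdparm -t, so it is not surprising that it agrees with hdparm while saying little about Orion's small random reads.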

They do not explain why or how the orion tests are so much worse.

The Orion tool specifically measures performance the way the Oracle DB uses I/O, and we have also seen poor READ I/O performance from an actual Oracle DB.

Cheers

Jack...

jgodau · 10-14-2010

Orion Stats

OLTP - lots of small IO reading (test max IO operations)

DSS - lots of big IO reading (test max MB throughput)

Server            OLTP (IO/s)   DSS (MB/s)
new G6 ESX 3.5    280           48.16
old G5 ESX 3.5    942           74.70

akshunj · 10-14-2010

I think the IOMeter results speak for themselves, no? Since you seem to be seeing different results between IOMeter and Orion, run both tests again and watch esxtop to see which test is accurate.

RParker · 10-14-2010

"What is normal for this configuration?"

You mentioned drives, machine hardware, cache on the controller and RAID.

But block size wasn't mentioned, so maybe that's the difference. What is the block size on the old server vs the new server? Disk alignment could be an issue with only 2 disks.

Not the VM, that didn't change, but the actual hardware RAID block size...
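
Alignment means a partition's starting byte offset is a whole multiple of the RAID block (stripe) size, so one small I/O does not straddle two stripes. A minimal sketch of the check, assuming 512-byte sectors; 128 KB is used as the example only because it is the RAID block size reported later in this thread. Start sectors can be read with `fdisk -lu` (inside the guest, and for the VMFS partition on the host). The helper names are hypothetical.

```python
SECTOR_BYTES = 512


def partition_offset_bytes(start_sector, sector_bytes=SECTOR_BYTES):
    """Byte offset at which a partition starts."""
    return start_sector * sector_bytes


def is_aligned(start_sector, stripe_kb, sector_bytes=SECTOR_BYTES):
    """True if the partition starts on a RAID block (stripe) boundary."""
    offset = partition_offset_bytes(start_sector, sector_bytes)
    return offset % (stripe_kb * 1024) == 0


if __name__ == "__main__":
    # 63: legacy DOS-style start; 128: 64 KB; 256: 128 KB; 2048: 1 MiB.
    for start in (63, 128, 256, 2048):
        print(start, partition_offset_bytes(start), is_aligned(start, 128))
```

A guest partition starting at sector 63 (32,256 bytes) is misaligned for any power-of-two stripe size larger than 512 bytes, whereas a 1 MiB start is aligned for every stripe size up to 1 MiB.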

jgodau · 10-15-2010

Thanks for the replies.

I will try to monitor esxtop during the next test runs, but I assume both results are correct; the testing method in Orion is just different from IOMeter and hdparm. Orion is important for us because it simulates the DB.

The RAID block size is 128k on the new server. I need to wait for a maintenance window so I can reboot the old server and check the values; I'll get back to you on that early next week.

Thanks for the help guys!

Jack...

akshunj · 10-18-2010

Did you ever have a chance to check the block size on the old server?

jgodau · 10-20-2010

Yes - just managed to check today.

Block size on both machines is 128k

An HP technician was here and swapped out the motherboard (and built-in controller). He also checked all of the BIOS and controller settings and found them to be correct. Unfortunately, there was no change in the results.

Update: As a test, we've now installed ESX 4.1.0, and the results are pretty amazing. See the newest Orion stats:

OLTP - lots of small IO reading (test max IO operations)

DSS - lots of big IO reading (test max MB throughput)

Server            OLTP (IO/s)   DSS (MB/s)
new G6 ESX 4.1    1177          157.86
new G6 ESX 3.5    280           48.16
old G5 ESX 3.5    942           74.70

It seems to be a problem with ESX 3.5 Update 5 on the new hardware.
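
To put "pretty amazing" into numbers, the Orion figures above can be turned into ratios. A small sketch using only the posted results (one run per configuration, so the ratios are indicative rather than precise):

```python
# Orion figures posted above: (OLTP IO/s, DSS MB/s).
ORION = {
    "new G6 ESX 4.1": (1177, 157.86),
    "new G6 ESX 3.5": (280, 48.16),
    "old G5 ESX 3.5": (942, 74.70),
}


def ratio(a, b):
    """Return (OLTP ratio, DSS ratio) of configuration a relative to b."""
    (a_iops, a_mb), (b_iops, b_mb) = ORION[a], ORION[b]
    return a_iops / b_iops, a_mb / b_mb


for baseline in ("new G6 ESX 3.5", "old G5 ESX 3.5"):
    oltp, dss = ratio("new G6 ESX 4.1", baseline)
    print(f"ESX 4.1 on G6 vs {baseline}: OLTP x{oltp:.2f}, DSS x{dss:.2f}")
```

On the same G6 hardware, ESX 4.1 gives about 4.2x the OLTP IOPS and 3.3x the DSS throughput of ESX 3.5 Update 5, and about 1.25x and 2.1x the old G5 results.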

Just need to convince the boss to pay for ESX 4.1 licenses now.

Jack...