Iometer is an open-source tool originally developed by Intel that remains one of the simplest and best means of generating load on a system for performance analysis. Because of minute fluctuations in in-guest timers, many in-guest benchmarks are unable to produce accurate results. We at VMware have used Iometer for years and find its results to be accurate in most situations, but see the disclaimer below for a word of caution.
Figure 1. Iometer with disk targets tab selected.
Figure 1 shows the topology frame and Disk Targets tab for a particular VM. Because only one manager is visible with a single worker beneath it, you can tell that this is a single server with only one processor. Had Iometer been started on a 4-way box, four workers would be visible. Since the UI remains operational, additional dynamo workers can be connected from other hosts to drive load across multiple VMs that may be on multiple servers.
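Connecting a remote dynamo worker to the Iometer GUI is a command-line affair. The sketch below builds that command line in Python; the host address and worker name are placeholders you would substitute, and `-i` (Iometer/manager host) and `-m` (this machine's name in the topology) are dynamo's stock flags:

```python
import subprocess

# Hypothetical values: replace with the machine running the Iometer GUI
# and the name this worker should show in Iometer's topology frame.
IOMETER_HOST = "192.168.1.10"
WORKER_NAME = "vm-guest-01"

# -i names the Iometer (manager) host; -m names this machine.
cmd = ["dynamo", "-i", IOMETER_HOST, "-m", WORKER_NAME]
print(" ".join(cmd))

# To actually launch the worker on the guest, you would run:
# subprocess.run(cmd, check=True)
```

Run one dynamo per VM you want to load; each appears as its own manager in the topology frame.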
For all Iometer tests, always increase the "# of Outstanding I/Os" per target under "Disk Targets". When left at the default value of '1', a relatively low load will be placed on the array. Increasing this number causes the OS to queue up multiple requests and truly saturate the storage. The ideal number of outstanding IOs can be determined by running the test multiple times, increasing this number each time. At some point IOPS will stop increasing. Generally the returns diminish around 16 IOs/target, and anything beyond 32 IOs/target will have no value due to the default queue depth in ESX. See Storage Queues and Performance for more information on queues. In Figure 1 you can see that "# of Outstanding I/Os" defaults to 1.
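The "run, raise the depth, re-run" loop above can be mechanized. The sketch below finds the queue depth past which IOPS stops improving meaningfully; the IOPS figures are made-up illustrative numbers, not measurements, so plug in the totals Iometer reports for each of your runs:

```python
# Hypothetical results of repeated runs: queue depth -> measured IOPS.
iops_by_depth = {1: 4200, 2: 7900, 4: 14100, 8: 22500,
                 16: 25800, 32: 26100, 64: 26000}

def knee(results, threshold=0.05):
    """Return the last queue depth that still improved IOPS by more
    than `threshold` (5% by default) over the previous depth."""
    depths = sorted(results)
    for prev, cur in zip(depths, depths[1:]):
        if results[cur] < results[prev] * (1 + threshold):
            return prev
    return depths[-1]

print(knee(iops_by_depth))  # with these example numbers: 16
```

With the example data, pushing past 16 outstanding I/Os per target gains less than 5%, matching the rule of thumb above.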
When choosing the system to test in the topology frame, the "Disk Targets" tab provides options as to the storage target. The options include formatted disks (yellow) and unformatted disks (blue). In the former case Iometer addresses the storage through the OS's file system (FS). In the latter, direct calls are made to the hardware without using a FS. Storage specialists are usually more interested in just the hardware, so evaluation of unformatted LUNs (blue) is preferable. Because there is some cost to virtualizing the OS's interface to the disk through the FS, formatting the disk with the correct FS and testing the yellow target can also be instructive. Figure 1 shows two testable drives: the yellow-iconed C drive on which the OS was installed, and the blue-iconed unformatted drive that is preferable for benchmarking.
Always make certain that the "maximum disk size" in the "Disk Targets" tab is larger than the available memory! For instance, when testing a formatted disk, a test file capped at 200,000 sectors (about 100 MB) could be cached entirely by the guest OS in a VM provided with 1 GB of RAM. In that case all Iometer calls to storage would be intercepted by the guest, host, or storage cache. Setting the maximum disk size to a number at least four times greater than the memory available in the largest cache will avoid caching.
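A quick sketch of that sizing rule, assuming Iometer's usual 512-byte sectors for the "maximum disk size" field (the helper name and four-times factor simply restate the guidance above):

```python
SECTOR_BYTES = 512  # Iometer's "maximum disk size" is given in sectors

def min_disk_size_sectors(largest_cache_bytes, factor=4):
    """Smallest safe 'maximum disk size' (in sectors) given the
    largest cache in the path: guest RAM, host cache, or array cache."""
    return factor * largest_cache_bytes // SECTOR_BYTES

# A VM whose largest cache is 1 GB of guest RAM:
one_gb = 1024 ** 3
print(min_disk_size_sectors(one_gb))  # 8388608 sectors, i.e. a 4 GB test region
```

So for the 1 GB example above, enter at least 8,388,608 sectors rather than 200,000.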
Under the "Access Specifications" tab, choose a workload that matches the most interesting profile. Real workloads that are dominated by database performance often randomly read and write small, fixed-size blocks. SQL Server on Windows, for instance, uses 16K blocks, 66% read (which implies 34% write), and 100% random (thus 0% sequential). Exchange 2007 uses a similar profile but with an 8K block size. Oracle on Linux has the flexibility to use the block size set when the file system was created. Depending on the DB specialist's needs, this can range from 2K to 64K but will again be random with a 2:1 read-to-write ratio. Note: you can approximate this Linux performance on a Windows guest, but do not run Iometer on Linux (see "Iometer On Linux" below).
Application | Block Size | Randomness | Read/Write Ratio
Exchange 2003 | 4K | 80% | 60% read (40% write)
Exchange 2007 | 8K | 80% | 55% read (45% write)
SQL Server | 16K, 64K | 100% | 66% read (34% write)
Table 1. Example Iometer profiles.
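For scripting or record-keeping, Table 1's profiles can be restated as data. The dictionary below copies the table's figures verbatim (using the first listed block size for SQL Server); the structure and helper function are my own illustration, not part of Iometer:

```python
# Table 1 restated as data; the numbers come straight from the table.
profiles = {
    "Exchange 2003": {"block_kb": 4,  "random_pct": 80,  "read_pct": 60},
    "Exchange 2007": {"block_kb": 8,  "random_pct": 80,  "read_pct": 55},
    "SQL Server":    {"block_kb": 16, "random_pct": 100, "read_pct": 66},
}

def describe(name):
    p = profiles[name]
    return (f"{name}: {p['block_kb']}K blocks, {p['random_pct']}% random, "
            f"{p['read_pct']}% read ({100 - p['read_pct']}% write)")

print(describe("SQL Server"))
# SQL Server: 16K blocks, 100% random, 66% read (34% write)
```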
Note the number of workers specified under the manager. This defaults to one worker (thread) for each physical or virtual processor on the system. If Iometer is being used to compare native to virtual performance, make sure that the worker counts match! For instance, the worker count will be one on a uniprocessor (UP) VM but four for the same measurement run natively on a quad-core system. Correct the native worker count by detaching workers until the two configurations match.
Before invoking the test, be aware of the potential impacts of data alignment. In our partition alignment paper, VMware demonstrated substantial performance differences based on the alignment of data on storage arrays. Make sure that partitions and virtual disks have been created by Virtual Center to guarantee that the partitions and files are properly aligned.
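A quick way to reason about alignment is to check whether a partition's starting offset divides evenly by the array's block size. The sketch below uses 512-byte sectors and an assumed 4 KB array block; the legacy 63-sector partition start is the classic misaligned case, while a 2,048-sector (1 MB) start is aligned:

```python
SECTOR_BYTES = 512

def is_aligned(start_sector, array_block_bytes=4096):
    """True if the partition's byte offset is a multiple of the
    array's block size (4 KB assumed here for illustration)."""
    return (start_sector * SECTOR_BYTES) % array_block_bytes == 0

print(is_aligned(63))    # False: legacy 31.5 KB offset straddles blocks
print(is_aligned(2048))  # True: 1 MB offset is aligned
```

Partitions created through Virtual Center avoid the misaligned case automatically.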
As discussed in Time-based Measurements in Virtual Machines, the hypervisor may introduce some inaccuracy to in-guest time measurement. The likelihood of measurement error increases as the server's load increases. Generally servers that are using less than 30% of their available CPU resources can be trusted. In the event that a large VM on a small server is driving all CPUs to high utilization, Iometer results may suffer from some inaccuracy. This is rarely the case with Iometer runs.
As mentioned above, the results provided by Iometer tend to be trustworthy. But for unimpeachable results, use the analysis techniques provided by VMware. See the Performance Monitoring and Analysis material already dedicated to this subject.
You left me on a cliffhanger with: "do not run Iometer on Linux (see 'Iometer On Linux' below)."
Why not run Iometer on Linux?
Iometer does not use the correct, asynchronous I/O libraries on Linux. So, no matter what you put in the "# of Outstanding I/Os" field, it generates only one IO at a time.
Scott
More information on my blog and on Twitter:
This article is great and exactly what I was looking for. Could you explain how it is done and how it should be configured? It seems that the dynamo workers run inside the VMs, so how can they produce accurate results? (Especially when you say "many in-guest benchmarks are unable to produce accurate results.")
thanks
L.
Excellent question, L. Anything that measures within the guest is suspect, but measurements inside the guest are not guaranteed to be bad. Over years of testing, VMware engineering found that some benchmarks produced suspect results while others produced reasonable ones. The differences between these benchmarks lie in what operations they measure, what timers they use, how long the timed operations are, and other factors.
VMware confirmed that Iometer was one of the good benchmarks. Others produced suspicious results.
As an addendum to this: the time measurement problem has become less and less of an issue over time. vSphere has gotten better and hardware support has limited the impact of skewed time.
Good post. There is another good Iometer tutorial (howto) and it includes an embedded video for visual learners like myself. This link should give you a few more details if you are still thirsty:
http://www.itechstorm.com/iometer-tutorial-introduction
I love seeing scenario specific comparisons with Iometer.
Just a quick update: VMware has a fling called IO-Analyzer that uses Iometer for Linux under the covers. The engineers have made specific changes to its operation to support changing the outstanding I/Os parameter. This tool will get you around the issue.
Do not forget to adjust "Align I/Os on" to the appropriate value for your storage system when testing raw disks. For example, for NetApp FAS, "Align I/Os" must be set to 4KB.