Dear colleagues,
I'd like to start a discussion on a very important and general topic: disk throughput on any of the ESX versions. The reason is a design I'm working on that includes the implementation of virtual file servers. What I ran into during the test phase was very poor performance. What we tested: copying a 1GB file from the virtual machine disks, and issuing a `time dd` command with block size 1024 and count 1000000 from the ESX console. Here are the results:
ESX 4 on BL460 – 20MB/s SAN attached, 40MB/s local SAS disk (SAN attached is EVA)
ESX 4 on BL685c G5 – 15MB/s SAN attached, 20MB/s local SAS disks
From a virtual machine, copying files on VMFS – 10MB/s
ESX 3.5 on BL685c G6 – 30MB/s on SAN and 20MB/s on local disks (SAN attached is Clariion)
ESX 4 on BL685c G6 – 30MB/s on SAN, 20MB/s on local SAS disks
From the virtual machine, copying a 1GB file on a VMFS datastore – about 25MB/s
From the virtual machine, copying a 1GB file on a raw device – about 25MB/s
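For anyone who wants to reproduce the console test, this is roughly what was run. A sketch only: the output path here is a placeholder (on an ESX host it would sit on a datastore, e.g. under /vmfs/volumes/), and the file is scaled down to 100MB so it finishes quickly.

```shell
# The test as described in the thread: time dd with bs=1024 and
# count=1000000 (~1GB). Scaled down to 100MB for this sketch.
time dd if=/dev/zero of=/tmp/ddtest bs=1024 count=100000

# Note: a 1KB block size adds per-call overhead. A larger block size
# such as bs=1M is a fairer sequential-throughput test on most systems:
time dd if=/dev/zero of=/tmp/ddtest bs=1M count=100
```

One caveat: reading from /dev/zero and writing through the filesystem cache can overstate throughput; GNU dd's `oflag=direct` (where supported) bypasses the cache for a more honest number.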
The numbers are no different when using the LSI Logic, BusLogic, or Paravirtual SCSI adapter. They also didn't change when the queue depth was extended to 128.
ESX 3.5 running on a Dell T300 in the test lab – about 25MB/s on local and iSCSI disks
I installed a clean Windows 2003 Enterprise server on a blade BL685c G6, on its local disks. Then I presented a disk from the same array and tested the performance – it was close to 500MB/s…
Now I'm starting to wonder: is this typical for ESX? I've done many implementations so far, but always with a dedicated VM team, and I never received complaints. This time, however, I planned for file, application, and DB servers on virtual machines, and now it seems like not such a good idea.
I would appreciate it if you shared some thoughts, and maybe ran similar tests in your environments and posted the results here.
Thank you in advance
Just to update the topic:
I extended the tests: I presented 4 disks from 2 arrays, each one owned by a different controller. Then I presented those 4 disks to a virtual server and created a striped volume. Theoretically, even though the ESX uses only one fabric, it should get 4 times the performance by spreading the I/Os across 4 LUNs. The performance, however, is still the same. It doesn't make a difference. ESX is terrible at disk throughput.
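The idea of spreading the load can be sanity-checked outside ESX too. A minimal sketch, with ordinary files standing in for the four LUNs (the actual test above used a guest-striped volume over four presented disks):

```shell
# Write four streams in parallel, one per "LUN". If the backend scales,
# aggregate throughput should approach 4x a single stream; if a shared
# bottleneck (fabric, HBA, hypervisor path) caps it, it won't.
for i in 1 2 3 4; do
  dd if=/dev/zero of=/tmp/lun$i.img bs=1M count=25 2>/dev/null &
done
wait
ls -l /tmp/lun[1-4].img
```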
Your testing methodology is flawed. You need to use something like IOmeter (http://www.iometer.org/) to get repeatable results, and everyone who provides results needs to use the same settings in IOmeter for the results to even be comparable; otherwise you're just randomly comparing numbers. Your test only exercises the write speed of your disk subsystem, and results will vary depending on the RAID type, the number of spindles assigned, etc.
You're looking only at write throughput, not at the IOPS or read throughput of the disk subsystem. Many things can affect the outcome, even down to disk alignment.
So to start comparing results, you need to create a standard test that mixes read and write throughput.
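Short of everyone loading the same IOmeter profile, even a fixed write-then-read pair gives numbers that posters can compare. A sketch (the file size and path are arbitrary choices for illustration, not from the thread):

```shell
# Sequential write, then sequential read of the same file; dd prints a
# throughput summary line for each. Not a substitute for a mixed
# IOmeter profile, but at least it is the same test for everyone.
dd if=/dev/zero of=/tmp/perftest bs=1M count=50 2>&1 | tail -1
dd if=/tmp/perftest of=/dev/null bs=1M 2>&1 | tail -1
```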
Well, I already used IOmeter - the results are no different. I'm concerned about the end-user experience, and the tests I run are purely about that.
And this is exactly my point: in any other environment, changing and tuning the settings has an effect. Not with ESX. I tried dedicated disk groups, mixed ones, etc. I measured the performance from the array's perspective - the array latency is below 10ms, and the number of I/Os is way below the standard for these arrays. There is nothing else running on them. Also, the problems I describe happen on the local disks too.
So, I'm really interested in results from `time dd` on the ESX, and from a test file copy inside a virtual machine. Why? Because the performance I'm looking for is 10 times the current figures; a difference of 10-20MB/s won't do any good. The question is more one of principle - I don't need accuracy. I just need to know whether someone is able to achieve 200-300MB/s from a virtual machine, no matter on what kind of storage.
Try using this IOmeter template
http://www.mez.co.uk/OpenPerformanceTest.icf
and compare your results to others in the storage performance thread:
http://communities.vmware.com/thread/73745
http://www.GabesVirtualWorld.com
I couldn't find any write performance data there - only read numbers.
50% read also means 50% write,
70% read also means 30% write,
etc.
http://www.GabesVirtualWorld.com
Hello ANDRO,
Very pleased to meet someone in the same boat! I'm trying to understand exactly the same thing you are. Are you still online? I would like to share our experience.
SUBJECT: POOR SAN DISK I/O PERFORMANCE WITH ESX 4
If I compare my IOmeter performance with all the benchmarks posted in these communities, I come out quite good!
But I agree 200% with ANDRO. The performance of ESX compared with a physical server is very bad. The 1GB file transfer test from local ESX to SAN makes more sense to me than using IOmeter, because it's real life...
2.5 minutes for ESX versus 15 seconds from a physical server to write a 3GB file to the SAN: that's quite a difference.
That's unacceptable, in my opinion.
Please find my benchmark results attached.
Regards to everybody
Dell M610, HP EVA SAN
dd if=largefile2 of=largefile1
1156656960 bytes (1.2 GB) copied, 38.6686 s, 29.9 MB/s
Dell M610, Local Disk
dd if=largefile2 of=largefile1
1156656960 bytes (1.2 GB) copied, 28.6824 s, 40.3 MB/s
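The reported rates follow directly from bytes divided by seconds. For example, the SAN figure checks out (awk used only for the arithmetic; dd's "MB" in this output is decimal, 10^6 bytes):

```shell
# 1156656960 bytes copied in 38.6686 s, in decimal megabytes per second
awk 'BEGIN { printf "%.1f MB/s\n", 1156656960 / 38.6686 / 1e6 }'
# → 29.9 MB/s
```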
Pretty bad imho. Anyone still reading this thread?
In my experience, the disk I/O inside a VM is not very different from physical boxes, especially if you have a random I/O workload. For sequential I/O (large file copying, backups, etc.) there is a penalty for using VMs, but that penalty is smaller if you use the paravirtual SCSI driver.
For good service performance, the random I/Os are normally the important ones to watch, and the more VMs you have running against the same storage, the more random the I/Os to that storage will be.
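The sequential-versus-random distinction is easy to demonstrate with dd itself. A sketch only: on a freshly written file the OS cache hides most of the cost, so a real measurement would use direct I/O (e.g. GNU dd's `iflag=direct`) or a file larger than RAM.

```shell
# Create a 20MB test file, then read it two ways.
dd if=/dev/zero of=/tmp/rand_vs_seq bs=1M count=20 2>/dev/null

# Sequential: one pass through the file, prefetch-friendly.
dd if=/tmp/rand_vs_seq of=/dev/null bs=1M 2>/dev/null

# "Random": single 4KB blocks at scattered offsets, each one a seek on
# a physical disk (offsets chosen arbitrarily within the 5120 blocks).
for off in 4811 97 2503 4200 12; do
  dd if=/tmp/rand_vs_seq of=/dev/null bs=4k count=1 skip=$off 2>/dev/null
done
```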
Lars
First, I agree that your testing (dd) is flawed. You should not base VM performance expectations on ESX console tests. Clearly, you won't be running IOmeter from the ESX console either, and the "icf" file for IOmeter is intended to be run by a VM from within your vSphere cluster (or ESX host). I have the following snapshot on hand from a recent test run on a SAN appliance in a VMware deployment. The initial write ramp-up is from a vCenter-based ISO image upload (from a NAS/NFS datastore - on the same SAN volume - to the vCenter VM's "local disk" on VMFS on the same SAN volume). At over 40MB/s, it's... The remainder of the graph shows the two IOmeter runs using the standard "icf" parameters used in this site's VMware performance comparisons. (Note: an error in the graph baselines "0" at about the 10,000 KBps mark - VMware?)
The tests are against an iSCSI-connected (1Gbps) array of six 15K SAS disks (2x 3-disk RAIDZ groups - similar to RAID 50). In the 50% tests (32K block, 50% read, 50% write) the throughput drops to 38MB/s (19MB/s read, 19MB/s write, simultaneous access). These aren't DAS speeds, and the latency of 1Gbps Ethernet for iSCSI is a contributor, but it's about what you would calculate from "IOPS x block size" given the limited number of spindles and the RAID configuration... Write caching would have made this better, but the array was not configured for it...
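To make the "IOPS x block size" estimate concrete, here is the arithmetic using the 38MB/s and 32K figures above (this assumes MB means 2^20 bytes; adjust the constants if your array reports decimal units):

```shell
# throughput / block size = the IOPS the array must sustain
awk 'BEGIN { printf "%.0f IOPS\n", (38 * 1024 * 1024) / (32 * 1024) }'
# → 1216 IOPS
```

Spread over six 15K spindles, that is in the right ballpark for what such disks can deliver, which supports the point that the array, not ESX, is the ceiling in this particular setup.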
--Collin C. MacMillan
SOLORI - Solution Oriented, LLC
If you find this information useful, please award points for "correct" or "helpful".