VMware Cloud Community
hmarsili
Contributor

Test disk performance - is this a valid test?

I want to compare the disk throughput between two hosts. I'm copying one VMDK from one folder to another using scp (because it gives me an MB/s speed indicator).

For example:

scp opencms.tfsla.com-flat.vmdk localhost:/vmfs/volumes/4c6f0092-96100402-a2eb-001320a80e64/her

Running this on a five-year-old SATA controller with no cache and an ordinary disk, I get 8 MB/s. Running the same test on a RAID 5 controller with 256 MB of write-back cache and 15.6K RPM SAS disks gives me the same speed.

Is this test valid? And if it is... I have a serious problem. :)

7 Replies
mcowger
Immortal

Not at all.  SCP is well known to have throughput problems because it uses very small buffers, and only a single set of them.  It's probably your bottleneck.

Also, you are conflating things a bit here - do you want to test network throughput, or disk throughput?  They are different.

For network throughput, I'd recommend something like iperf - that will take the disk out of the equation, as well as the poor buffer usage of SSH (along with the encryption overhead).
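
For example (a minimal sketch - the address is a placeholder; run the server on one VM and the client on the other):

iperf -s                     # on the receiving VM
iperf -c <receiver-ip> -f M  # on the sending VM; -f M reports in MBytes/sec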

For disk performance, I'd look at something like hdparm for a simple sequential test.
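
For example (a sketch to run as root inside a Linux VM; /dev/sda is a placeholder for your disk device):

hdparm -t /dev/sda   # timed sequential reads from the disk, without prior caching
hdparm -T /dev/sda   # timed reads from the buffer cache, as a memory/bus baseline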

--Matt VCDX #52 blog.cowger.us
hmarsili
Contributor

Thank you very much. I tested from one VM on host-a to another VM on host-b. The results were as follows. Is that good for 1 Gbit connectivity?

hmarsili@www:~$ iperf -c 10.0.0.17 -f M
------------------------------------------------------------
Client connecting to 10.0.0.17, TCP port 5001
TCP window size: 0.02 MByte (default)
------------------------------------------------------------
[  3] local 10.0.0.247 port 54196 connected with 10.0.0.17 port 5001
[  3]  0.0-10.0 sec  1125 MBytes    112 MBytes/sec
hmarsili@www:~$

I repeated the test INSIDE the same host (VM to VM) and got:

hmarsili@opencms11:~$ iperf -c 10.0.0.247 -f M
------------------------------------------------------------
Client connecting to 10.0.0.247, TCP port 5001
TCP window size: 0.02 MByte (default)
------------------------------------------------------------
[  3] local 10.0.0.154 port 33892 connected with 10.0.0.247 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2045 MBytes    204 MBytes/sec

All of this started because I was copying virtual machines from one host's datastore to another. Transfers ran at about 10 MB/s, then suddenly I started to see 200 Kbps. My first assumption was a network problem, but now I'm thinking it's the source hard drive (an old SATA disk).

zying
Enthusiast

The test result looks good for 1 Gbit connectivity. 112 MBytes/sec nearly saturates the link.

mcowger
Immortal

Agreed - with IP overhead, 112 MB/sec is pretty much perfect for 1GbE. (1 Gbit/s is 125 MB/s raw; subtract Ethernet, IP and TCP header overhead and the practical ceiling is roughly 117 MB/s, so this is essentially line rate.)

--Matt VCDX #52 blog.cowger.us
tetrapackage
Contributor

Agreed. For a disk performance test, reaching 112 MBytes/sec would definitely be good enough.

rickardnobel
Champion

tetrapackage wrote:

For a disk performance test, reaching 112 MBytes/sec would definitely be good enough.

The iperf run tested only pure network throughput - no disk activity was involved, so the disks might still be a bottleneck.

My VMware blog: www.rickardnobel.se
timjwatts
Enthusiast

The way I tested disk throughput was to make a Linux VM, clone it a few times, then in each VM, run:

time dd if=/dev/zero of=/var/tmp/testfile bs=10M count=100

simultaneously.

You could test with 3-5 VMs on one host, or perhaps 3 VMs on each of your hosts at once, to see how the load spreads.
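
One rough way to kick the copies off together (a sketch only - vm1 to vm3 are hypothetical hostnames for the cloned VMs, reachable over SSH as root; GNU dd prints its own MB/s figure when it finishes):

for h in vm1 vm2 vm3; do
  ssh root@$h "dd if=/dev/zero of=/var/tmp/testfile bs=10M count=100" &
done
wait   # wait for all the background runs to complete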

With 3 hosts and iSCSI to an EqualLogic PS6500E SAN (48 x 1 TB SATA 7200 RPM), I got a maximum of about 200 MB/sec cumulative total.

A more interesting test is to use the same linux VMs and run:

fio --filename=/dev/dm-0 --direct=1 --rw=randwrite --bs=4k --numjobs=64 --runtime=300 --group_reporting --name=test1

and then

fio --filename=/dev/dm-0 --direct=1 --rw=randread --bs=4k --numjobs=64 --runtime=300 --group_reporting --name=test1

/dev/dm-0 should be replaced with either an LVM device that is not in use (!!), a second blank VM hard disk device you've added, or possibly a file, e.g. /var/tmp/testfile.

The first fio test will give you a maximum IOPS rating for random writes; the second will do the same for random reads - all using 4K blocks and running 64 jobs in parallel on that one VM.
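
If you would rather point fio at a file than at a raw device, a sketch (the path and --size value are placeholders; pre-sizing the file keeps the run bounded):

fio --filename=/var/tmp/fiotest --size=1G --direct=1 --rw=randwrite --bs=4k --numjobs=64 --runtime=300 --group_reporting --name=test1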

Run this simultaneously on several VMs on all hosts if you want to stress the disks.

For reference, the PS6500E gives around 3200 IOPS for random writes with a few VMs doing this, and rather less for reads, which makes sense: writes are dumped into the SAN and/or disk caches, but reads require the disks to actually seek and fetch the data. The direct=1 argument disables caching in the VM's OS.

In reality the PS6500E with 90 VMs can peak at around 4000 mixed read/write IOPS under heavy load.

If you just want to measure throughput from one host to another host, involving the disks, I would run Veeam FastSCP on a Windows VM and do a host-to-host transfer; not much does it quicker in an ESXi environment, since the hosts only have dropbear SSH installed, which is pretty rubbish at SCP transfers.

HTH

Tim
