VMware Cloud Community
JRink
Enthusiast

Help me understand performance differences in storage?

I am trying to figure this out...

Our disk system is an HP MSA 2312i (iSCSI) G2 SAN. 11 disks in a RAID 5 array. We have 3 LUNS on that physical array.

I was doing some testing with IOMETER and can't seem to figure something out...

When I run IOMETER on VM1 (Windows 2008 x64 machine using the LSI SAS controller), the results for RealLife-60%Rand-65%Read and Random-8k-70%Read are very good (1400 IOPS or so). Exactly what I'd expect from my SAN.

However, when I run IOMETER on VM2 (Windows 2008 x32 using LSI Parallel controller) the results for RealLife-60%Rand-65%Read and Random-8k-70%Read are NOT very good (500 IOPS or so). I had thought, perhaps the difference was the disk controller the VM was using, so I changed this VM to the LSI SAS controller (like VM1 is using) but the performance didn't improve at all.

I also tried running IOMETER from VM3 (Windows 2003 x32 using LSI Parallel controller) and the results for RealLife-60%Rand-65%Read and Random-8k-70%Read are NOT very good either (500 IOPS or so).

I'm confused... all 3 VMs are running on the SAME physical disks, yet only VM1 shows the results I like. I also did these tests while all other VMs were shut down, so normal disk IO wouldn't interfere with the testing results.

Is x64 known to provide better results? If that isn't it, what else could be contributing to this?

Input/ideas welcome, thanks.

8 Replies
wila
Immortal

Hello,

Since you have 3 different LUNs, I am wondering if VM1 is on a different LUN than the other two VMs?

If that's the case, then check in the physical setup how write-back caching is configured for the fast LUN versus the other LUNs.

I'm sorry I'm not familiar with the SAN itself, but that's where I suspect the big difference is.

Hope this helps,



--
Wil
_____________________________________________________
VI-Toolkit & scripts wiki at http://www.vi-toolkit.com

Contributing author at blog www.planetvm.net

Twitter: @wilva

| Author of Vimalin. The virtual machine Backup app for VMware Fusion, VMware Workstation and Player |
| More info at vimalin.com | Twitter @wilva
JRink
Enthusiast

Hello.

When I did the testing, it did not seem to matter which LUN the VM was running from during the tests. VM1 and VM3 were on the same LUN, yet yielded different results. I will verify this to be sure.

Besides, caching should not affect performance at all during IOMETER tests, because the file size is 4GB... right?

EDIT -


I seem to also recall that my x64 machine that provides good results does NOT have vmtools installed... I need to double check that to be sure.

Regards

J

FredPeterson
Expert

We would need more detail as to the LUN make up.

Are you experiencing hot spots, for example? Are your LUNs sized such that it's unlikely a partial disk would be used? When you do this testing, is IO activity across the board zero except for your test VM?

JRink
Enthusiast

Disk activity should be near NIL on the LUN during testing (apart from the VM doing the testing). I did this off-hours and was able to shutdown the production VMs.

I'm not sure I follow what you mean about hot spots. The disk system is configured as follows: 1 single RAID 5 array (11 physical disks). That array then is split up into 3 Logical Disks (each approximately 1TB in space). LUN1 in ESX is assigned to Logical Disk 1, LUN2 in ESX is assigned to Logical Disk 2, etc. Each LUN does NOT access a specific group of disks....
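
As a sanity check, here is a rough back-of-envelope sketch of the random IOPS an 11-disk RAID 5 set could sustain. The ~150 IOPS per spindle is an assumption (typical for a 10k SAS disk), not a measured value from this array; the write penalty of 4 is the classic RAID 5 cost per random write.

```python
# Rough estimate of sustainable random IOPS for a RAID 5 set.
# Per-spindle IOPS is an assumption (~150 for a 10k SAS disk).
# Each random write costs 4 back-end IOs on RAID 5: read data,
# read parity, write data, write parity.
def raid5_random_iops(disks: int, per_disk_iops: int, read_fraction: float) -> float:
    backend = disks * per_disk_iops
    return backend / (read_fraction + (1 - read_fraction) * 4)

# 11 disks, ~150 IOPS each, 65% reads (roughly the RealLife profile):
print(round(raid5_random_iops(11, 150, 0.65)))  # → 805
```

With controller cache hits on top of that, the ~1400 IOPS seen on the good VM is plausible, while ~500 IOPS hints that something is wasting back-end IOs.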

Thanks for the input.

peterdabr
Hot Shot

"11 disks in a RAID 5 array"

Does it mean you have a single RAID 5 physical LUN that is further divided into logical LUNs, which in turn get presented to the hosts? If that's the case, your VM reads/writes actually happen on the same physical LUN, whether a given VM resides on one logical LUN or another.

Please elaborate more on array setup.

In an ideal scenario, you would want to test both VMs yielding different results on the same physical LUN, with virtually no other traffic, and possibly use another I/O generator for comparison (another free tool that comes to mind is ATTO Disk Benchmark, although it can only do sequential reads/writes if I'm not mistaken).

Additionally, there are other things you should take into account when comparing results. In your case: proper disk alignment (or rather the lack thereof in Windows 2003 vs. Windows 2008, see: http://www.tcpdump.com/kb/os/windows/disk-alignment/into.html), and the fact that the vmdk file for the better-performing VM could reside on the outer tracks of the disk platters, which could contribute to the overall I/O difference.
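
For what it's worth, the alignment check itself is just modular arithmetic. A minimal sketch follows; the 64 KB segment size is an assumption (check the array's actual chunk size), and on Windows the partition offset can be read with `wmic partition get StartingOffset`:

```python
# Sketch: does a partition's starting offset land on an array
# segment boundary? Windows 2003 defaults to a 63-sector (32256 B)
# offset, which straddles segment boundaries; a fresh Windows 2008
# install defaults to a 1 MB offset, which is aligned.
def is_aligned(starting_offset_bytes: int, segment_bytes: int = 64 * 1024) -> bool:
    return starting_offset_bytes % segment_bytes == 0

print(is_aligned(63 * 512))     # Windows 2003 default → False
print(is_aligned(1024 * 1024))  # Windows 2008 default → True
```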

JRink
Enthusiast

To clarify.

There is a single array (RAID 5) on the SAN, which contains 11 physical disks (3TB total).

The RAID 5 array is split into 3 Virtual Disks, each 1TB in capacity: vd01, vd02, and vd03 (while HP's SAN calls them Virtual Disks, I believe they are sometimes referred to as Logical Disks within the array by other vendors).

vd01 is seen by ESX as LUN1, vd02 is seen by ESX as LUN2, and vd03 is seen by ESX as LUN3.

Furthermore, one thing I didn't mention... While the difference in the Random-8k-70%Read and RealLife-60%Rand-65%Read tests is substantial (1400 IOPS on the good VM versus 500 IOPS on the bad VM), the SAME VMs show IDENTICAL results (4000+ IOPS regardless of which VM I try) for the Max Throughput-100%Read and Max Throughput-50%Read tests. Does that make any sense at all? LOL.

I will review your link on the disk alignment...

EDIT -- that is a very interesting link... the x32 Windows 2008 VM I was using for testing WAS upgraded from 2003, so it would still exhibit the sector-offset issue discussed in the article. This gives me something to try!! Thanks.

J

ewitte
Contributor

someone's living life dangerously

JRink
Enthusiast

Peterdabr,

Good call. I did some additional testing. I created a brand new Windows 2008 VM and ran the IOMETER tests on it. The real world (random reads) tests performed MUCH better than on my older 2003 VMs or the 2008 VMs that were upgraded from 2003. I even took that newly created VM and moved it between all 3 LUNs, and the results proved the same. Proper disk alignment in a 2008 VM built from scratch yields nearly 2.5x the performance for random disk access (1400 IOPS compared to 550 IOPS or so). Strangely enough, the MAX throughput tests didn't seem to be affected by the alignment issues and had good results regardless.

In summary, I guess 2003 VMs, and 2008 VMs that were upgraded from 2003, do not perform as well as fresh installations of 2008 when it comes to real life (random access) tests in IOMETER. From the links you gave me, upgrading a VM from 2003 to 2008 will NOT FIX the alignment issue; it's only fixed on NEW installations. Going forward, when I compare IOMETER tests from one SAN to another, I will have to make sure I am using the SAME OS version to accurately compare results.

Thank you for the tip!!!! Kudos.

J
