VMware Cloud Community
christianZ
Champion

New !! Open unofficial storage performance thread

Hello everybody,

the old thread seems to be sooooo looooong - therefore I decided (after a discussion with our moderator oreeh - thanks Oliver -) to start a new thread here.

Oliver will make a few links between the old and the new one and then he will close the old thread.

Thanks for joining in.

Reg

Christian

574 Replies
christianZ
Champion

Hi,

the "Real Life 60% Rand - 65% Read" looks to good to me - have here 2 MD3220 / SAS with 24 10k disks and saw ca. 3500 iops on R5/11 disks (maybe I try to configure a R1 set with 8/10 disks and test also)

Should the 15k disks make such a difference? Have you repeated your tests?

Reg

Christian

P.S. And yes, it's true - each controller has 2GB of cache included (and a flash card with 8GB).

makruger
Contributor

@ Christian,

These whitebox results I recently posted were obtained using the perftest.iso file available from http://vmktree.org/iometer/. While I have not performed a diff, their script looks to be the same as yours. So, for the Real Life result of iops=12870, I believe the test file size would have been 8192KB.

I am quite surprised to see these kinds of numbers myself, but I have run this test many times with the same results. All I can say is that Openfiler 2.99 File I/O with write-back (WB), using dual Realtek 8168B NICs and ESXi 5.0 MPIO, performs very well. Nearly 2 GB of memory buffer sure helps. FWIW, I could only get 150MB/s reads using Intel NICs, and the IOPS were quite a bit lower too.

I have tried several NAS products (NexentaStor, Open-E, FreeNAS), and while they all have their merits, their iSCSI File I/O performance cannot match Openfiler's - they don't even come close. Attached are the results from the best run I've achieved:

untitled.PNG

Whitebox NAS and whitebox ESXi 5.0.

alexxdavid
Contributor

Hi Christian

I don't think the 15k rpm disks make much difference except on the latency side, as I get nearly the same results with the 10k rpm ones. I have also repeated the tests and am getting the same results. The tests were done on 2 different servers with nearly the same specs, both with 4x NICs reserved for iSCSI, and the results are the same.

Also, the tests are being done while another 10 servers are running, 5 on each host. The last test was done on a running Exchange 2010 server, but at this time of day there is no load at all as the company doesn't operate on weekends.

Oh, and by the way, my IOPS setting is set to 1 - would it be better to have it set to 3 for the MD, or shouldn't 1 cause any problems in the long run?

Regards

David

Sent from my iPhone

christianZ
Champion

@makruger

Thanks for the feedback. If your test file is only ca. 8000KB - that means ca. 8MB - then all the IOs come from cache (or RAM in your case).

That is the cause of the high IOPS and very low latency, I think.

Reg

Christian

christianZ
Champion

@Alexxdavid

Thanks for sharing your experiences here.

>Oh, and by the way, my IOPS setting is set to 1

Do you mean a 1-minute run time here?

If so, then the time is too short - you should test for at least 5 minutes or longer.

Check the size of the test file. It should be at least 4GB.

Well, the 15k disks should make a difference, but not such a big one. If your test file is too small and the run time is only 1 minute, then most of the IOs come from the controller cache, and therefore you can't see any difference between 10k and 15k disks, I think.
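For a rough sense of scale (assuming Iometer's "Maximum Disk Size" counts the usual 512-byte sectors): 8,000,000 sectors x 512 bytes = 4,096,000,000 bytes, i.e. roughly 4GB, whereas a test file of ca. 8000KB is only about 16,000 sectors and fits completely into a 2GB controller cache or the filer's RAM.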

Reg

Christian

alexxdavid
Contributor

Hi Christian

It seems I got my wording wrong.

IOPS=1 is a setting for VMware's Round Robin path policy that sends each I/O down a different path, instead of switching paths only after every 1000 I/Os (the default).

The test was run for 5 minutes and the test file is 4GB.
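For reference, on ESXi 5 this is usually checked and changed per device with esxcli - just a sketch, with naa.xxxxxxxx standing in for the actual device ID of the volume:

# show the current path selection policy and Round Robin settings for one device
esxcli storage nmp device list --device=naa.xxxxxxxx

# switch paths after every single I/O instead of after the default 1000
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxxxxxxx --type=iops --iops=1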

David

Sent from my iPhone

christianZ
Champion

Ok, that's clear to me now. You are using Round Robin with the setting of 1 IO per path (changed from the default of 1000).

The only thing I still wonder about is the difference between 10k and 15k disks. I will try to test it myself.

Reg

Christian

alexxdavid
Contributor

Did you try it? What are your results?

Sent from my iPhone

Henriwithani
Contributor

At least with EFDs, changing the IOPS setting to 5-10 instead of 1 gives a bit more IOPS.

http://henriwithani.files.wordpress.com/2011/11/104iops.png

@henriwithani

-Henri Twitter: http://twitter.com/henriwithani Blog: http://henriwithani.wordpress.com/

christianZ
Champion

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

TABLE OF RESULTS

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

SERVER TYPE: VM ON ESXi 5, Win 2003, 512MB RAM, test file on C:\

CPU TYPE / NUMBER: VCPU / 1

HOST TYPE: Dell PE715, 64GB RAM; 2x AMD Opteron 6220(8C), 3 GHz,

STORAGE TYPE / DISK NUMBER / RAID LEVEL: Dell MD3220 SAS direct / 22+2 SAS 10k / R10-10 Disks

LD segment size 512kB, dynamic cache prefetch/read deactivated

SAN TYPE / HBAs : SAS direct attached /  Dell 6Gb SAS HBA

##################################################################################

TEST NAME- Win 2003 on ESXi 5, Raid 10, MD3220/SAS


                                              Av. Resp. Time ms           Av. IOs/sec          Av. MB/sec

##################################################################################

Max Throughput-100%Read........____0,6______..........___36874___.........___1150____

RealLife-60%Rand-65%Read......___14,4/15_____.......__3226/3191_.......___25____

(test file size 4GB/8GB )

Max Throughput-50%Read..........____4,4____.........._____13448___.........___420____

Random-8k-70%Read.................____13,5____.........._____3120___.........____24____

EXCEPTIONS: VCPU Util.  98-47-44-54 %

So I decided to go back to R5 (the difference is too small to justify R10, IMHO); those test results will follow.

JaFF
Contributor

Hi,

I am currently out of the office.

If you require assistance, please call our helpdesk on 1300101112.

Alternatively, email service@anittel.com.au

Regards,

James Ackerly

Ingens
Contributor

Hello everyone,

I'd like to ask you about some storage performance issues we are facing. We are using an Oracle (formerly Sun) Unified Storage 7110 and have 5 ESXi 4.1 hosts connected to one LUN (which was assigned all of the available space).
Over the past few weeks we had HUGE performance problems and finally figured out that they were caused by using over 80% of the available storage. Apparently a drop in performance is to be expected with a RAIDZ system once over 80% of its total capacity is in use. So we lowered the storage usage below the 80% mark and achieved a performance increase. However, it was still way lower than expected, so we were not yet satisfied. Talking with Oracle support, they pointed us to an upgrade to the newest firmware release, which fixed some bugs related to performance issues we were most likely hitting.
But here comes the weird part: after upgrading the firmware we noticed a HUGE performance increase in our benchmarks, but the VMs, while faster, still weren't running as smoothly as a few months ago.
The only explanation I can come up with is that we are not interpreting the benchmark results properly, or that there is something else we are missing. So here we are, asking you guys about our performance results:
SERVER TYPE: HP Proliant DL120G6
CPU TYPE / NUMBER: Xeon X3430
HOST TYPE: ESXi 4.1
STORAGE TYPE / DISK NUMBER / RAID LEVEL: SUN Unified Storage 7110 / 14 / RAID6+1 (RAIDZ)
|*TEST NAME*|*Avg Resp. Time ms*|*Avg IOs/sec*|*Avg MB/sec*|*% cpu load*|
|*Max Throughput-100%Read*|11.39|5212|162|19%|
|*RealLife-60%Rand-65%Read*|1.82|2313|18|12%|
|*Max Throughput-50%Read*|2.16|1496|46|10%|
|*Random-8k-70%Read*|1.13|2354|18|12%|

davidbewernick
Contributor

Hi Ingens,

did the number of VMs increase over the last few weeks?

Your storage might be OK, but having just one big LUN is usually not a good idea. One LUN means just one SCSI queue, which can quickly become a bottleneck. Are you seeing any disk wait times? The best way to check them is with ESXTOP / RESXTOP:

http://www.yellow-bricks.com/esxtop/

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100820...

------------------------------------

This table lists the relevant columns and a brief description of these values:

Column - Description
CMDS/s - The number of IOPS (Input/Output Operations Per Second) being sent to or coming from the device or virtual machine being monitored.
DAVG/cmd - The average response time in milliseconds per command being sent to the device.
KAVG/cmd - The amount of time the command spends in the VMkernel.
GAVG/cmd - The response time as it is perceived by the guest operating system, calculated with the formula: DAVG + KAVG = GAVG.

These columns apply to both reads and writes, whereas xAVG/rd is for reads and xAVG/wr is for writes. The combined value of these columns is the best way to monitor performance, but a high read or write response time may indicate that the read or write cache is disabled on the array. All arrays perform differently; however, DAVG/cmd, KAVG/cmd, and GAVG/cmd should not exceed 10 milliseconds (ms) for sustained periods of time.
------------------------------------
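To put numbers on the GAVG formula: if, for example, DAVG/cmd sits around 8 ms and KAVG/cmd around 2 ms, the guest sees a GAVG/cmd of roughly 10 ms - right at the sustained threshold mentioned above.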

Also, ESX 4.1 is in some respects not as good as 5 when it comes to SCSI handling. If you copy a lot of files to the system so that new blocks have to be allocated, you will always get a short SCSI reservation from one host while the rest have to wait.
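On ESXi 5 you can also check whether the array offloads that locking via VAAI/ATS - a sketch only, with naa.xxxxxxxx as a placeholder for the device ID:

# shows the ATS, Clone, Zero and Delete offload status for the device
esxcli storage core device vaai status get -d naa.xxxxxxxx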

So try to get the Latency Statistics

  • Kernel Average / command (KAVG/cmd)
  • Device Average / command (DAVG/cmd)
  • Guest Average / command (GAVG/cmd)

and Queuing Information

  • Adapter Queue Length (AQLEN)
  • LUN Queue Length (LQLEN)
  • VMKernel (QUED)
  • Active Queue (ACTV)
  • %Used (%USD = ACTV/LQLEN)

for further investigation.
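If you prefer to collect these over time rather than watch them live, esxtop/resxtop also has a batch mode - a sketch, assuming shell or vMA access to the host (interval and sample count are only examples):

# interactive: press d (adapters), u (devices) or v (virtual machines) for the disk views
esxtop

# batch mode: one sample every 5 seconds, 120 samples, written to a CSV for later analysis
esxtop -b -d 5 -n 120 > /tmp/esxtop-stats.csv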

Regards,

David

davidbewernick
Contributor

Oh, and can you give some more details about the 7110?

FC or iSCSI? 1, 4 or 8 Gb?

Deduplication on?

Snapshots used?

And don't forget to check whether you might have a LUN misalignment issue...

_VR_
Contributor

A CALL FOR HELP

I've spent a week trying to troubleshoot an issue with a new EqualLogic PS4100X. A case was opened with Dell a week ago; after multiple escalations it has gotten absolutely nowhere. I wanted to see if anyone can add some insight.

IOMeter test result:

SERVER TYPE: Windows 2008 R2
HOST TYPE: DL380 G7, 72GB RAM; 2x XEON E5649 2.53 GHz 6-Core
SAN Type: Equallogic PS4100X / Disks: 600GB 10k SAS / RAID LEVEL: Raid50 / 22 Disks / iSCSI
##################################################################################
TEST NAME--Av. Resp. Time ms--Av. IOs/sec---Av. MB/sec----
##################################################################################
Max Throughput-100%Read.......______18___..........___3217__........___101____
RealLife-60%Rand-65%Read..._____13___.........._____3438__........_____27____
Max Throughput-50%Read.........______19___..........____3199__........___100____
Random-8k-70%Read................_____13___.........._____3463__........_____27____

DESCRIPTION OF PROBLEM:

The PS4100X has a system bottleneck that limits throughput to 100MB/s. When a single host is connected with a single path, eth0 and eth1 on the PS4100x can max out at 1Gbit/s. When there are multiple hosts or multiple paths connected (tested 2 - 8 concurrent paths, 2-6 host nics), the throughput of eth0 and eth1 drop to half of the speed (500Mbit/s). The combined throughput of both ethernet adapters can never exceed 1Gbit/s. Unit has been upgraded to v5.2.1 (latest) firmware.

SEE TEST RESULTS HERE:

1. Shows eth1 being maxed out in single path, then the connection switches to multipath
2. Shows eth0 being maxed out in single path, then the connection switches to multipath
3. Shows two concurrent tests from two separate test hosts

RULING OUT NETWORK ISSUES:

I'm able to replicate the above problem in the following configurations:
- Test host connected to the PS4100X via a Cisco 6509
- Test host connected to the PS4100X directly via a crossover cable (two active iSCSI paths set up manually)
- Test host connected to the PS4100X via a dedicated unmanaged Netgear switch
I can further prove that the Cisco 6509 is functioning properly because I'm able to get speeds of 180MB/s+ to the production PS6000XV and the production PS4000E.

RULING OUT HOST ISSUES:

Tested from a host running Windows 2008 R2 and another host running Windows 2003. Both test hosts encounter the issue described above. Both hosts show speeds of 180MB/s+ when running tests against the two EqualLogic arrays in production.

DEALING WITH DELL-EQUALLOGIC SUPPORT HELL:

The analyst I'm currently dealing with says the PS4100X is working as expected. He refuses to do any further troubleshooting because some of the blades in the Cisco 6509 carry QoS and VoIP. The blade that the SAN and the test hosts are connected to has no QoS or VoIP configured.

davidbewernick
Contributor

Hi _VR_,

I had to deal with an EqualLogic model with 4x 1Gb ports per controller a while ago. When I did that setup, it was important to get the network configuration right and to use Round Robin in ESX.

So can you confirm the following:

- Separate vSwitches for every NIC used for iSCSI?

- Jumbo Frames enabled everywhere?

- RoundRobin used for path selection?

- Tests run on separate volumes? (-> iSCSI reservations etc.)

- Disabled: TCP and IP Offload engines on NICs

As far as I know, the controllers are active/passive. You said there is a limitation of 100MB/s in the controller head. With 1Gbit/s you end up with about 80-100 MB/s of usable throughput, so I'm not really seeing the problem here?

Did you see http://www.equallogic.com/WorkArea/DownloadAsset.aspx?id=8453 ?
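If it helps, most of the points above can be verified from the ESXi 5 shell - a rough sketch, with vmhba33 as a placeholder name for your software iSCSI adapter:

# jumbo frames: check the MTU on the vSwitches and on the iSCSI VMkernel interfaces
esxcli network vswitch standard list
esxcli network ip interface list

# iSCSI port binding: which VMkernel NICs are bound to the iSCSI adapter
esxcli iscsi networkportal list --adapter=vmhba33

# path selection policy per device (should show VMW_PSP_RR for Round Robin)
esxcli storage nmp device list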

Regards,

David

_VR_
Contributor

Thanks for the reply

- Separate vSwitches for every NIC used for iSCSI?

Yes, I have two vSwitches, each with one iSCSI NIC. I also ran a test from a physical host (non-ESX) with 4 iSCSI NICs - same results.

- Jumbo Frames enabled everywhere?

I tried turning jumbo frames on; the max throughput test runs 5% faster while the RealLife test runs 5% slower.

- RoundRobin used for path selection?

Same results with Round Robin and Least Queue Depth.

- Tests run on separate volumes? (-> iSCSI reservations etc.)

Separate volumes & separate hosts, concurrently.

- Disabled: TCP and IP Offload engines on NICs

Disabling / enabling offload made no difference.

As far as I know, the controllers are active/passive. You said there is a limitation of 100MB/s in the controller head. With 1Gbit/s you end up with about 80-100 MB/s of usable throughput, so I'm not really seeing the problem here?

The PS4100X has 2 active NICs per controller. The expected throughput is 200MB/s (2000 Mbit/s / 8 = 250MB/s theoretical). I see each NIC pushing 100MB/s one at a time. When they're both active, the throughput per NIC drops to 50MB/s.

s1xth
VMware Employee

First off, you might want to open a separate thread on this for continued troubleshooting - we try to keep this thread dedicated to storage performance posts.

What type of switches are you using with your EQL setup?

What type of NICs on the ESX hosts? Broadcom or Intel?

Thanks,

Jonathan

Sent from my iPad.

http://www.virtualizationimpact.com http://www.handsonvirtualization.com Twitter: @jfranconi

_VR_
Contributor

My apologies, moving thread to http://communities.vmware.com/thread/393583

Tomek24VMWARE
Contributor

My results on Broadcom 5709 NICs without jumbo frames - not software iSCSI, only the dependent hardware iSCSI HBA.

SERVER TYPE: VM Windows 2008 R2 64-bit
CPU TYPE: Intel Xeon X5680 / NUMBER: 6 cores
HOST TYPE: ESXi 5.0 patch 3, Dell R710 and Supermicro + 4x Broadcom 5709 NICs with offloading, without jumbo frames
STORAGE TYPE: HP P2000 G3 4x 10Gb/s iSCSI / DISK NUMBER: 6x 600GB SAS2 15k / RAID LEVEL: RAID10 of 6x 600GB SAS2 15k

|*TEST NAME*|*Avg Resp. Time ms*|*Avg IOs/sec*|*Avg MB/sec*|*% cpu load*|
|*Max Throughput-100%Read*|6.57|8844|276|0%|
|*RealLife-60%Rand-65%Read*|8.41|4296|33|1%|
|*Max Throughput-50%Read*|7.20|8125|253|0%|
|*Random-8k-70%Read*|8.04|4353|34|1%|
