Buck76
Contributor

Disk queue / iSCSI SAN (MSA 2012i) / Blade Center

Hi all.

A couple of days ago we opened a thread here:

Unfortunately, no one has answered our performance questions there. So I want to try it here, hoping to find some people who have similar environments or problems. Let me briefly sum up the problems:

Environment:

  • HP c3000 enclosure with 3 HP BL460c blades

  • MSA 2012i connected via a GbE2c switch and software iSCSI (no mezzanine cards activated at the moment!)

  • 3 VMware ESX 3.5 Update 2 (Standard) hosts, each with ~15 Windows 2003 servers (no database servers, 1 small Exchange server, 2 additional Linux servers with average I/O)

  • Separate iSCSI segment (10.1.10.0 / 10.1.11.0) with a Service Console port

Problems we have:

  • Slow disk I/O, while no performance log (MSA or ESX host) shows peaks

  • Very high disk queue while performing e.g. cloning on the hosts

  • Logon to Windows is often very, very slow (takes up to 10 minutes!), while no peaks are registered in the logs (!?)

Who has experience with similar environments and can give me some tips? It would also be very helpful to get some information about performance indicators in our environment and how to collect them. Are there any settings or logs on the switch that could be interesting for us?
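Something like the following is roughly what I have in mind for pulling disk latency out of esxtop batch output, but I am not sure it is the right approach (the column-name matching below is only my assumption about how the batch CSV header looks):

```python
import csv

# Rough sketch: summarize disk latency columns from esxtop batch output,
# collected with something like:  esxtop -b -d 5 -n 60 > perf.csv
# The substring match ("MilliSec/Command") is an assumption about the column
# naming in the batch CSV header; adjust after looking at a real file.
def summarize(path="perf.csv"):
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    header, samples = rows[0], rows[1:]
    for col, name in enumerate(header):
        if "Physical Disk" in name and "MilliSec/Command" in name:
            values = [float(r[col]) for r in samples if r[col]]
            if values:
                print(f"{name}: avg {sum(values)/len(values):.1f} ms, "
                      f"max {max(values):.1f} ms")

if __name__ == "__main__":
    summarize()
```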

Thanks.

Thomas

chrisfmss
Enthusiast

I think your main problem is your switch. iSCSI best practice is to use separate switches, one for the LAN and one for the SAN. The GbE2c is not the best switch for iSCSI; I recommend the Cisco 3120. We have bought an EqualLogic box, and I can list some recommendations from EqualLogic:

- The only switches recommended by EQL for the c3000 and c7000 are the Cisco 3120 or the HP 10G switch.

- Enable flow control.

- Important: disable unicast storm control and spanning tree (host side).

- Not mandatory: enable jumbo frames.

Buck76
Contributor

Hi.

- Flow control is enabled

- Unicast storm control is not configurable on this switch(?)

- Spanning tree is disabled

What else can we do?

chrisfmss
Enthusiast

Is your iSCSI LAN separate from your production LAN? If not, use a VLAN to separate the iSCSI traffic.

Buck76
Contributor

Hi.

Thank you!

Now we have separated LAN and iSCSI via VLAN. Performance is quite a bit better, but I think there could be more :smileyblush:

So, what about jumbo frames in our environment? The MSA does support jumbo frames, but VMware says "no" to iSCSI with jumbo frames. Does anyone here have experience with that?

Bye Thomas

chrisfmss
Enthusiast

Jumbo frames are used to reduce CPU overhead for big transfers, but they are useless if you mostly have many small transfers. If you want to use jumbo frames, they must be activated on the entire network path.
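One quick way to verify that jumbo frames really work end to end is to ping across the iSCSI segment with a large payload and fragmentation forbidden. Here is a rough sketch (I assume a Linux box sitting on the iSCSI VLAN, and 10.1.10.100 is only an example target address for one of the MSA iSCSI ports):

```python
import subprocess

# Rough end-to-end check for a 9000-byte MTU on the iSCSI segment.
# 8972 = 9000 - 20 (IP header) - 8 (ICMP header); "-M do" forbids fragmentation,
# so the ping only succeeds if every hop passes the full jumbo frame.
# Assumes a Linux host on the iSCSI VLAN; the target IP is just an example.
def jumbo_ok(target="10.1.10.100", payload=8972):
    result = subprocess.run(
        ["ping", "-M", "do", "-s", str(payload), "-c", "3", target],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0

if __name__ == "__main__":
    print("jumbo frames OK" if jumbo_ok() else "jumbo frames NOT passing")
```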

chrisfmss
Enthusiast

Do you use the ESX iSCSI initiator or the MS initiator within the VM?

chrisfmss
Enthusiast

Having reviewed everything: you cannot activate jumbo frames, because you are running LAN and iSCSI traffic on the same switch. I think your main problem is your switch, and that you are using the same switch for iSCSI and LAN.

Buck76
Contributor

Do you use the ESX iSCSI initiator or the MS initiator within the VM?

The ESX iSCSI initiator, because the mezzanine cards are not connected.

Having reviewed everything: you cannot activate jumbo frames, because you are running LAN and iSCSI traffic on the same switch. I think your main problem is your switch, and that you are using the same switch for iSCSI and LAN.

OK, we will look into buying another switch...

Thx

Buck76
Contributor

Hi.

Now, a couple of days later, we have done many things, but still no success:

  • We bought another blade switch for iSCSI only

  • On VMware and the MSA, we enabled jumbo frames

  • VLAN enabled on the switch

  • Flow control disabled / enabled

  • More CPUs in the guest, more RAM in the guest...

We just compared our performance with another iSCSI environment based on a Windows 2003 Storage Server :smileyalert: and got the following results:

Test scenario

  • Windows 2003 Server guest

  • 1024 MB RAM

  • 2 virtual CPUs

  • 4 GB unaligned and standard formatted iSCSI partition

  • Iometer Transfer Size = 4 MB

  • Iometer %Access = 100% (67% read, 33% write)

  • Iometer runs 5 minutes

In our environment we reach a maximum of 3,781,838 IOps (with all the tuning tips (jumbo frames...) we reached about 5 million IOps), while the reference environment reaches 13,697,642 (!!!) IOps.

What the hell are we doing wrong? The reference environment is out of the box; nobody changed things like jumbo frames or flow control there. My tests also show that these parameters only speed things up marginally.
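Just as a sanity check on what a single GbE software-iSCSI path can deliver at all, independent of the storage behind it, here is my quick back-of-envelope calculation (rough assumptions, not measurements):

```python
# Back-of-envelope ceiling for one software-iSCSI path over a single GbE link.
# Assumes roughly 10% protocol overhead (TCP/IP/iSCSI); real numbers vary.
LINK_BITS_PER_SEC = 1_000_000_000
USABLE_MB_PER_SEC = LINK_BITS_PER_SEC / 8 / 1_000_000 * 0.9   # ~112 MB/s

print(f"usable bandwidth: ~{USABLE_MB_PER_SEC:.0f} MB/s")
for size_kb in (4, 8, 32, 4096):   # common block sizes, incl. the 4 MB test size
    iops_ceiling = USABLE_MB_PER_SEC * 1024 / size_kb
    print(f"{size_kb:>5} KB blocks: at most ~{iops_ceiling:,.0f} I/Os per second")
```

So however the counters are read, a single GbE path tops out at roughly 110 MB/s; if the reference box reports numbers far beyond that, caching is probably involved rather than better network tuning.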

There must be another, big bottleneck in our environment, but which one? ?:|

Bye Thomas

Buck76
Contributor

Here is an interesting point I did not tell you:

On the MSA, we have a RAID 5 of 3 x 750 GB SATA(!) hard disks in play. The MSA does not show performance problems, if we believe HP's performance monitors. But looking at it realistically... RAID 5 with large SATA disks and about 20 servers?! ?:|

We will now create another array with SAS disks to test this.
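A rough spindle-count estimate makes the suspicion concrete (my assumptions: about 80 IOPS per 7.2k SATA spindle and the usual RAID 5 write penalty of 4; controller cache will soften this in practice):

```python
# Rough, cache-free estimate of what a 3-disk RAID 5 SATA set can sustain.
SPINDLES = 3
IOPS_PER_SPINDLE = 80          # assumption for a 7.2k SATA disk
READ_SHARE, WRITE_SHARE = 0.67, 0.33
WRITE_PENALTY = 4              # RAID 5: each host write costs ~4 disk I/Os

raw_iops = SPINDLES * IOPS_PER_SPINDLE
host_iops = raw_iops / (READ_SHARE + WRITE_SHARE * WRITE_PENALTY)
print(f"raw disk IOPS:           ~{raw_iops}")
print(f"host IOPS (67r/33w mix): ~{host_iops:.0f}")
print(f"per server with ~20 VMs: ~{host_iops / 20:.1f}")
```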

chrisfmss
Enthusiast

3 disks is really not enough for 20 servers. Which switch have you bought, and how many NICs does each ESX host now have connected to it? How many controllers do you have, and how many NIC ports are you using on your MSA?

christianZ
Champion

First I would suggest running the tests from this thread: http://communities.vmware.com/thread/73745

and trying to start with a simple config (one VMkernel port and one console port for it, one LUN configured through only one SP, without any redundancy). Just my thought.

christianZ
Champion

In the storage configuration settings I have added dynamic discovery for 10.1.10.100, 10.1.11.100 and 10.1.4.160.

If I remove the 10.1.4.160 discovery, the ESX server will not find any iSCSI targets. It should find the 10.1.11.100 IP at least...

Not if your mpath is configured incorrectly. Have you presented your iSCSI LUN over both controller ports? So try the simple way first, I think.

Buck76
Contributor

3 disks is really not enough for 20 servers. Which switch have you bought, and how many NICs does each ESX host now have connected to it? How many controllers do you have, and how many NIC ports are you using on your MSA?

OK, we have ordered some more disks to increase capacity and performance. Is SATA a good choice there?

We bought the switch. The ESX hosts are now connected to the new switch via a mezzanine card (1 port). The MSA was bought with a single controller; 2 ports are in use there (10.1.10.100 / 10.1.11.100).

First I would suggest running the tests from this thread: http://communities.vmware.com/thread/73745

and trying to start with a simple config (one VMkernel port and one console port for it, one LUN configured through only one SP, without any redundancy). Just my thought.

That would be OK, if... we weren't already in production, having believed HP pre-sales support in our planning X-(. I can only isolate one host via VMotion and work on it then. But I will post the results of the tests from your thread above...

Not if your mpath is configured incorrectly. Have you presented your iSCSI LUN over both controller ports? So try the simple way first, I think.

This problem is already solved and working fine now (see above)

Bye Thomas

chrisfmss
Enthusiast

Try using the MS initiator within the VM to see if there is any difference. Start with one NIC in the VM, and afterwards try with 2 NICs and MPIO. SATA can be good if it is well planned. We have purchased 2 EqualLogic PS 5000 boxes with SATA disks. To have enough performance we needed to buy 2 units, for 32 disks. The GbE2 is not the best switch for iSCSI. You can connect a laptop to that switch and configure the MS initiator on the laptop; then you will see whether your switch is good enough.

Buck76
Contributor

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

TABLE OF RESULTS

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

SERVER TYPE: VM ON ESX 3.5

CPU TYPE / NUMBER: VCPU / 2

HOST TYPE: HP BL460c, 12 GB RAM; 1x Xeon E5320 (quad-core), 1.86 GHz

STORAGE TYPE / DISK NUMBER / RAID LEVEL: MSA2012i x 1 / 3 SATA / R5

SAN TYPE / HBAs : iSCSI

##################################################################################

TEST NAME                     Av. Resp. Time ms   Av. IOs/sec   Av. MB/sec
Max Throughput-100%Read                      35          1669           52
RealLife-60%Rand-65%Read                    514           109          0.8
Max Throughput-50%Read                       23          2429           75
Random-8k-70%Read                           345           158            1

EXCEPTIONS: VCPU Util. 20-40%;

Highly performant write results, don't you think?? :smileygrin:
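For what it's worth, the MB/s values line up with the IOPS if I assume the block sizes from the test thread above (32 KB for the two Max Throughput tests, 8 KB for the RealLife and Random tests):

```python
# Consistency check: MB/s should be roughly IOPS * block size.
# Block sizes are my assumption based on the Iometer config in the test thread.
results = {
    "Max Throughput-100%Read":  (1669, 32),
    "RealLife-60%Rand-65%Read": (109,   8),
    "Max Throughput-50%Read":   (2429, 32),
    "Random-8k-70%Read":        (158,   8),
}
for name, (iops, block_kb) in results.items():
    print(f"{name:<26} ~{iops * block_kb / 1024:.1f} MB/s")
```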

christianZ
Champion

It's really poor, but maybe it's all caused by the SATA disks. Have you enabled write/read caching on your storage?

christianZ
Champion

With 3 SATAs I would expect 400-500 IOs/sec and ~100 ms response time (in the RealLife test).

Buck76
Contributor

It's really poor, but maybe it's all caused by the SATA disks. Have you enabled write/read caching on your storage?

yes
