Hi all.
A couple of days ago we opened a thread here:
Unfortunately, nobody answered our performance questions there, so I want to try it here, hoping to find people with similar environments or problems. To sum up the problems briefly:
Environment:
HP Enclosure 3000c with 3 HP BL460c
MSA 2012i connected via GbE2c switch and software iSCSI (no mezzanine cards active at the moment!)
3 VMware ESX 3.5 Update 2 (Standard) hosts with ~15 Windows 2003 servers each (no database servers, 1 small Exchange server, 2 additional Linux servers with average I/O)
Separate iSCSI segment (10.1.10.0 / 10.1.11.0) with Service Console port
Problems we have:
Slow disk I/O, while no performance log (MSA or ESX host) shows peaks
Very high disk queues while performing e.g. cloning on the hosts
Windows logons often very, very slow (up to 10 minutes!), while no peaks are registered in the logs (!?)
Who has experience with similar environments and can give me some tips? It would also be very helpful to get some information about performance indicators for our environment and how to collect them. Could there be any settings or logs on the switch that would be interesting for us?
Thanks.
Thomas
I think your main problem is your switch. iSCSI best practice is to use separate switches, one for LAN and one for SAN. The GbE2c is not the best switch for iSCSI. I recommend the Cisco 3120. We bought an EqualLogic box, and I can list some of EqualLogic's recommendations:
- The only switches recommended by EQL for the c3000 and c7000 are the Cisco 3120 and the HP 10G switch.
- Enable flow control
- Important: disable unicast storm control and spanning tree (host side)
- Not mandatory: enable jumbo frames
Hi.
- Flow control is enabled
- Unicast storm control is not configurable on this switch(?)
- Spanning tree is disabled
what else can we do?
Is your iSCSI LAN separate from your production LAN? If not, use a VLAN to separate the iSCSI traffic.
Hi.
Thank you!
Now we have separated LAN / iSCSI via VLAN. Performance is a bit better, but I think there is more to gain :smileyblush:
So, what about jumbo frames in our environment? The MSA supports jumbo frames, but VMware says "no" to iSCSI with jumbo frames. Does anyone here have experience with that?
Bye Thomas
Jumbo frames are used to reduce CPU overhead for big transfers, but they are useless if you transfer many small files. If you want to use jumbo frames, they must be enabled along the whole network path.
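Roughly, the saving from jumbo frames comes from processing fewer frames (fewer interrupts and TCP segments) for the same amount of data. A quick sketch in Python, with illustrative header sizes (assumptions, not measured values):

```python
# Estimate how many frames it takes to move 1 GB at standard vs. jumbo MTU.
# Assumption: ~40 bytes of IP + TCP headers per frame; real overhead also
# includes Ethernet framing, so these numbers are only illustrative.

def frames_needed(total_bytes, mtu, headers=40):
    payload = mtu - headers  # TCP payload carried per frame
    return total_bytes // payload + (1 if total_bytes % payload else 0)

one_gb = 1024 ** 3
std = frames_needed(one_gb, 1500)    # standard Ethernet MTU
jumbo = frames_needed(one_gb, 9000)  # jumbo frame MTU

print(f"1500 MTU: {std} frames")
print(f"9000 MTU: {jumbo} frames")
print(f"~{std / jumbo:.1f}x fewer frames with jumbo frames")
```

About six times fewer frames per gigabyte, which is where the CPU relief for large sequential transfers comes from; small I/Os fit in one frame either way, which matches the point about many small transfers.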
Do you use the ESX iSCSI initiator or the MS initiator within the VM?
I reviewed everything: you cannot activate jumbo frames, because you are running LAN and iSCSI traffic on the same switch. I still think your main problem is that one switch carrying both iSCSI and LAN.
Do you use the ESX iSCSI initiator or the MS initiator within the VM?
The ESX iSCSI initiator, because the mezzanine cards are not connected.
I reviewed everything: you cannot activate jumbo frames, because you are running LAN and iSCSI traffic on the same switch. I still think your main problem is that one switch carrying both iSCSI and LAN.
Ok, we will look into buying another switch...
Thx
Hi.
Now, a couple of days later, we have done many things, but still no success:
- We bought another blade switch for iSCSI only
- We enabled jumbo frames on VMware and on the MSA
- VLAN enabled on the switch
- Flow control disabled / enabled
- More CPUs in the guest, more RAM in the guest...
We just compared our performance with another iSCSI environment based on a Windows 2003 Storage Server :smileyalert:, with the following results:
Test scenario
Windows 2003 Server guest
1024 MB RAM
2 virtual CPUs
4 GB unaligned, standard-formatted iSCSI partition
Iometer Transfer Size = 4 MB
Iometer %Access = 100% (67% read, 33% write)
Iometer runs 5 minutes
In our environment we reach a maximum of 3,781,838 total I/Os (with all the tuning tips (jumbo frames, ...) we reached about 5 million), while the reference environment reaches 13,697,642 (!!!).
What the hell are we doing wrong? The reference environment is out of the box; nobody changed things like jumbo frames or flow control there. My test also shows that these parameters speed things up only marginally.
We must have another, big bottleneck in our environment, but which one? ?:|
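One sanity check worth doing here: with software iSCSI over a single GbE link, the wire itself caps what any tuning can achieve. A rough estimate (the 90% usable figure is an assumption, not a measurement):

```python
# Ceiling for iSCSI throughput on one Gigabit Ethernet link.
# Assumptions: 1 Gbit/s wire speed, ~90% usable after TCP/IP/iSCSI overhead.

wire_mb_per_s = 1000 / 8           # 1 Gbit/s = 125 MB/s raw
usable = wire_mb_per_s * 0.9       # ~112 MB/s after protocol overhead
transfer_mb = 4                    # Iometer transfer size from the test above

max_transfers = usable / transfer_mb
print(f"usable bandwidth: ~{usable:.0f} MB/s per link")
print(f"max 4 MB transfers per second on one link: ~{max_transfers:.0f}")
```

So anything far above ~28 transfers per second at a 4 MB transfer size must be served from caches, not from the link, which makes raw Iometer totals hard to compare between environments.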
Bye Thomas
This is an interesting point I did not tell you:
On the MSA, we have a RAID 5 of 3 × 750 GB SATA(!) disks. The MSA does not show performance problems, if we believe HP's performance monitors. But looking at it realistically... RAID 5 with large SATA disks and about 20 servers?! ?:|
We will now create another array with SAS disks to test this.
3 disks is really not enough for 20 servers. Which switch have you bought, and how many NICs does each ESX host now have connected to it? How many controllers do you have, and how many NIC ports are you using on your MSA?
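A back-of-the-envelope estimate of what 3 SATA spindles in RAID 5 can deliver for the 67% read / 33% write mix used above (the ~80 IOPS per spindle and the write penalty of 4 are rules of thumb, not measured values):

```python
# Effective random IOPS of 3 x 7.2k SATA disks in RAID 5.
# Assumptions: ~80 random IOPS per spindle; RAID 5 write penalty of 4
# (read old data, read old parity, write new data, write new parity).

spindles = 3
iops_per_disk = 80
raw_iops = spindles * iops_per_disk

read_pct, write_pct = 0.67, 0.33   # mix from the Iometer test above
write_penalty = 4

effective = raw_iops / (read_pct + write_pct * write_penalty)
print(f"raw spindle IOPS: {raw_iops}")
print(f"effective RAID 5 IOPS at 67/33 read/write: ~{effective:.0f}")
```

That lands around 120 IOPS for the whole array, which about 20 servers then have to share; the disk count, not the network, may well be the real ceiling here.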
First I would suggest running the tests from this thread: http://communities.vmware.com/thread/73745
and trying to start with a simple config (one VMkernel port / one console port for it, one LUN configured through only one SP without any redundancy). Just my thought.
In the storage configuration settings I have added dynamic discovery for 10.1.10.100, 10.1.11.100 and 10.1.4.160.
If I remove the 10.1.4.160 discovery, the ESX server does not find any iSCSI targets. It should find the 10.1.11.100 IP at least...
Not if your multipathing is configured incorrectly. Have you presented your iSCSI LUN over both controller ports? So try the simple way first, I think.
3 disks is really not enough for 20 servers. Which switch have you bought, and how many NICs does each ESX host now have connected to it? How many controllers do you have, and how many NIC ports are you using on your MSA?
Ok, we ordered some more disks to increase capacity and performance. Is SATA a good choice there?
We bought the switch. The ESX hosts are now connected to the new switch via the mezzanine card (1 port). The MSA was bought with a single controller; 2 ports are in use there (10.1.10.100 / 10.1.11.100).
First I would suggest running the tests from this thread: http://communities.vmware.com/thread/73745
and trying to start with a simple config (one VMkernel port / one console port for it, one LUN configured through only one SP without any redundancy). Just my thought.
That would be OK, if... we are already in production, because we believed HP pre-sales support in our planning X-(. I can only isolate one host via VMotion and then work on it. But I will post the results of the test thread above...
Not if your multipathing is configured incorrectly. Have you presented your iSCSI LUN over both controller ports? So try the simple way first, I think.
This problem is already solved and working fine now (see above).
Bye Thomas
Try using the MS initiator within the VM to see if there is any difference. Start with one NIC in the VM, and afterwards try with 2 NICs and MPIO. SATA can be good if it is well planned. We purchased 2 EqualLogic PS5000 boxes with SATA disks; to get enough performance, we needed 2 units for 32 disks. The GbE2 is not the best switch for iSCSI. You can connect a laptop to that switch and configure the MS initiator on the laptop; then you will see whether your switch is good enough.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
TABLE OF RESULTS
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
SERVER TYPE: VM ON ESX 3.5
CPU TYPE / NUMBER: VCPU / 2
HOST TYPE: HP BL460c, 12 GB RAM; 1x XEON E5320 (Quad), 1.86 GHz
STORAGE TYPE / DISK NUMBER / RAID LEVEL: MSA2012i x 1 / 3 SATA / R5
SAN TYPE / HBAs : iSCSI
##################################################################################
TEST NAME--------------------------Av. Resp. Time ms--Av. IOs/sec--Av. MB/sec
##################################################################################
Max Throughput-100%Read........__35______..........___1669___.........___52____
RealLife-60%Rand-65%Read......___514_____..........___109___.........____0.8____
Max Throughput-50%Read..........____23____..........___2429___.........___75____
Random-8k-70%Read.................____345____..........___158___.........____1____
EXCEPTIONS: VCPU Util. 20-40%;
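For what it's worth, the MB/s column of the table follows directly from IOps × block size, assuming the block sizes the linked test thread configures (32 KB for the Max Throughput runs, 8 KB for the RealLife/Random runs):

```python
# Cross-check the table rows: MB/s ~= IOps * block size.
# Assumption: 32 KB blocks for the "Max Throughput" tests, 8 KB for the
# 8k tests, as configured in the Iometer profile from the linked thread.

rows = [
    # (test name, block KB, Av. IOs/sec, reported MB/s)
    ("Max Throughput-100%Read", 32, 1669, 52),
    ("RealLife-60%Rand-65%Read", 8, 109, 0.8),
    ("Max Throughput-50%Read",  32, 2429, 75),
    ("Random-8k-70%Read",        8,  158, 1),
]

for name, kb, iops, reported in rows:
    calc = iops * kb / 1024
    print(f"{name}: calculated {calc:.1f} MB/s, reported {reported}")
```

Within rounding, the columns are consistent, so the measurements themselves look trustworthy; they are just low.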
Highly performant write results, don't you think?? :smileygrin:
It's really poor, but maybe it's all caused by the SATA disks. Have you enabled write/read caching on your storage?
With 3 SATAs I would expect 400-500 IOs/sec and ~100 ms response time (in the RealLife test).
It's really poor, but maybe it's all caused by the SATA disks. Have you enabled write/read caching on your storage?
yes