VMware Cloud Community
BillClark22
Contributor

Poor Performance with Equallogic iSCSI and VMware

I have a small, two-host VMware ESX 3.5 setup with an EqualLogic PS100E iSCSI SAN, and I'm seeing terrible performance with any SQL database running on a VM. I have 20 VMs that are mostly application servers and are doing fine, but any SQL server performs poorly. What we've done is rebuild a database system that runs SQL Server 2008 on physical hardware as a VM, then compare the same reports against the same data. While I expected some slowdown on the VM side, we are seeing reports take 3-4 times as long on the VM as on the physical box. That doesn't seem right, so I've started digging to see where the issue is. Here's my setup:

2 x ESX 3.5.0 (build 153875) hosts: HP DL380 G5, dual quad-core Xeon 2.00GHz (E5335) processors, 32GB RAM, 4 x 1Gb NICs, 1 x dual-port QLogic QLA4052C iSCSI HBA

2 x HP ProCurve 2810-24G switches, 3 VLANs (iSCSI, vMotion, default traffic), jumbo frames (MTU=9000) turned on for the iSCSI VLAN, flow control turned off for all ports (rough switch commands are sketched below)

1 x EqualLogic PS100E iSCSI SAN, 14 x WDC WD2500YS SATA drives, 2 volumes, MTU size=9000, 3 x 1Gb Ethernet ports
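For reference, this is roughly how the switches are configured today. I'm going from memory on the ProCurve syntax, and VLAN 20 just stands in for our actual iSCSI VLAN ID, so check the 2810 manual before trusting any of it:

    configure
    vlan 20 jumbo                 (jumbo frames enabled on the iSCSI VLAN only)
    show vlans                    (confirms which VLANs have jumbo enabled)
    show interfaces brief         (shows per-port flow control, currently off everywhere)
    interface 1-24 flow-control   (what we'd run if we follow Dell's advice and turn flow control back on)
    write memory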

According to SANHQ, here are some performance counters:

    Counter            Today1     Today2     1Day      7Day
    Average I/O        859.8KB    2.4MB      499KB     7.4MB
    Average IOPS       65.1       223.5      47.4      54.0
    Average Latency    2.3ms      12.3ms     2.6ms     2.6ms
    Average I/O Size   13.2KB     10.9KB     10.5KB    11.1KB
    Read/Write %       5.1/94.9   2.6/97.4   3.6/96.4  3.9/96.1
    Est. I/O Load      Low        Low        Low       Low

Since I'm not sure what range those values should fall in, I can't tell whether the SAN is running at full speed or not, or whether it (or maybe the switches) is the bottleneck. I've sent SAN diagnostic files to Dell and they say everything appears to be working normally, but they suggest turning jumbo frames OFF everywhere and turning flow control back on. I haven't tried that yet, but I wanted to get some feedback on the values I'm seeing from the SAN, and on what experiences people have had with EqualLogic iSCSI SANs and HP ProCurve switches in a virtual environment. Any help/advice/guidance is GREATLY appreciated! Thanks.
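One thing I plan to try before changing anything is sanity-checking the MTU settings from the ESX side. Since our iSCSI goes through the QLogic HBA, the vmkernel path really only covers vMotion and any software-initiator traffic, but something along these lines (the SAN group IP below is made up) should show whether jumbo-sized frames actually pass end to end:

    esxcfg-vswitch -l              (lists the vSwitches and their MTU)
    esxcfg-nics -l                 (lists the physical NICs and their MTU)
    vmkping -s 8972 10.10.10.50    (ping the group IP with a jumbo-sized payload; newer builds also take -d to forbid fragmentation, not sure 3.5 does)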

Bill

5 Replies
BillClark22
Contributor

Seriously... no ideas, comments, opinions, etc.??

sketchy00
Hot Shot

I have a few more questions about your setup (primarily how the physical NICs are arranged between your hosts and the SAN), but first: are those SQL DB and log volumes stored as VMDKs on a shared VMFS volume? How many VMs do you have per VMFS volume? Have you experimented with putting the SQL data and log files on native volumes on the SAN and connecting with the guest iSCSI initiator (via the EqualLogic HIT Kit)? Using that method (my personal preference) allows for multipathing (which is not possible with VMFS volumes in ESX 3.5), and it lets you use EqualLogic's ASM/ME application, which will take a fully VSS-aware snapshot of the database and logs.
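To be clear on what the guest-initiator route involves: the HIT Kit and ASM/ME normally set up the connections for you, but underneath it's just the Microsoft iSCSI initiator. From inside a Windows guest it's roughly the following (the portal IP and the IQN are placeholders for your group's actual values):

    iscsicli QAddTargetPortal 10.10.10.50     (point the initiator at the EqualLogic group address)
    iscsicli ListTargets                      (should list the volume's IQN once access is granted)
    iscsicli QLoginTarget iqn.2001-05.com.equallogic:example-sqldata    (log in to the volume)

After that you format the disk inside the guest and point SQL's data and log files at it; ASM/ME can then quiesce SQL through VSS when it snapshots those volumes.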

terrellj
Contributor

Those SATA drives aren't going to be great for heavy write performance. If you aren't running the array in RAID 50, you should. RAID 5 will be a dog. VM volumes should be in the 300-500 GB range. Separate the data volume from the log volume. I would turn on jumbo frames and flow control.
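If I remember the group CLI right (worth checking against the PS Series CLI reference before running anything, and member1 is just a placeholder for your member name), checking and changing the RAID policy looks roughly like this:

    member select member1 show                  (shows the current RAID status/policy among other details)
    member select member1 raid-policy raid50    (sets the member to RAID 50; note that not every conversion is supported in place, some require resetting the member)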

jackshu
Enthusiast

This post is almost 2 months old so I don't know if you found a solution yet, but we experienced similarly bad performance.

Our setup consists of:

4 x HP DL380 G5 hosts, each with dual quad-core 3.0GHz CPUs, 32GB RAM, and 4 x 1Gb NICs

1 x HP DL380 G6 host with dual quad-core 2.93GHz CPUs, 64GB RAM, and 8 x 1Gb NICs

2 x PS5000E arrays, each with 16 x 1TB SATA drives in RAID 50, in a single group

1 x HP ProCurve 2800 switch

Around 40 VMs running: various application servers, SQL databases, file servers, web servers, etc.

Initially we had problems moving the VMs from the servers' internal storage to the SAN. The process would take a very long time and then time out (a 40GB file would take a whole day to copy before timing out).

After spending days troubleshooting and opening a case with VMware, we were about to give up when our case was escalated to tier 2 support. The tier 2 engineer immediately asked what switch we were using. He told us that some HP ProCurve switches have problems with jumbo frames and flow control both enabled at the same time. We took a look at the switch logs, as well as the network port logs on the PS5000E, and sure enough there were lots of packet retransmits logged.

He said it was more important for the EqualLogic to have flow control than to have jumbo frames, so we disabled jumbo frames on the ESX servers as well as the switch, and we immediately saw a huge performance difference.
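For anyone trying the same thing, disabling jumbo frames on the ESX 3.5 side amounted to setting the vSwitch MTU back to 1500 and recreating the iSCSI vmkernel port without the jumbo MTU. Roughly like this (our vSwitch name, port group name, and addresses will obviously differ from yours):

    esxcfg-vswitch -m 1500 vSwitch1                            (set the vSwitch MTU back to the default)
    esxcfg-vmknic -d "iSCSI"                                   (remove the jumbo-framed vmkernel port)
    esxcfg-vmknic -a -i 10.10.10.21 -n 255.255.255.0 "iSCSI"   (re-add it with the default 1500 MTU)

On the switch we turned jumbo frames off for the iSCSI VLAN and made sure flow control stayed enabled.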

A benchmark we ran before turning jumbo frames off showed we were getting only a few hundred kilobytes per second of throughput from the ESX host to the SAN. After we turned off jumbo frames, we consistently get 200+ MB/sec throughput on hosts with 2 NICs for iSCSI, and 270+ MB/sec on hosts with 3 or more NICs for iSCSI.

We do have a PS4000E at a remote site, connected to a lower-end HP ProCurve switch with two HP DL380 G5 hosts, that has no problems with jumbo frames and flow control enabled at the same time. I don't remember the model of that switch, but I know it wasn't the 2800 series.

If you are still having slow performance, try turning off jumbo frames.

sketchy00
Hot Shot

Good point, Jack. The original poster should look into that. I've just recently run into some odd behavior, all because of those pesky jumbo frames and flow control. We had a couple of trunked Dell PowerConnect 5424s that were just fine with my EqualLogic PS SAN. When I changed them over to a couple of stacked Dell PowerConnect 6224s, SANHQ started reporting high TCP retransmit rates. Jumbo frames, flow control, and LLDP all may have been contributing factors. The PowerConnects also don't appear to like jumbo frames on the default VLAN. Anyway, I recreated the vSwitches on my ESX hosts that connect to the SAN to use standard frames, and I'm waiting to see whether the TCP retransmits go down. Not sure yet, but it looks encouraging so far.
