A CALL FOR HELP
Which switches do you use for iSCSI in your setup?
I asked Dell about the IOPS threshold, and they told me that it is not supported. Do you have an official answer from VMware or Dell regarding that setting?
This thread is VERY interesting. I'm surprised EqualLogic was not able to get you guys fixed...
re: IOPS value. Dell/EQL, HP, EMC, VMware, etc. agreed on some general principles about iSCSI configuration. One of them was to change the IOs per path from 1000 to 3. If you google "multivendor iscsi post" you will find it on several blogs.
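In case it's useful, this is the ESXi 5.x command for that setting. It's a sketch: the naa.6090a0xxxxxxxx device ID is a placeholder for your own volume (EQL volumes normally show up with the 6090a0 NAA prefix; check with esxcli storage nmp device list), and the device must already be on Round Robin for it to apply:

esxcli storage nmp psp roundrobin deviceconfig set --device=naa.6090a0xxxxxxxx --type=iops --iops=3

To hit every EQL volume on the host in one go, something like:

for dev in $(esxcli storage nmp device list | grep '^naa.6090a0'); do
  esxcli storage nmp psp roundrobin deviceconfig set --device=$dev --type=iops --iops=3
done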
One thing about Delayed ACK is that you have to verify the change took place. If you just change it, it appears that only new LUNs inherit the (disabled) value. I find that, while in maintenance mode, removing the discovery address and any discovered targets (in the static discovery tab), then disabling Delayed ACK, re-adding the discovery address, and rescanning resolves this.
At the ESX console run:
# vmkiscsid --dump-db | grep Delayed
All the entries should end with ='0' for disabled.
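I believe newer ESXi 5.x builds also let you flip this per adapter from the CLI instead of the client GUI. A sketch, with vmhba37 standing in for your software iSCSI adapter (check the name with esxcli iscsi adapter list, and verify the key shows up in esxcli iscsi adapter param get first):

esxcli iscsi adapter param set --adapter=vmhba37 --key=DelayedAck --value=false

Same caveat applies: re-check with the vmkiscsid dump afterwards, since existing sessions may keep the old value until the targets are removed and rediscovered.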
I just posted this to another thread on EQL perf issues:
Common causes of performance issues that generate that alert are (example commands for a couple of these follow the list):
1.) Delayed ACK is enabled.
2.) Large Receive Offload (LRO) is enabled.
3.) MPIO pathing is set to FIXED
4.) MPIO is set to VMware Round Robin but the IOs per path is left at default of 1000. Should be 3.
5.) VMs with more than one VMDK (or RDM) are sharing one Virtual SCSI adapter. Each VM can have up to four Virtual SCSI adapters.
6.) iSCSI switch not configured correctly or not designed for iSCSI SAN use.
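For #2 and #3, these are the ESXi 5.x commands I'd use (a sketch; naa.6090a0xxxxxxxx is a placeholder for your own device ID):

Disable LRO in the TCP/IP stack (#2, needs a host reboot to take effect):
esxcli system settings advanced set -o /Net/TcpipDefLROEnabled -i 0

Move a device from FIXED to VMware Round Robin (#3):
esxcli storage nmp device set --device=naa.6090a0xxxxxxxx --psp=VMW_PSP_RR

For #4, the IOs-per-path command is in my post just above.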
This thread has some specific instructions on how to disable Delayed ACK as well.
Re: Storage Heartbeat VMK port. (Mentioned in EQL iSCSI config guides for ESX)
This will never affect performance. The lowest VMK port in each IP subnet is the default device for that subnet. With iSCSI MPIO, when VMK ports are on the same subnet, this can cause problems when the link associated with that default VMK port goes down (cable pull; rebooted, failed, or powered-off switch; etc.): ESX will still use that port to reply to ICMP and Jumbo Frame SYN packets. During the iSCSI login process, the EQL array pings the source port from the array port it wants to use to handle that session request. If that ping fails, the login process fails. The Storage Heartbeat makes sure that default VMK port is always able to respond to the ping. That VMK port must NOT be bound to the iSCSI adapter.
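For anyone building it by hand, a rough sketch of creating the SHB port from the CLI on ESXi 5.0/5.1; the portgroup name, vmk number, and IP here are examples only, substitute your own:

esxcli network ip interface add --interface-name=vmk2 --portgroup-name=iSCSI_Heartbeat
esxcli network ip interface ipv4 set --interface-name=vmk2 --ipv4=10.10.5.9 --netmask=255.255.255.0 --type=static

And note what you do NOT do: never bind this vmk to the iSCSI adapter, i.e. no "esxcli iscsi networkportal add" for it.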
I had to update the NIC driver for the Broadcoms on our 620's to get it decent. After the firmware update to 5.2.4h1, latency went to god-high numbers (20k+) and, until the driver update, it would just stay there nice and high. Horrible, horrible performance.
We are a 100% virt shop, so it was killing us. Needless to say, L1 support was less than useful.
Have you resolved the issue with flow control?
I'm hitting the same issue. My Cisco IOS is 12.2(58)SE2 on a Cisco 2960S switch.
Switch# show flowcontrol interface GigabitEthernet1/0/21
Port        Send FlowControl  Receive FlowControl  RxPause TxPause
            admin    oper     admin    oper
---------   -------- -------- -------- --------    ------- -------
Gi1/0/21    Unsupp.  Unsupp.  desired  off         0       0
Port Gi1/0/21 is connected to the EQL, but the "oper" column under Receive FlowControl always shows "off".
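On the 2960S/3560 family, receive is the only direction you can enable (send is unsupported, as your output shows). A sketch, using your port:

Switch# configure terminal
Switch(config)# interface GigabitEthernet1/0/21
Switch(config-if)# flowcontrol receive desired
Switch(config-if)# end
Switch# show flowcontrol interface GigabitEthernet1/0/21

If admin is already "desired" and oper still shows "off", the device on the other end isn't advertising pause frames during autonegotiation, so I'd check the EQL port settings and the cabling/negotiation rather than the switch.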
Re: Storage Heartbeat VMK port. (Mentioned in EQL iSCSI config guides for ESX)
This will never affect performance.
Saying it will "never" affect performance is incorrect. Without the storage heartbeat, performance can still be affected. If the storage heartbeat is not set up and a NIC (with the lowest vmk mapping) fails, then "the Equallogic will not be able to accurately determine connectivity during the login process, and therefore suboptimal placement of iSCSI sessions will occur". If a storage heartbeat vmk is set up, it is tied to both pNICs, so if a NIC fails it uses the second NIC to determine optimal placement of iSCSI sessions. So the heartbeat vmk in itself won't directly affect performance, but to say it will never affect performance is incorrect.
Sorry that I wasn't clear.
Under normal conditions the SHB VMK port has no impact, since no iSCSI traffic passes through it. However, a network outage isn't what I would consider a "performance" problem: without the SHB you can end up with a total outage, not just degraded performance. I've seen it happen. That's what I was referring to.
FYI: This issue is resolved in ESXi v5.1. So you no longer need the SHB VMK port.
Sorry to ask the dumb questions, but I just want to clarify what the problem is.
You have an array with two 1Gb links connected on the active controller, and your hosts have multiple 1Gb links (how many?) connected to the switching.
You are seeing a 100MB/s limit on IOMeter running within a single VM.
You say for the array "The combined throughput of both ethernet adapters can never exceed 1Gbit/s"
So for a single 1 gigabit link the maximum theoretical throughput is 108 megabytes per second.
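Rough math behind that number: 1Gbit/s / 8 = 125MB/s raw line rate; take off Ethernet/IP/TCP/iSCSI header overhead and you land somewhere around 108-118MB/s of usable payload, depending on frame size.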
If you have two separate VMs running IOMeter on two separate hosts, are you seeing 50MB/s throughput on each?
I would very much like to know what settings you used to get your EqualLogic speed up. Also, what tool do you use to get your performance numbers? We are using EqualLogic storage with vSphere 5.1 and MEM configured.
I know it's been a while, but I came across this thread while trying to diagnose a fault of our own.
PS4100X (24 x 600GB SAS), 2 x Cisco 3560X (12.2(55)), 3 x HP DL360p (2 x QP Broadcom 5719 NICs), VMware 5.1 U2 (HP custom image).
We see 50MB/s in total from the EqualLogic regardless of how many hosts and NICs are used: if we test from two hosts the speed halves, and if we test from two NICs we see 25MB/s on each NIC, or 50MB/s if only one NIC is tested.
I followed the advice from this post as well as all the usual best-practice guides.
I logged a call with Dell, who wanted diags and SAN HQ output; they said the SAN was fine. I then logged a ticket with VMware, and they looked at DAVG in esxtop and found it was showing values of 100-1000 during IO operations, so VMware pointed me at the switches: they said it wasn't VMware, and Dell had already said it wasn't the SAN.
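For anyone who wants to check the same thing themselves: run esxtop on the host, press 'd' for the disk-adapter view or 'u' for the disk-device view, and watch the DAVG/cmd column (device/array latency, in ms) against KAVG/cmd (time spent in the hypervisor). Sustained DAVG in the tens of ms, never mind the 100-1000 I was seeing, is generally taken to mean the delay is beyond the host, i.e. fabric or array.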
I also found millions of oversize packets on the iSCSI ports; it's almost as if the EqualLogic isn't correctly discovering the path MTU and the Cisco is having to fragment packets before transmitting them to the hosts, maybe inducing the lag.
I checked all the switch config and confirmed flow control was operational on the ports set to "desired" (as per this post), thanks.
Eventually I went to the datacentre and disconnected all the cables to the SAN, plugged in a laptop, ran IOMeter, and did some basic file copies between SAN volumes, and again could only get 50MB/s.
I got Dell on a remote session via the wireless on the laptop so they could see it directly, and they've agreed to send out a specialist engineer Monday morning, so we'll see what happens next. They're also sending some spares, but as yet I don't know what they are.
This whole process started on the Monday and it took me until Friday to convince Dell of the problem.
I'll update this post when we know more in case anyone else has to go through the same hell I have.
Dave Twilley VCP5
Datek Solutions Ltd.
So I met the Dell Skytech guy at the datacentre this morning. His laptop gets 100MB/s either direct to the SAN or via the switches, so it looks like my laptop test was just an anomaly! He spoke to some guys at base who said to try an earlier TG3 driver. That made no difference, but it's interesting that the one on record as tested/working with EQL is the 3.12xx driver, which is known to corrupt datastores!
Anyway, we're now escalating it with Dell as a Broadcom NIC issue as we have nowhere left to go. I have an Intel QP NIC on standby.
In the meantime I'm sticking vSphere 5.5 onto the host and will see what happens. What's annoying is that we see 100-200MB/s on the NICs when doing vMotion, so this is either limited to just iSCSI or it's not a Broadcom issue.