This is good news. I am in the process of renewing our SmartNet agreements to discuss this with Cisco myself. What's interesting is I'm not using IOS 15. I have used IOS 15 in a few environments in the past and in every single one of them I've had switch crashes in the middle of the night (one about every few months). I do NOT see a flowcontrol problem with switches running 12.2(55)SE5 but I am seeing the problem with 12.2(58)SE2 and 15. These are all 3750G switches.
I've sent you a PM as well. Thanks for the update on this.
The crashes also happen on IOS 15 with C3750-X units in a stack. No crashlog is written.
I am having the exact same issue as well... But the sad thing is I am using all Dell equipment: Dell R720 servers and a pair of stacked 6248 switches... I've changed the IOPS parameter on all the iSCSI interfaces from 1000 to 3, set them to default to Round Robin, and turned off LRO and Delayed ACK, but still no change in throughput... The maximum I've been able to get has been around 110 MBps... Did anybody find a solution for this issue? Any help would be very much appreciated...
We have been seeing this error for years now. I have tried a lot with Dell, VMware and IBM as well, but I have the same problems you have...
We're on VMware 5.1, 2 EQL groups (PS6000 and PS5000), firmware 5.2.2, using onboard Broadcom and quad-port Intel cards.
I have never seen more than 80-100 MB/s....
We should maybe try to share our VM network setups, as this could be a big factor in regard to performance.
I have 3 NICs in a vSwitch and use 3 VMkernel ports. On top of that I have two VM networks.
So the ESXi servers use 3x 1 Gbit NICs to the groups (switches: stacked Dell 6248s).
The VMs (as seen on the screenshot): iSCSI1 uses VMNIC3, iSCSI2 uses VMNIC4 (and VMNIC5 as standby).
Internally in the VMs I use VMXNET3 with default settings (except that I have set jumbo frames to 9000).
I haven't changed anything in ESXi (in regards to TOE, TSO, LRO etc.). We use MEM 1.1.
So if some of you see trouble with my setup above, feel free to give me a hint. Everything is set up per the manual.
But the problem is that the performance is quite bad....
Has anyone tested V5.2.4 firmware yet?
I have not, but I don't think it will solve the problem.
It's not a VMware or network configuration issue. I've tested performance using multiple different physical servers as well as 3 different types of network switches. All configurations experienced the same issue.
At the same time I have two EQL devices running 5.0.7 that are not seeing any issues.
The problem with iSCSI in ESXi 5 U1 only occurs after a reboot. To avoid this bug, set Failback to Yes on the iSCSI vSwitch. Before U1, the workaround was to set Failback to No.
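For reference, the failback policy can also be flipped from the host shell; a minimal sketch using the ESXi 5.x esxcli syntax, where "vSwitch1" is a placeholder for your iSCSI vSwitch name:

```shell
# Set Failback to Yes on the iSCSI vSwitch
# ("vSwitch1" is a placeholder -- substitute your iSCSI vSwitch).
esxcli network vswitch standard policy failover set --vswitch-name=vSwitch1 --failback=true

# Verify the change took effect
esxcli network vswitch standard policy failover get --vswitch-name=vSwitch1
```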
After working on this issue for over a month with EqualLogic Level 3 support, the Dell PowerEdge group (responsible for VMware issues within Dell) and directly with VMware, I think we might have found a resolution to this issue... That being said, there are a number of things that can cause this issue, so I'll try to include some troubleshooting steps that might be helpful in narrowing down the cause in your particular environment...
I would highly recommend you read carefully "Configuring iSCSI Connectivity with VMware vSphere 5 and Dell EqualLogic PS Series Storage"... When we first started working on this issue, this document was not even available (they still only had the best practices for vSphere 4.1)... I would highly recommend you set up the Storage Heartbeat port per the recommendations outlined in this document (it needs to be the lowest-numbered vmkernel port on the vSwitch, which means this port needs to be created first on the vSwitch; also enable jumbo frames on it if you are using jumbo frames in your environment)...
There still seem to be some weird issues with the VMware ESXi 5 software iSCSI initiator even after Update 1 (build 5.0.0, 623860)... So it might not be a bad idea to start with a new vSwitch (especially because the Storage Heartbeat needs to be the lowest-numbered vmkernel port on that vSwitch). You can move the NICs one at a time to the new vSwitch if you cannot afford to bring down the host...
As one of the first troubleshooting tips, enable SSH on the host and run the esxtop command, then press n to display live performance data for only the network interfaces on that ESX host... But before you do this, note which vmnics you have assigned to iSCSI traffic... I am almost certain you will see traffic flowing through only one of the assigned vmnics, hence the kind of performance numbers we have all been seeing (a little over 100 MBps throughput and around 3200+ IOPS, which, by the way, is approximately the theoretical maximum of a single gigabit port)... This was happening despite the fact we had Round Robin enabled and it showed both paths as Active (I/O)...
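The back-of-the-envelope math behind that "single gigabit port" ceiling, as a sketch (the usable-throughput figure assumes standard Ethernet/IP/TCP/iSCSI header overhead):

```shell
# Raw payload ceiling of one 1 Gbit/s link in MB/s:
# 1,000,000,000 bits/s / 8 bits per byte / 1,000,000 bytes per MB
echo $((1000000000 / 8 / 1000000))   # prints 125
# After protocol overhead the usable figure is roughly 110-118 MB/s,
# so a host stuck at ~100-110 MBps is almost certainly pushing all
# iSCSI traffic down a single NIC.
```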
Also verify this from the controller side of things as well... If you have EqualLogic SAN HQ running in your environment, look under Network -> Ports... You should see roughly the same amount of data sent and received on both the iSCSI gigabit ports on the controller...
If I am not mistaken, when properly configured, each vmkernel port assigned to iSCSI traffic on the host creates an individual iSCSI connection to the volumes on the PS Series array... Please verify that there are indeed two connections for each volume on the array (this can be done via EQL Group Manager or via the array console)...
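On the host side, one way to count sessions is the esxcli iscsi namespace (ESXi 5.x); a sketch, not a definitive procedure, and "vmhba33" is a placeholder adapter name:

```shell
# List active iSCSI sessions; with two bound VMkernel ports you should
# see two sessions per EqualLogic volume target.
esxcli iscsi session list

# Show which vmknics are bound to the software iSCSI adapter
# ("vmhba33" is a placeholder -- check "esxcli iscsi adapter list" for yours).
esxcli iscsi networkportal list --adapter=vmhba33
```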
If you are using MEM and the HIT Kit, I think a lot of the tuning and performance optimizations are taken care of for you, but in our environment we do not have VMware Enterprise licensing, so we had to use Round Robin. There are a few performance tuning changes (change the IOPS threshold for Round Robin from 1000 to 3, disable Delayed ACK and LRO) you can make on the ESXi host that make a substantial difference, especially in response times... Let me know if anybody is interested in any of these settings and I can post them as well...
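For those interested, here is a sketch of those tuning changes from the ESXi 5.x shell; the naa.6090a0... device ID is a placeholder for your EqualLogic volume, and Delayed ACK itself is disabled through the vSphere Client (iSCSI initiator -> Advanced), then verified with vmkiscsid:

```shell
# Use Round Robin for the EQL volume and drop the IOPS-per-path
# threshold from the default 1000 to 3
# (naa.6090a0XXXXXXXX is a placeholder device ID).
esxcli storage nmp device set --device naa.6090a0XXXXXXXX --psp VMW_PSP_RR
esxcli storage nmp psp roundrobin deviceconfig set --device naa.6090a0XXXXXXXX --type iops --iops 3

# Disable Large Receive Offload globally (takes effect after a reboot).
esxcfg-advcfg -s 0 /Net/TcpipDefLROEnabled
```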
Currently I am getting a maximum throughput (100% read, 0% random, 32 KB block size) of ~234 MBps, 7485 IOPS, average response time 8.5 ms.
Hopefully this will give you a starting point in diagnosing and finding a resolution to the performance issues…
Which switches do you use for iSCSI in your setup?
I asked Dell about the IOPS threshold, and they told me that it is not supported. Do you have an official answer from VMware or Dell regarding that setting?
This thread is VERY interesting. I'm surprised EqualLogic was not able to get you guys fixed...
Re: IOPS value. Dell/EQL, HP, EMC, VMware, etc. agreed on some general principles about iSCSI configuration. One of them was to change the IOs per path from 1000 to 3. If you google "multivendor iscsi post" you will find it on several blogs.
One thing about Delayed ACK is that you have to verify that the change actually took place. If you just change it, it appears that only new LUNs will inherit the (disabled) value. I find that, while in maintenance mode, removing the discovery address and any discovered targets (in the Static Discovery tab), then disabling Delayed ACK, re-adding the discovery address and rescanning resolves this.
At the ESX console run: # vmkiscsid --dump-db | grep Delayed -- all the entries should end with ='0' for disabled.
I just posted this to another thread on EQL perf issues:
Common causes of performance issues that generate that alert are:
1.) Delayed ACK is enabled.
2.) Large Receive Offload (LRO) is enabled.
3.) MPIO pathing is set to FIXED.
4.) MPIO is set to VMware Round Robin but the IOs per path is left at the default of 1000; it should be 3.
5.) VMs with more than one VMDK (or RDM) are sharing one Virtual SCSI adapter. Each VM can have up to four Virtual SCSI adapters.
6.) iSCSI switch not configured correctly or not designed for iSCSI SAN use.
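A quick way to audit items 3 and 4 in that list from the host shell (a sketch for ESXi 5.x; the exact output layout may differ slightly by build):

```shell
# For each device, print the path selection policy lines; anything showing
# VMW_PSP_FIXED, or a Round Robin config still at iops=1000, needs fixing.
esxcli storage nmp device list | grep "Path Selection Policy"
```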
This thread has some specific instructions on how to disable Delayed ACK as well.
Re: Storage Heartbeat VMK port. (Mentioned in EQL iSCSI config guides for ESX)
This will never affect performance. The lowest VMK port in each IP subnet is the default device for that subnet. With iSCSI MPIO when VMK ports are on the same subnet this can cause problems when the link associated with that VMK port goes down. (cable pull, rebooting/failed/powered off switch, etc) ESX will still use that port to reply to ICMP and Jumbo Frame SYN packets. During the iSCSI login process, the EQL array pings the source port from the array port it wants to use to handle that session request. If that ping fails the login process fails. The Storage Heartbeat makes sure that default VMK port is always able to respond to the ping. That VMK port must NOT be bound to the iSCSI adapter.
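If you need to add that heartbeat port from the command line, a minimal sketch (the port group name, IP and netmask are placeholders; remember it must end up as the lowest-numbered VMkernel port on the vSwitch and must NOT be bound to the iSCSI adapter):

```shell
# Create the Storage Heartbeat VMkernel port on an existing port group
# ("iSCSI-Heartbeat" and 10.10.10.10/24 are placeholders) with jumbo frames.
esxcfg-vmknic -a -i 10.10.10.10 -n 255.255.255.0 -m 9000 iSCSI-Heartbeat

# Confirm it received the lowest vmk number on that vSwitch.
esxcfg-vmknic -l
```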
I had to update the NIC driver for the Broadcoms on our 620s to get it decent. After the firmware update to 5.2.4 H1, latency went to ungodly high numbers (20k+) and, until the driver update, would just stay there nice and high. Horrible, horrible performance.
We are a 100% virtualization shop, so it was killing us. Needless to say, L1 support was less than useful.