A CALL FOR HELP
Thanks for the update. And what was the resolution for the problem in your first thread? The KB is for Intel NICs.
Unresolved
You have a PM.
Can anybody with the same problem as _VR_ with a new EqualLogic please check on the switch whether flow control is up? At the moment it looks like flow control never comes up on the new PS4100.
This is the output when it is not working; the flow control oper status should be on:
Switch# show flowcontrol interface gigabitEthernet 5/5
Port       Send FlowControl   Receive FlowControl   RxPause   TxPause
           admin    oper      admin     oper
---------  -------- --------  --------  --------    --------  -------
Gi5/5      off      off       desired   off         0         0
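If the oper column stays off, it may be worth explicitly setting receive flow control on the array-facing port while troubleshooting. A minimal sketch (the interface number is taken from the output above; verify the available keywords against your platform's IOS documentation):

```
Switch# configure terminal
Switch(config)# interface gigabitEthernet 5/5
Switch(config-if)# flowcontrol receive desired
Switch(config-if)# end
Switch# show flowcontrol interface gigabitEthernet 5/5
```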
Is there any update on this situation? I am about to place an order for either a 4100X or a 4100XV and wonder if I am going to experience the same problems. My three hosts are IBM x3650 M2 boxes with two built-in Broadcom NIC ports and a quad-port Intel 82571EB NIC.
I will probably replace my current physically-dedicated SAN switches with a pair of HP 2510-24G units.
No, still not fixed at my site. The cause is unknown.
Real-life performance is OK, but read performance is poor.
I'm having the exact same issue you're seeing. Alex - I've shot you a PM with more details...
So I've been working on this a lot today and I've made a little progress. Equallogic support sent me this doc:
http://www.equallogic.com/WorkArea/DownloadAsset.aspx?id=10799
Specifically, it mentions the storage heartbeat VMK for iSCSI. I did not have this configured (I am using the MEM). For a while my retransmits in SAN HQ have been above normal (under 1%, but above the 0.1% mark where they should be). I was attributing this to the flow control issue that Alex pointed out. After I added the Storage Heartbeat VMK, my retransmits fell below 0.1% and my transfer rate is much higher. I would suggest trying this if you haven't already.
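For reference, a storage heartbeat vmkernel port can be created from the ESXi 5 shell along these lines (the port-group name, vmk number, and IP are placeholders for your environment; the EqualLogic document linked above is the authoritative source):

```
# create a port group for the heartbeat on the iSCSI vSwitch (names are examples)
esxcli network vswitch standard portgroup add --portgroup-name=iSCSI-Heartbeat --vswitch-name=vSwitch1
# add the vmkernel interface and give it a static IP on the SAN subnet
esxcli network ip interface add --interface-name=vmk1 --portgroup-name=iSCSI-Heartbeat
esxcli network ip interface ipv4 set --interface-name=vmk1 --ipv4=10.10.10.21 --netmask=255.255.255.0 --type=static
# match the SAN MTU if jumbo frames are in use
esxcli network ip interface set --interface-name=vmk1 --mtu=9000
```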
On a side note, I have a replica environment that has a PS6100 and Dell R710 hosts with Intel PT quad-port cards in it for failover (I'm using the same in the primary except it's a PS6500). That site has the same switches (Cisco 3750) but a slightly older firmware (12.2.55 vs 12.2.58). This site is reporting proper flowcontrol on for the storage ports. I'm not sure if the switch firmware has anything to do with it but I plan on opening up a TAC case soon to see what they say.
Hi,
thanks for the update. I have this problem with all my Ciscos.
Hi,
OK, Cisco finally found a bug in IOS.
CISCO BUG ID: CSCty55093
This is good news. I am in the process of renewing our SmartNet agreements so I can discuss this with Cisco myself. What's interesting is that I'm not using IOS 15. I have used IOS 15 in a few environments in the past, and in every single one of them I've had switch crashes in the middle of the night (roughly one every few months). I do NOT see a flow control problem with switches running 12.2(55)SE5, but I am seeing the problem with 12.2(58)SE2 and 15. These are all 3750G switches.
I've sent you a PM as well. Thanks for the update on this.
The crashes also happen on IOS 15 with C3750-X units in a stack. There is no crashlog written.
I am having the exact same issue as well... But the sad thing is I am using all Dell equipment: Dell R720 servers and a pair of stacked 6248 switches... I've changed the IOPS parameter on all the iSCSI interfaces from 1000 to 3, set them to default to Round Robin, and turned off LRO and Delayed ACK, but still no change in throughput... The maximum I've been able to get has been around 110 MB/s... Did anybody find a solution for this issue??? Any help would be very much appreciated...
Hi
We have been seeing this error for years now. I have tried a lot with Dell, VMware, and also IBM, but I have the same problems you have...
We're on VMware 5.1, two EQL groups (PS6000 and PS5000), firmware 5.2.2, using onboard Broadcom and quad-port Intel cards.
I have never seen more than 80-100 MB/s...
Maybe we should share our VM network setups, as this could be a big factor in performance.
I have 3 NICs in a vSwitch and use 3 VMkernel ports. On top of that I have two VM networks.
So the ESXi servers use 3x 1 Gbit NICs to the groups (switches: Dell 6248, stacked).
The VMs (as seen on the screenshot): iSCSI1 uses vmnic3, iSCSI2 uses vmnic4 (with vmnic5 as standby).
Internally in the VMs I use VMXNET3 with default settings (except that I have set jumbo frames to 9000).
I haven't changed anything in ESXi (with regard to TCO, TSO, LRO, etc.). We use MEM 1.1.
So if any of you see trouble with my setup above, feel free to give me a hint. Everything was set up according to the manual.
But the problem is that the performance is quite bad....
Has anyone tested V5.2.4 firmware yet?
I have not, but I don't think it will solve the problem.
steffan,
It's not a VMware or network configuration issue. I've tested performance using multiple different physical servers as well as 3 different types of network switches. All configurations experienced the same issue.
At the same time I have two EQL devices running 5.0.7 that are not seeing any issues.
The problem with iSCSI in ESXi 5 U1 only occurs after a reboot. To avoid this bug, set Failback to Yes on the iSCSI vSwitch. Before U1 the workaround was to set Failback to No.
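For anyone scripting this, the failback policy can also be set from the ESXi 5 shell (the vSwitch name is an example for your environment):

```
# set Failback to Yes on the iSCSI vSwitch, then verify the teaming policy
esxcli network vswitch standard policy failover set --vswitch-name=vSwitch1 --failback=true
esxcli network vswitch standard policy failover get --vswitch-name=vSwitch1
```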
After working on this issue for over a month with EqualLogic Level 3 support, the Dell PowerEdge group (responsible for VMware issues within Dell), and directly with VMware, I think we might have found a resolution... That being said, there are a number of things that can cause this issue, so I'll try to include some troubleshooting steps that might help narrow down the cause in your particular environment...
I would highly recommend you carefully read "Configuring iSCSI Connectivity with VMware vSphere 5 and Dell EqualLogic PS Series Storage"... When we first started working on this issue, that document was not even available (they still only had the best practices for vSphere 4.1)... I would highly recommend you set up the Storage Heartbeat port per the recommendations in that document (it needs to be the lowest-numbered vmkernel port on the vSwitch, which means it must be created first on the vSwitch; also enable jumbo frames on it if you are using jumbo frames in your environment)...
There still seem to be some weird issues with the VMware ESXi 5 software iSCSI initiator even after Update 1 (build 5.0.0, 623860)... So it might not be a bad idea to start with a new vSwitch (especially because the Storage Heartbeat needs to be the lowest-numbered vmkernel port on that vSwitch). You can move the NICs one at a time to the new vSwitch if you cannot afford to bring down the host...
As one of the first troubleshooting tips, enable SSH on the host and run the esxtop command, then press n to display live performance data for only the network interfaces on that ESX host... But before you do this, note which vmnics you have assigned to iSCSI traffic... I am almost certain you will see traffic flowing through only one of the assigned vmnics, hence the kind of performance numbers we have all been seeing (a little over 100 MB/s throughput and around 3,200+ IOPS, which, by the way, is approximately the theoretical maximum of a single gigabit port)... This was happening despite the fact that we had Round Robin enabled and both paths showing as Active (I/O)...
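esxtop can also be run non-interactively, which makes it easy to capture the per-vmnic counters for later comparison (the sample count, interval, and output path are examples):

```
# batch mode: 10 samples at 5-second intervals, all counters written to CSV
esxtop -b -d 5 -n 10 > /tmp/esxtop-net.csv
```

Open the CSV in a spreadsheet and compare the transmit/receive rates of the vmnics assigned to iSCSI; a healthy multipath setup should show roughly even traffic across them.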
Also verify this from the controller side of things... If you have EqualLogic SAN HQ running in your environment, look under Network -> Ports... You should see roughly the same amount of data sent and received on both iSCSI gigabit ports on the controller...
If I am not mistaken, when properly configured, each vmkernel port assigned to iSCSI traffic on the host creates an individual iSCSI connection to the volumes on the PS series array... Please verify that there are indeed two connections per volume on the array (this can be done via EQL Group Manager or via the array console)...
If you are using MEM and the HIT kit, I think a lot of the tuning and performance optimizations are taken care of for you, but in our environment we do not have VMware Enterprise licensing, so we had to use Round Robin. There are a few performance-tuning changes (lower the Round Robin IOPS threshold from 1000 to 3, disable Delayed ACK and LRO) you can make on the ESXi host that make a substantial difference, especially in response times... Let me know if anybody is interested in any of these settings and I can post them as well...
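For anyone who wants a head start, a rough sketch of those tuning changes on ESXi 5 looks like this (the naa device ID is a placeholder for your EQL volume; Delayed ACK is typically toggled per adapter in the vSphere Client's iSCSI initiator advanced settings, so it is not shown here):

```
# set Round Robin as the path policy for an EQL volume (device ID is an example)
esxcli storage nmp device set --device=naa.6090a0xxxxxxxxxx --psp=VMW_PSP_RR
# lower the Round Robin IOPS threshold from the default 1000 to 3
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.6090a0xxxxxxxxxx --type=iops --iops=3
# disable LRO for the host's TCP/IP stack
esxcli system settings advanced set --option=/Net/TcpipDefLROEnabled --int-value=0
```

Repeat the first two commands for each EQL device; you can list the device IDs with `esxcli storage nmp device list`.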
Currently I am getting a Maximum throughput (100% Read, 0% Random, 32 Byte packet size) of ~234 MBps, IOPS 7485, Average Response time 8.5ms
Hopefully this will give you a starting point in diagnosing and finding a resolution to the performance issues…