We've updated to ESX4 and have implemented round robin MPIO to our EQL boxes (we didn't use round robin under 3.5). However, I'm seeing 3-4 entries per day in the EQL log that indicate a dropped connection. See the logs below for the EQL and vCenter views of the event.
EQL Log Entry
INFO 10/06/09 23:50:32 EQL-Array-1
iSCSI session to target '192.168.2.240:3260, iqn.2001-05.com.equallogic:0-8a0906-bc6459001-cf60002a3a648493-vm-exchange' from initiator '192.168.2.111:58281, iqn.1998-01.com.vmware:esxborga-2b57cd4e' was closed.
iSCSI initiator connection failure.
Connection was closed by peer.
vCenter Event
Lost path redundancy to storage device naa.6090a018005964bc9384643a2a0060cf.
Path vmhba34:C1:T3:L0 is down. Affected datastores: "VM_Exchange".
warning
6/10/2009 11:54:47 PM
I'm aware that the EQL box will shuffle connections from time to time, but those appear in the logs as follows (although vCenter will still display a "Lost path redundancy" event):
INFO 10/06/09 23:54:47 EQL-Array-1
iSCSI session to target '192.168.2.245:3260, iqn.2001-05.com.equallogic:0-8a0906-bc6459001-cf60002a3a648493-vm-exchange' from initiator '192.168.2.126:59880, iqn.1998-01.com.vmware:esxborgb-6d1c1540' was closed.
Load balancing request was received on the array.
Should we be concerned, or is it now normal operation for the ESX iSCSI initiator to drop and re-establish connections?
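For anyone sorting through an exported EQL log, here is a minimal Python sketch of how the two kinds of entry quoted above could be told apart. It assumes each log entry is available as one block of text; the labels ("unexpected drop", "load balancing") and the function names are my own, not anything the array or vCenter reports.

```python
import re

# Reason lines copied from the two EQL log entries quoted above.
# The labels on the right are hypothetical names chosen for this sketch.
REASONS = {
    "Connection was closed by peer.": "unexpected drop",
    "Load balancing request was received on the array.": "load balancing",
}


def classify(entry: str) -> str:
    """Return a label for one multi-line EQL log entry."""
    for reason, label in REASONS.items():
        if reason in entry:
            return label
    return "unknown"


def initiator_ip(entry: str):
    """Extract the initiator IP address from an entry, or None if absent."""
    match = re.search(r"from initiator '([\d.]+):\d+", entry)
    return match.group(1) if match else None


# Sample entries taken from the log excerpts above.
DROP = (
    "INFO 10/06/09 23:50:32 EQL-Array-1\n"
    "iSCSI session to target '192.168.2.240:3260, "
    "iqn.2001-05.com.equallogic:0-8a0906-bc6459001-cf60002a3a648493-vm-exchange' "
    "from initiator '192.168.2.111:58281, "
    "iqn.1998-01.com.vmware:esxborga-2b57cd4e' was closed.\n"
    "iSCSI initiator connection failure.\n"
    "Connection was closed by peer."
)
REBALANCE = (
    "INFO 10/06/09 23:54:47 EQL-Array-1\n"
    "iSCSI session to target '192.168.2.245:3260, "
    "iqn.2001-05.com.equallogic:0-8a0906-bc6459001-cf60002a3a648493-vm-exchange' "
    "from initiator '192.168.2.126:59880, "
    "iqn.1998-01.com.vmware:esxborgb-6d1c1540' was closed.\n"
    "Load balancing request was received on the array."
)

if __name__ == "__main__":
    for entry in (DROP, REBALANCE):
        print(initiator_ip(entry), classify(entry))
```

Counting only the "unexpected drop" entries per initiator would give the drops-per-day figure without the routine rebalancing noise.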
OK, well, I tried to escalate this today with VMware. They told me they have found the problem and are currently testing the fix, which is planned for release in U5. That is still several months away.
I was told it is a problem only with Dell Equallogic series, which was interesting...
Unfortunately a few months is too long for me, so when I asked what they suggested as a workaround, they recommended downgrading the systems to ESX 3.5.
Not at all what I want to do, but I don't want to risk any data loss with clients either.
What!?! U5!? As in update 5?! That could be a year OFF... I can't believe it! What do others think about this? I am going to try to get EQL's opinion on this.
Thanks so much for posting your information.
For me, the issue does not occur that frequently and there is NO way that I'm going back to 3.5! Again, whenever I have had a path fail it instantly switches over to the redundant path.
Yes update 5. This is only what I got from the vmware support guy I talked to about our case, if anyone else can get more/different information then please post here.
I'm going to try to hold off from the rebuild and see what other feedback people get, maybe from Dell also.
Rebuilding everything as 3.5 is not at all ideal, especially as I wanted to take advantage of the improved iSCSI performance in vSphere, but not at the risk of data loss.
One odd thing is that I don't see this disconnect issue. Configuration is 3x ESX servers build 208167 each having,
- One vSwitch for iSCSI with 2 physical NICs
- 2 vmk iSCSI ports on same subnet
- 2x Force10 switches with 4-port LAG between them
- PS4000 with both interfaces active, management interface on separate network
- Several LUNs configured as RR and IOPS set to 3 on each host
It's not live yet so mostly idle with occasional massive IO for testing. No errors reported??
When you say 2 vmk ports... is that total? If so, you might not see much immediately, but you will eventually. Add another vmk port so you have two or three PER NIC... then you should see a lot more drops.
So, has anyone got any more information on this as yet?
Would be interesting to know if vmware/Dell have told you guys anything new, or how the prolonged testing is going.
Hi S1xth, indeed I have 1 vmk per pNIC. As I have PS4000, there are two GigE's at each end, so I don't think there is any advantage to increasing that.
IMO, you will not have to wait for U5 for the fix to be released.
No one I have talked to with a properly configured system has had data loss of any kind. Again, you see the issue on a single path during very LOW I/O periods. It recovers, and if you have redundant paths you don't lose connectivity to the storage. If you have more VMkernel ports than physical NICs you might actually see the issue more often, since with more VMkernel ports you're less likely to be able to keep them all busy all the time. But with multiple VMkernel ports per NIC you're also less likely to suffer an all-paths-down scenario.
The alerts are annoying and generate a lot of noise, but I haven't heard of anyone going down because of it. Has anyone on the list suffered an All Paths Down (APD) due to this bug, where the log (/var/log/vmkiscsid.log) shows that all the ports were disconnected by NOOP failures at the same time?
-don
I have never had more than one path fail per day per host so far. I have 6 virtual / 2 physical NICs, so I have 6 paths per host for iSCSI. I get a log entry once per day, usually in the wee morning hours, and only 1 path is lost per host. Most times only 1 host has a failed path overnight.
John Z
Hi John,
OK, so that's consistent with what I've seen. Annoying, but not causing down time. If it really bothers you, there is something you can try: reduce the number of IOs that go down each path before switching to the next. The default is 1000. There was a joint storage vendor "paper" (Dell/EMC/NetApp) that suggested changing that to three (3) instead. Since your IO load is so low at that time, you're tripping over the bug. Getting more consistent IO going over the available paths will likely reduce the frequency of the alerts.
*Question 3: “I’ve configured Round Robin – but the paths aren’t evenly used”*
Answer: The Round Robin policy doesn’t issue I/Os in a simple “round
robin” between paths in the way many expect. By default the Round Robin PSP
sends 1,000 commands down each path before moving to the next path; this is
called the IO Operation Limit. In some configurations, this default
configuration doesn't demonstrate much path aggregation because quite often
some of the thousand commands will have completed before the last command is
sent. That means the paths aren't full (even though queue at the storage array
might be). When using 1 Gbit iSCSI, quite often the physical path is often the
limiting factor on throughput, and making use of multiple paths at the same
time shows better throughput.
You can reduce the number of commands issued down a particular path before
moving on to the next path all the way to 1, thus ensuring that each subsequent
command is sent down a different path. In a Dell EqualLogic configuration, Eric
has recommended a value of 3.
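To make the effect of the IO Operation Limit concrete, here is a small Python sketch of a deliberately simplified model: commands are assigned to paths in blocks of `iops_limit`, wrapping around the available paths. This is an illustration of the behaviour described above, not VMware's actual PSP implementation, and the function name and burst size are made up for the example.

```python
from collections import Counter


def distribute(num_commands: int, num_paths: int, iops_limit: int) -> Counter:
    """Count commands per path when the path changes every iops_limit commands.

    Simplified model: commands are issued back to back and the selector
    moves to the next path after iops_limit commands, wrapping around.
    """
    if num_paths < 1 or iops_limit < 1:
        raise ValueError("num_paths and iops_limit must be at least 1")
    counts = Counter()
    for i in range(num_commands):
        path = (i // iops_limit) % num_paths
        counts[path] += 1
    return counts


if __name__ == "__main__":
    burst = 600  # hypothetical small burst of commands on a quiet host
    for limit in (1000, 3, 1):
        per_path = distribute(burst, num_paths=2, iops_limit=limit)
        print(f"limit={limit}: {dict(per_path)}")
```

In this model a burst smaller than the limit never leaves the first path, so with the default of 1,000 the second path carries nothing, while a limit of 3 or 1 splits the same burst evenly.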
You can make this change by using this command:
esxcli --server <servername> nmp roundrobin setconfig --device <lun ID> --iops <IOOperationLimit_value> --type iops
Note that cutting down the number of iops does present some potential problems.
With some storage arrays caching is done per path. By spreading the requests across
multiple paths, you are defeating any caching optimization at the storage end
and could end up hurting your performance. Luckily, most modern storage systems
don't cache per port. There's still a minor path-switch penalty in ESX, so
switching this often probably represents a little more CPU overhead on the
host.
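If you have more than a handful of volumes, a short Python sketch like the one below could generate one command line per device for review before anything is run. It uses only the options shown in the command above; the server name is a placeholder, and the device ID is the one from the vCenter event earlier in this thread. The helper name is hypothetical.

```python
import shlex


def roundrobin_setconfig_cmd(server: str, device: str, iops: int) -> str:
    """Build the esxcli command line quoted above for one device.

    Only the options that appear in the quoted command are used.
    The string is returned for review; nothing is executed.
    """
    if iops < 1:
        raise ValueError("iops must be at least 1")
    args = [
        "esxcli", "--server", server,
        "nmp", "roundrobin", "setconfig",
        "--device", device,
        "--iops", str(iops),
        "--type", "iops",
    ]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    server = "esx-host.example"  # placeholder host name (assumption)
    devices = [
        "naa.6090a018005964bc9384643a2a0060cf",  # device ID from the vCenter event
    ]
    for device in devices:
        print(roundrobin_setconfig_cmd(server, device, iops=3))
```

Printing the commands instead of running them makes it easy to try the change on a single test volume first.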
-don
Don,
I have done some research on the IOPS setting mentioned in the multi-vendor iSCSI post, and talked to Dell and EqualLogic support about it, and they don't know where the recommendation for setting this to 3 came from. If anything, they recommended 300. I experimented with the setting and found that changing it from the default left a random number as the setting. So in short, I believe leaving it at the default is best, and if you are going to change it, do it on a single test volume. There was also another thread on the delltechcenter.com site where someone used IOmeter to test the various settings; if you are interested in seeing the results, see the thread at the bottom of the multi-vendor post.
As you have mentioned the drops don't appear to cause data loss. That is true for us as well.
-Rob
Rob...
Even if you leave the value at the default of 1000 and reboot the host with RR configured, the number changes to a crazy value. That is what I have seen and read, and it is supposedly a reported bug in U1.
Jonathan
We run multiple PS storage arrays and have not had any issues as of yet. However, we have not switched to jumbo frames. Load on the wire seems to be a factor in the drops, so once the patch is released we will recreate the vSwitches with jumbo frames.
Keep the article below in mind when using jumbo frames.
The excerpt below is from http://www.networkworld.com/forum/0223jumbono.html
Although proponents claim larger packets improve performance "on the wire," the impact is relatively insignificant. Compare the efficiency of a 9,000-byte large-packet system with a standards-based 1,500-byte system. The standard packet gets 1,500 data bytes out of 1,538 bytes of frame and overhead, or 97.5% efficiency. The nonstandard packet gets 9,000 data bytes out of 9,038 bytes, or 99.6% efficiency. To put it another way, the difference in time required to send a 1M-byte file is only 0.1 msec.
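The efficiency figures in the excerpt are easy to check. The Python sketch below recomputes them from the 38 bytes of per-frame overhead implied by the quoted numbers. The transfer-time part is my own simple model (every frame full, no inter-frame effects beyond the 38 bytes), and the 1 Gbit/s link speed is an assumption, since the excerpt does not state one.

```python
OVERHEAD_BYTES = 38  # per-frame overhead implied by 1,538 - 1,500 in the excerpt


def efficiency(payload_bytes: int) -> float:
    """Fraction of bytes on the wire that are payload for a given frame size."""
    return payload_bytes / (payload_bytes + OVERHEAD_BYTES)


def transfer_seconds(file_bytes: int, payload_bytes: int, link_bps: float) -> float:
    """Time to send file_bytes, assuming every frame carries a full payload."""
    wire_bytes = file_bytes / efficiency(payload_bytes)
    return wire_bytes * 8 / link_bps


if __name__ == "__main__":
    for payload in (1500, 9000):
        print(f"{payload}-byte payload: {efficiency(payload) * 100:.1f}% efficient")

    link_bps = 1e9  # assumed link speed; not given in the excerpt
    standard = transfer_seconds(1_000_000, 1500, link_bps)
    jumbo = transfer_seconds(1_000_000, 9000, link_bps)
    print(f"difference for a 1M-byte file: {(standard - jumbo) * 1000:.3f} ms")
```

The gain on the wire is about two percentage points either way, so any larger benefit from jumbo frames would have to come from somewhere other than wire efficiency.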
-Dwayne Lessner
Hi Dwayne,
I think several users, including myself, had the issue even when using standard frames. We did not see an increase in the number of drops between non-jumbo and jumbo setups on our end.
John Z.
P.S. Is the date of that article correct? Feb. 98? Or is that a typo? With all the recommendations out there and the newest tech, I can't believe that this article is still valid... they are talking about 10M connections when we are pushing 10G today. Just wondering; I'm no network expert by any stretch.