Is this being asked because there is a possibility
that having a NIC bond going to the iSCSI target
could mitigate this issue? The reason I ask is I have
several ESX hosts hitting an EQL iSCSI SAN, and I
have several RHEL4 UP4 guests with Sybase databases
running on them, and I've never seen this
issue, even when I've spent 9 or 10 hours loading
a 1TB+ database.
OK, I know you didn't ask me this question, but I thought my experiences might be interesting to you. We have ESX servers running against several different storage arrays: a CX400 via FC, a CX700 via FC, an AX150i via iSCSI, and an EqualLogic PS300E via iSCSI (plus a few systems running on local storage such as IBM ServeRAID and Dell PERC controllers).
We've found the problem to be fairly easy to reproduce on the AX150i, as well as on the CX700. For whatever reason, it's actually more difficult to trigger on the CX400. We theorize that this has to do with its smaller write cache and thus lower latency during heavy writes, but it may also be related to the fact that the CX400 simply has less contention because it services fewer hosts.
The EqualLogic PS300E is by far the most difficult array for us to reproduce the issue on. We did manage to trigger the issue even when running against this array, but it took a pretty crazy level of fake I/O load running in multiple VMs to do it.
On the other hand, if you pull the plug on a network cable, it's likely that you'll see the issue on every Linux guest by the time the iSCSI connection fails over to another port. My opinion is that this fix is an important proactive measure if you want your guests to keep running through any failure scenario, especially since surviving failures is usually the reason you invest in all of the redundancy.
Later,
Tom