VMware Cloud Community
multirotor
Contributor

iSCSI software initiator unstable?

We are using the ESX 3.0.1 software iSCSI initiator for a few production VMs. These VMs are not business-critical, so we can cope with some downtime during the day.

It seems to me that the software iSCSI initiator does not survive host reboots very well, nor does it recover from error conditions.

Our NAS device is a Sun 5320. Everything works fine as long as nothing happens to the ESX hosts or the network. The first problem occurred after rebooting one of our 10 ESX hosts. We could not get that host to reconnect to the iSCSI storage, and we had to remove the software initiator, reboot again, and configure it again with a new initiator name. There is no pattern to this behaviour. If I reboot all 10 ESX boxes, 7 will see the iSCSI storage and 3 will not. The next time, 4 other servers will not see it. Since reboots happen in a controlled way, I could live with it, even though every reboot has required some reconfiguration on some of the hosts.

Now we have had a 2-minute network glitch, and we lost the iSCSI volumes of the running VMs on ALL ESX hosts. After the network problem was resolved, the iSCSI storage did not come back. Rescanning, rebooting, and so on do not help. The only thing that has proven to work is my procedure for resetting the iSCSI configuration to factory defaults and reconfiguring it on all hosts:

http://communities.vmware.com/thread/100720
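
Before going through a reset like that, it can help to rule out plain connectivity problems between the storage network and the target. The following is only a minimal sketch, assuming Python is available on some machine on the same network as the ESX hosts. The function names are made up for illustration, and the only protocol fact it relies on is that iSCSI targets normally listen on TCP port 3260. It tests TCP reachability only. It does not perform an iSCSI login, so a successful result does not prove that the ESX initiator itself can reconnect.

```python
import socket
import time


def target_reachable(host, port=3260, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    3260 is the standard iSCSI port; the host address is whatever
    your target portal uses (placeholder, not taken from this thread).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def wait_for_target(host, port=3260, attempts=5, delay=2.0, timeout=3.0):
    """Probe the target up to `attempts` times, pausing `delay` seconds
    between tries. Returns the attempt number that succeeded, or None
    if the target never answered."""
    for attempt in range(1, attempts + 1):
        if target_reachable(host, port, timeout):
            return attempt
        if attempt < attempts:
            time.sleep(delay)
    return None
```

Calling `wait_for_target("<target-ip>")` right after a network glitch would at least show whether the target portal is answering again before any initiator is reconfigured.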

Has this been improved in 3.0.2 or later patches?

Is there a way to configure the timeouts (as there is for the Microsoft software initiator)?

Does anyone else have the same problem?

Would a hardware initiator solve these problems?

3 Replies
spl
Contributor

From recent experience, this is not fixed in 3.0.2r1. In fact, the software initiator seems even worse after r1: it hangs for ages during boot-up and still doesn't see the LUNs every time, although a rescan fixes that. I have not had any problems with hardware initiators (so far...).

chrwei
Enthusiast

I have not had these kinds of issues with 3.5, but I do have other issues with it. Guests that are not doing heavy I/O even tend to survive a reboot of the target, which impressed me. What does not impress me is that certain usage via VMware causes the target host to need a reboot, while similar usage with other initiators does not.

IgorZ
Contributor

I'm running 3.5 with the latest patches and am having problems with software iSCSI as well, although I'm not sure whether the cause is on the ESX side or the target side. Once in a while, rebooting a Windows VM takes about half an hour. I also lost the iSCSI connection once after an ESX reboot and had to reboot the target as well to get it back; rescans didn't work.
