We are experiencing seriously annoying problems with the software iSCSI initiator on our ESX servers. If we reboot an ESX server, the initiator breaks. It's a strange problem, though: the initiator can see the LUN IDs but cannot read the volumes. Trying to "Add storage" fails with the error "Unable to read partition information from this disk". The only thing that fixes the problem is:
1.) Disable the software iSCSI initiator.
2.) Reboot the ESX server.
3.) Enable the software iSCSI initiator.
4.) Reboot the ESX server again.
Then everything works peachy. VMotion works great, and migration takes only 30 seconds for most of our VMs. Everything works phenomenally awesome... until the host crashes or you reboot it. Then you have to go through the whole process above all over again.
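Since the workaround (disable the initiator, reboot, enable it, reboot) has to be repeated after every reboot, it can help to keep it as a fixed checklist. Below is a minimal Python sketch that only prints the steps as a dry run and executes nothing. The service-console commands it pairs with each step are assumptions: `esxcfg-swiscsi -d` / `-e` are believed to disable/enable the software initiator on ESX 3.x, but check `esxcfg-swiscsi -h` on your own host before relying on them.

```python
# Dry-run checklist for the software iSCSI workaround described in this thread.
# ASSUMPTION: the esxcfg-swiscsi flags (-d = disable, -e = enable) match your
# ESX 3.x service console; verify with `esxcfg-swiscsi -h`. Nothing is executed.

WORKAROUND_STEPS = [
    ("Disable the software iSCSI initiator", "esxcfg-swiscsi -d"),
    ("Reboot the ESX server", "reboot"),
    ("Enable the software iSCSI initiator", "esxcfg-swiscsi -e"),
    ("Reboot the ESX server again", "reboot"),
]


def plan(steps):
    """Return the steps numbered in the thread's '1.)' style, with the assumed command."""
    return [f"{i}.) {desc}: {cmd}" for i, (desc, cmd) in enumerate(steps, start=1)]


if __name__ == "__main__":
    for line in plan(WORKAROUND_STEPS):
        print(line)
```

Keeping it as a printed plan rather than a script that runs the commands is deliberate: two of the four steps are reboots, so the sequence cannot finish inside a single process anyway.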
What in the world is going on? This is driving us crazy. Is this a bug in the initiator?
We are running ESX 3.0.2 patch 1 (the latest) and the latest VirtualCenter. The target SAN is a Dell PowerVault MD3000i.
Any help would be greatly appreciated!
This may stem from the fact that the MD3000i is not yet supported and may need a firmware upgrade before it is. The problem you're having is similar to what deploylinux posted here (see the last post): http://communities.vmware.com/message/772741. He also notes:
bad bad news: from what I can tell, until Dell formally announces vmware support and probably updates the firmware -- the current firmware may not be able to support DR/HA properly. When we create a vmfs on one node, all works fine on that node...but if you try to access the vmfs from another node, the md3000i blocks.
We haven't experienced those issues. We have one HA cluster and everything works really well: HA is awesome, VMotion works like a charm, and the whole system rocks... until you reboot one of the hosts, at which point it goes buggy. But we don't have problems with the paths or with seeing the LUNs. All of the LUNs show up, but when you try to access a volume, it doesn't work. I will see if I can post some debug info. I'm also not ruling out some sort of compatibility problem. Anybody reading this may want to know that Dell told us the certification for the MD3000i was pending approval when we purchased the product a month ago. So I'll check on the status of that and see whether Dell/VMware quality testing has encountered this problem.
Here is some more detailed information about what happens when this problem occurs:
After rebooting (which effectively breaks the iSCSI connections to the SAN), we see these errors while ESX is starting up, during "Configuring S/W iSCSI...":
VMWARE: Device that would have been attached as scsi disk sdc at scsi1, channel 0, id 0, lun 5 Has not been attached because this path could not complete a READ command eventhough a TUR worked. result = 0x8000002 key = 0x5, asc = 0x94, ascq = 0x1
VMWARE: Device that would have been attached as scsi disk sdc at scsi1, channel 0, id 0, lun 5 Has not been attached because it is a duplicate path or on a passive path
Vendor: DELL Model: MD3000i Rev: 0650
Type: Direct-Access ANSI SCSI revision: 05
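For anyone comparing notes, the first VMWARE line can be decoded mechanically. Below is a minimal Python sketch that pulls out the path (scsi host, channel, id, lun) and the sense data. The regular expression is fitted to this one excerpt only and is an assumption, not a documented VMkernel log format. Sense key 0x5 is ILLEGAL REQUEST in the SCSI Primary Commands (SPC) standard, and ASC 0x94 falls in the 0x80-0xFF range that SPC leaves to the vendor, so its exact meaning has to come from Dell's documentation for the array. TUR is the SCSI TEST UNIT READY command.

```python
import re

# Subset of the SPC sense-key table.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0xB: "ABORTED COMMAND",
}

# ASSUMPTION: pattern fitted to the single log excerpt quoted in this thread.
ATTACH_RE = re.compile(
    r"scsi(?P<host>\d+), channel (?P<channel>\d+), id (?P<id>\d+), lun (?P<lun>\d+)"
    r".*?key = 0x(?P<key>[0-9a-fA-F]+), asc = 0x(?P<asc>[0-9a-fA-F]+), ascq = 0x(?P<ascq>[0-9a-fA-F]+)"
)

SAMPLE = (
    "VMWARE: Device that would have been attached as scsi disk sdc at scsi1, "
    "channel 0, id 0, lun 5 Has not been attached because this path could not "
    "complete a READ command eventhough a TUR worked. result = 0x8000002 "
    "key = 0x5, asc = 0x94, ascq = 0x1"
)


def decode(line):
    """Return path and sense data from one log line, or None if it has no sense data."""
    m = ATTACH_RE.search(line)
    if m is None:
        return None
    key = int(m["key"], 16)
    asc = int(m["asc"], 16)
    return {
        "path": (int(m["host"]), int(m["channel"]), int(m["id"]), int(m["lun"])),
        "sense_key": SENSE_KEYS.get(key, f"0x{key:x}"),
        "asc": asc,
        "ascq": int(m["ascq"], 16),
        # SPC reserves ASC values 0x80-0xFF for vendor-specific conditions.
        "vendor_specific": asc >= 0x80,
    }


if __name__ == "__main__":
    print(decode(SAMPLE))
```

Read alongside the second line ("duplicate path or on a passive path"), this suggests the array rejected a READ on that path even though TEST UNIT READY succeeded, but that is an inference from the log text, not a confirmed diagnosis.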
When we go into Storage and attempt to "Add storage" to the host, the LUNs DO show up in the list. But when we try to add one by clicking Next, the GUI shows this error:
Unable to read partition information from this disk.
This stays broken until you disable the iSCSI initiator, reboot, enable it, and reboot again. But if you reboot after that, the errors above come back and it stays broken. Around and around we go. I really don't know what is causing this. If anybody has experienced this or has any information that can point us in the right direction, I would greatly appreciate it!
If this is for production, you might want to check which build of 3.5 you have. A new one was released tonight.
Original builds posted last week:
ESX Latest Released Version: 3.5.0 - Released 12/03/07 - Build #64557
VirtualCenter Latest Released Version: 2.5 - Released 12/03/07 - Build #64201
New updated builds posted this week:
ESX Latest Released Version: 3.5.0 - Released 12/10/07 - Build #64607
VirtualCenter Latest Released Version: 2.5 - Released 12/10/07 - Build #64201
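To check which ESX 3.5 build a host is on, compare its build number against the ones listed here (64557 for the 12/03/07 release, 64607 for the 12/10/07 release). Below is a minimal Python sketch of that comparison. It assumes the version string looks like "VMware ESX Server 3.5.0 build-64607", which is the form the service-console command `vmware -v` is believed to print; treat that format as an assumption and adjust the pattern if your output differs.

```python
import re

# Build numbers as posted in this thread.
SUPERSEDED_ESX_35_BUILD = 64557  # ESX 3.5.0, released 12/03/07
LATEST_ESX_35_BUILD = 64607      # ESX 3.5.0, released 12/10/07


def parse_build(version_string):
    """Extract the build number from e.g. 'VMware ESX Server 3.5.0 build-64607'.

    ASSUMPTION: the 'build-NNNNN' form; not verified against every ESX release.
    """
    m = re.search(r"build-(\d+)", version_string)
    if m is None:
        raise ValueError(f"no build number in {version_string!r}")
    return int(m.group(1))


def needs_update(version_string, latest=LATEST_ESX_35_BUILD):
    """True if the installed build is older than the latest build listed above."""
    return parse_build(version_string) < latest


if __name__ == "__main__":
    print(needs_update("VMware ESX Server 3.5.0 build-64557"))
```

Comparing integers rather than strings matters here: both builds have five digits, but a plain string comparison would silently break as soon as build numbers differ in length.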
Thanks for the info Dave!
I just downloaded and installed the latest build a few hours ago. UPDATE: This appears to be the official release; it is posted in Downloads.
VMware ESX Server 3.5
Latest Version: 3.5 | 12/10/2007 | Build: 64607 | Release Notes
VMware VirtualCenter 2.5
Latest Version: 2.5 | 12/10/2007 | Build: 64201 | Release Notes
It's good to know that this release fixes literally all of the problems I was experiencing. I'm really excited about it and can't wait to do further testing and roll out our new ESX implementation in January! I am flagging your post as helpful, because people need to be aware that the 3.5 builds have changed and they should update to the above.