Hi all,
We have set up a new (test) environment that includes two ESXi installations.
The following setup has been created:
DELL MD3000i -> DELL PowerEdge R200 (Webserver)
\_> DELL PowerEdge R200 (MySQL server)
On both servers we configured the iSCSI initiator to connect to the MD3000i, which provides the datastore.
However, every minute (yes! every minute) we receive these messages on both R200s:
Lost connectivity to storage device
naa.6002219000c90093000004484a32546e. Path
vmhba33:C1:T0:L1 is down. Affected datastores:
"MD3000i-Web".
error
16-7-2009 0:34:48
Lost access to volume
4a4e7326-1225570a-7602-00219bfbd53c
(MD3000i-Web) due to connectivity issues.
Recovery attempt is in progress and outcome will
be reported shortly.
info
16-7-2009 0:34:48
Successfully restored access to volume
4a4e7326-1225570a-7602-00219bfbd53c
(MD3000i-Web) following connectivity issues.
info
16-7-2009 0:34:52
Can anybody point me in a direction to solve this? Because both servers keep losing their connections, there are immense hiccups in serving the webpages.
Thanks in advance!
Regards, Mike
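To see how long each drop lasts and how often it recurs, the "Lost access" and "Successfully restored" events can be paired up. Below is a minimal Python sketch, assuming the events have been copied by hand into (timestamp, message) pairs; the EVENTS list and the helper names are illustrative only and are not part of any VMware tool.

```python
from datetime import datetime

# Two events copied from the post above as (timestamp, message) pairs.
EVENTS = [
    ("16-7-2009 0:34:48", "Lost access to volume (MD3000i-Web)"),
    ("16-7-2009 0:34:52", "Successfully restored access to volume (MD3000i-Web)"),
]


def parse_ts(text):
    """Parse the day-month-year timestamps shown in the event list."""
    return datetime.strptime(text, "%d-%m-%Y %H:%M:%S")


def outages(events):
    """Pair each 'Lost access' event with the next 'restored' event.

    Returns a list of (start time, duration in seconds).
    """
    result = []
    lost_at = None
    for ts, msg in events:
        when = parse_ts(ts)
        if msg.startswith("Lost access"):
            lost_at = when
        elif msg.startswith("Successfully restored") and lost_at is not None:
            result.append((lost_at, (when - lost_at).total_seconds()))
            lost_at = None
    return result


for start, seconds in outages(EVENTS):
    print(f"{start}: access lost for {seconds:.0f} s")
```

With more events pasted in, the gaps between successive start times would show whether the drops really follow a fixed one-minute cycle.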
Describe how you have them connected. Is the iSCSI traffic on a separate network and a separate switch?
Yes,
We have a gigabit switch for internal data use.
The SAN is connected to this switch (with one cable), and both R200s are connected to this switch as well. We are not meshing or building an HA setup.
The SAN has its own internal network: the SAN is at 192.168.130.101, the MySQL server at 192.168.130.10 and the webserver at 192.168.130.11. ESX uses these IP addresses to communicate with the SAN.
On the same interfaces, but on a different subnet, the virtual machines are connected to each other (web (172.x.x.x) -> mysql), since the R200 has just two Ethernet ports: port 1 is for WAN and port 2 is for LAN.
On the SAN we created one target with a 'hostgroup' called Web0; both R200 ESX installations point to that target with their initiators.
If you want, I can make screenshots of a few screens for you.
Please say so if you would like them...
The problem is that I have absolutely no clue where to look.
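One quick sanity check on the addressing described above is to confirm that the SAN and both ESX hosts sit in the same iSCSI subnet and that no address is used twice. A small Python sketch follows; the /24 netmask is an assumption, since the post does not state the mask, and the host labels are just for illustration.

```python
import ipaddress

# Assumption: a /24 mask. The netmask is not given in the post.
ISCSI_NET = ipaddress.ip_network("192.168.130.0/24")

# Addresses as listed in the post; the labels are illustrative.
HOSTS = {
    "MD3000i": "192.168.130.101",
    "MySQL R200": "192.168.130.10",
    "Web R200": "192.168.130.11",
}


def outside_network(hosts, network):
    """Return the labels whose address is not inside the given network."""
    return [name for name, ip in hosts.items()
            if ipaddress.ip_address(ip) not in network]


def duplicate_addresses(hosts):
    """Return addresses that are assigned to more than one label."""
    seen, dupes = set(), set()
    for ip in hosts.values():
        if ip in seen:
            dupes.add(ip)
        seen.add(ip)
    return sorted(dupes)


print("outside subnet:", outside_network(HOSTS, ISCSI_NET))
print("duplicate IPs:", duplicate_addresses(HOSTS))
```

Both lists come back empty for the addresses given, which suggests the IP layout itself is not the cause under the assumed mask.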
I think you may have a problem with that second subnet. Am I seeing too many duplicate VMs?
There is an issue with the iscsid initiator in ESX/ESXi 4.0: no retry mechanism is implemented in it.
Say a storage array sends an error (timeout or offline) to the initiator. The initiator is supposed to retry every 2 seconds until the array is back online, but unfortunately this is not implemented in the ESX iscsid initiator for error code 0x301 (Service unavailable). Check whether your array generates that error code, i.e. 0x301.
This issue is fixed in ESX/ESXi 4.0 Patch 1, which is posted on the VMware website. It is best to upgrade your system to Patch 1 and see whether the issue occurs again.
Thanks,
Krishnaprasad
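For readers unfamiliar with the retry behaviour described above, here is a rough Python sketch of what "retry every 2 seconds while the array reports 0x301" means. It is only an illustration and not the iscsid code: try_login is a hypothetical callable standing in for a login attempt, and the max_attempts bound is an addition so the example always terminates.

```python
import time

SERVICE_UNAVAILABLE = 0x301  # status code named in the post
SUCCESS = 0x0


def login_with_retry(try_login, interval=2.0, max_attempts=5, sleep=time.sleep):
    """Call try_login() until it stops returning SERVICE_UNAVAILABLE.

    try_login is a hypothetical callable that returns a status code.
    Returns (last status, number of attempts made).
    """
    for attempt in range(1, max_attempts + 1):
        status = try_login()
        if status != SERVICE_UNAVAILABLE:
            return status, attempt
        if attempt < max_attempts:
            sleep(interval)  # wait before the next attempt
    return SERVICE_UNAVAILABLE, max_attempts


# Simulated array: unavailable twice, then back online.
responses = iter([SERVICE_UNAVAILABLE, SERVICE_UNAVAILABLE, SUCCESS])
status, attempts = login_with_retry(lambda: next(responses),
                                    sleep=lambda seconds: None)
print(hex(status), attempts)
```

The sleep function is passed in so the example runs instantly; a real initiator would actually wait between attempts.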
I've installed the patches you recommended, however after installation there is no change.
I've attached two screenshots showing how it looks when it loses connectivity and when it has connectivity.
We are using L0 & L1; L31 is not used, to my knowledge.
Thanks for your reply! Mike
What do you mean by too many duplicate VMs?
All VMs are 'needed', and all are different installations...
The problems occur even when no VM is started!
Looks like a different issue from the one I was referring to.
Does a rescan work?
Your SAN is widely used. This isn't a normal issue. Have you made use of your Dell service contract?
OK, I think I've found the problem.
As mentioned, our setup consists of two R200s with ESXi installed.
When we disable the iSCSI initiator on one of the two R200s, the connection drops are gone!
I think we've misconfigured the SAN and/or ESXi, since I can see only one iSCSI session when both R200s are up.
And, as far as I can see/guess now, each host keeps taking over the same iSCSI session from the other.
I've got only one question left: how do I determine what the correct initiator names for the R200s (ESXi) are?
And do I need to give an alias?
I've attached a screenshot where you can see how both servers are configured.
The names there are exactly the same (the iqn.* part). Is this correct?!
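The suspected behaviour, two hosts fighting over one session, can be illustrated with a toy model. The Python sketch below assumes a simplified target that keeps at most one session per initiator name and replaces the old session when the same name logs in again; it does not model the MD3000i firmware, and all iqn names in it are made up.

```python
class ToyTarget:
    """Toy target that keeps one session per initiator name."""

    def __init__(self):
        self.sessions = {}  # initiator name -> host that owns the session
        self.drops = []     # hosts whose session was replaced

    def login(self, initiator_name, host):
        previous = self.sessions.get(initiator_name)
        if previous is not None and previous != host:
            self.drops.append(previous)  # old session is kicked out
        self.sessions[initiator_name] = host


def simulate(names, rounds):
    """names maps host -> initiator name; each round, every host that
    does not currently hold its session logs in again."""
    target = ToyTarget()
    for _ in range(rounds):
        for host, name in names.items():
            if target.sessions.get(name) != host:
                target.login(name, host)
    return target


# Hypothetical names: both hosts configured with the same iqn.
shared = simulate({"web": "iqn.2009-07.com.example:same",
                   "mysql": "iqn.2009-07.com.example:same"}, rounds=3)
# Hypothetical names: each host has its own iqn.
unique = simulate({"web": "iqn.2009-07.com.example:web",
                   "mysql": "iqn.2009-07.com.example:mysql"}, rounds=3)

print(len(shared.sessions), len(shared.drops))
print(len(unique.sessions), len(unique.drops))
```

In this model the shared name yields a single session that changes owner on every login, while unique names yield two stable sessions, which matches the observation that only one session is visible when both R200s are up.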
This is the correct name. It depends on the hostname that you set for the R200 OS. You need to enter the complete iqn name in the MD3000i.
I've attached a screenshot where you can see how both servers are configured.
The names there are exactly the same (the iqn.* part). Is this correct?!
No - every iSCSI port (either initiator or target) needs to have a unique iSCSI name. It looks like you have confused the initiator name (which is usually autogenerated by ESXi) with the target name. The iSCSI name of the target needs to be entered on the "Static Discovery" page (or does not need to be entered manually at all if you specify the target IP address on the "Dynamic Discovery" page and the target supports the SendTargets request).
Now you need to make the initiator name unique again (and different from the name used by the target itself).
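The rule above (every initiator and target needs its own iSCSI name) is easy to check once the names are written down. Here is a short Python sketch that flags shared names and names that do not look like iqn.yyyy-mm.reversed-domain[:identifier]. All names in the example are hypothetical placeholders, not the ones from the screenshots, and the pattern is a loose check that ignores the eui. name format.

```python
import re
from collections import defaultdict

# Loose check for iqn.<yyyy-mm>.<reversed domain>[:<identifier>]
IQN_PATTERN = re.compile(r"^iqn\.\d{4}-\d{2}\.[a-z0-9][a-z0-9.-]*(:.+)?$")


def find_problems(ports):
    """ports maps a label (host or array) to its configured iSCSI name."""
    problems = []
    by_name = defaultdict(list)
    for label, name in ports.items():
        if not IQN_PATTERN.match(name):
            problems.append(f"{label}: '{name}' is not in iqn. format")
        by_name[name].append(label)
    for name, labels in by_name.items():
        if len(labels) > 1:
            problems.append(f"{name} is shared by {', '.join(labels)}")
    return problems


# Hypothetical misconfiguration: the target name was entered on both hosts.
broken = {
    "MD3000i target": "iqn.2009-07.com.example:array",
    "Web R200 initiator": "iqn.2009-07.com.example:array",
    "MySQL R200 initiator": "iqn.2009-07.com.example:array",
}
# Hypothetical fix: one distinct name per port.
fixed = {
    "MD3000i target": "iqn.2009-07.com.example:array",
    "Web R200 initiator": "iqn.2009-07.com.example:esx-web",
    "MySQL R200 initiator": "iqn.2009-07.com.example:esx-mysql",
}

print(find_problems(broken))
print(find_problems(fixed))
```

After the initiator names are changed, the host entries on the array side would need to reference the new names as well.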