Hi all, I'm encountering a strange issue where the hosts persistently lose their connection. I really need help with this. The setup is as follows.
ESXi 5.0
2 hosts: HP Proliant DL380G7
iSCSI SAN storage: Netgear ReadyNAS 3100
4 virtual machines installed across the hosts: database server, application server, web server and vCenter Server.
When the issue occurs, I can see that vmhba35 is down on both hosts and the hosts lose their connection to the iSCSI SAN storage. In "Recent Tasks", the virtual machines keep resetting by themselves. Although vSphere shows the virtual machines as running, I cannot access them by remote desktop or console. After I reboot the hosts, the system runs again. I wonder if this is a VMware issue. Please help! Thanks a lot!
Have you checked the ESXi host logs? Could you attach any logs if possible?
Could you provide us the host logs (File -> Export -> Export System Logs)? These will give us more information.
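If the vSphere Client export isn't handy, a log bundle can also be generated from the ESXi shell with vm-support; a minimal sketch (the `command -v` guard is only there so the snippet doesn't error outside an ESXi host):

```shell
# Sketch: generate a support/log bundle from the ESXi 5.x shell.
collect_logs() {
    # vm-support only exists on an ESXi host, so guard for other environments.
    if command -v vm-support >/dev/null 2>&1; then
        vm-support   # writes a .tgz bundle containing vmkernel.log, iscsid messages, host config
    else
        echo "vm-support not found: run this on the ESXi host shell"
    fi
}
collect_logs
```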
From the screenshots I can see there are a lot of latency errors.
- What is your multipathing policy?
- What is your teaming policy?
- Is any EtherChannel configured?
- How many NICs are there for iSCSI?
- What are the MTU settings?
Also check whether there are any APD/PDL conditions in the environment; refer to the link below:
http://blogs.vmware.com/vsphere/2011/08/all-path-down-apd-handling-in-50.html
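If it helps, the settings I'm asking about can be pulled from the ESXi shell roughly like this (command forms are from memory of esxcli 5.x, and vSwitch0 is an assumption — substitute the vSwitch that carries your iSCSI traffic; the guard just lets the sketch run outside a host):

```shell
# Sketch: collect the multipathing, teaming and MTU settings asked about above.
show_iscsi_settings() {
    if command -v esxcli >/dev/null 2>&1; then
        # Path Selection Policy (multipathing) per device
        esxcli storage nmp device list | grep "Path Selection Policy:"
        # VMkernel interfaces with their MTU (vmk1 carries iSCSI in this thread)
        esxcli network ip interface list
        # NIC teaming / failover policy on the vSwitch carrying iSCSI (vSwitch0 assumed)
        esxcli network vswitch standard policy failover get -v vSwitch0
    else
        echo "esxcli not found: run these on the ESXi host shell"
    fi
}
show_iscsi_settings
```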
The teaming policy is not enabled. There is only one Broadcom iSCSI adapter used for iSCSI, which is vmhba35. MTU is set to 1500. From vmkernel.log I can see that there appears to be an APD condition in the system.
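For reference, this is roughly how I searched vmkernel.log for APD markers (the sample lines below are illustrative, not from my actual log, and the device ID is made up):

```shell
# Sketch: look for All-Paths-Down markers in vmkernel.log.
# Illustrative sample; on a live host use: grep -i "all paths down" /var/log/vmkernel.log
sample_log='2013-01-28T05:15:30Z cpu2:2050)WARNING: NMP: I/O could not be issued to device
2013-01-28T05:15:31Z cpu2:2050)StorageApdHandler: Device naa.600000000000000000000001 has entered the All Paths Down state.'

check_apd() {
    # Count how many lines mention an All-Paths-Down state
    printf '%s\n' "$sample_log" | grep -ci "all paths down"
}
check_apd   # prints 1 for the sample above
```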
I see that your iSCSI interface is on the same subnet as your management network and vMotion network. I would highly recommend you at least separate iSCSI and vMotion traffic into separate VLANs, if not ideally put iSCSI on its own network or its own dedicated switches.
My mistake, nelo, I missed your subnet mask and see it is a different subnet. How about running vmkping "your iscsi target" from the console? Are you at least able to get connectivity? vmkping will use your VMkernel interface to ping your array.
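On ESXi 5.0 that is just `vmkping 10.121.253.66` from the host shell. From a workstation you can also sanity-check that the portals answer on the iSCSI port, TCP 3260; a rough sketch using bash's /dev/tcp (the IPs are the ones from this thread, and the 2-second timeout is arbitrary):

```shell
# Sketch: check whether an iSCSI portal answers on TCP 3260.
check_portal() {
    local host=$1 port=${2:-3260}
    # bash opens /dev/tcp/<host>/<port> as a TCP connection; timeout caps the wait
    if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "$host:$port reachable"
    else
        echo "$host:$port unreachable"
    fi
}
check_portal 10.121.253.66
check_portal 10.121.253.70
```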
Could you provide us some more information on the network?
I can see in the logs that you have these errors:
2013-01-28T05:15:24Z iscsid: Login Failed: iqn.2012-11.SAN01:san if=bnx2i-441ea1380118@vmk1 addr=10.121.253.70:3260 (TPGT:1 ISID:0x2) Reason: 00040000 (Initiator Connection Failure)
2013-01-28T05:15:24Z iscsid: Notice: Reclaimed Channel (H35 T0 C1 oid=1)
2013-01-28T05:15:25Z iscsid: DISCOVERY: Pending=1 Failed=1
2013-01-28T05:15:26Z iscsid: DISCOVERY: Pending=1 Failed=1
2013-01-28T05:15:28Z iscsid: Login Failed: iqn.2012-11.SAN01:san if=bnx2i-441ea1380118@vmk1 addr=10.121.253.66:3260 (TPGT:1 ISID:0x1) Reason: 00080000 (Initiator Connection Failure)
2013-01-28T05:15:28Z iscsid: Notice: Reclaimed Channel (H35 T0 C0 oid=1)
2013-01-28T05:15:28Z iscsid: Notice: Reclaimed Target (H35 T0 oid=1)
According to http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=201217... this means there is a network timeout/problem.
Maybe you have duplicate IPs or a wrong subnet...
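You can pull the failing portals and reason codes straight out of those log lines, e.g.:

```shell
# Sketch: extract target address and reason code from the iscsid login failures above.
log='2013-01-28T05:15:24Z iscsid: Login Failed: iqn.2012-11.SAN01:san if=bnx2i-441ea1380118@vmk1 addr=10.121.253.70:3260 (TPGT:1 ISID:0x2) Reason: 00040000 (Initiator Connection Failure)
2013-01-28T05:15:28Z iscsid: Login Failed: iqn.2012-11.SAN01:san if=bnx2i-441ea1380118@vmk1 addr=10.121.253.66:3260 (TPGT:1 ISID:0x1) Reason: 00080000 (Initiator Connection Failure)'

extract_failures() {
    # Keep only "addr=<ip:port>" and the hex reason code from each failure line
    printf '%s\n' "$log" | sed -n 's/.*addr=\([^ ]*\).*Reason: \([0-9a-f]*\).*/\1 reason=\2/p'
}
extract_failures
```

Both portals failing with initiator-connection-failure codes points at the network path rather than one bad port on the array.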
The SAN has two Ethernet ports, configured as 10.121.253.66 and 10.121.253.70. The iSCSI ports of the hosts are configured as 10.121.253.67 and 10.121.253.68, respectively. The SAN is connected to a switch, which is connected to the two hosts. I don't think there are any duplicate IPs or a wrong subnet. I just wonder why the latency is so high for both hosts when they try to access data from the SAN. I think that is what causes the iSCSI adapters to go down and the hosts to lose their connection to the SAN. Is this a problem with the SAN or something else? And is there any way to reduce the latency? Thanks a lot!
I tried to run vmkping 10.121.253.66 and vmkping 10.121.253.70 (the SAN) from the host, but I am not able to get connectivity.
Are the iSCSI NICs and SAN on their own dedicated switch, or sharing with VM data, management, and vMotion traffic? If sharing, what model is your switch?
That usually means the SAN is not accessible on your network. Are you able to ping the SAN IP from your workstation?
Now usually only one of the hosts is down, and the other host can still access the SAN via the switch.
I'm not familiar with the ReadyNAS. However, I do know that on some iSCSI arrays you have to enable multiple-initiator access to a volume for more than one initiator to connect at the same time. Ideally it's best practice to have a separate physical switch dedicated to iSCSI; if that is not possible, you can fall back to separate VLANs. Are you running that 8-port switch just for iSCSI?
Yes, that 8-port switch is used only for iSCSI.
Are 253.66 and 253.70 both unable to respond to vmkping from both hosts, or just from one of the two? Also, which native multipathing policy are you currently using?