VMware Cloud Community
nelo_1990
Contributor

Re: Hosts lost connection from iSCSI SAN storage

Hi all, I'm running into a strange issue: the hosts repeatedly lose their connection to the storage. I really need help with this. The setup is as follows:

ESXi 5.0

2 hosts: HP Proliant DL380G7

iSCSI SAN storage: Netgear ReadyNAS 3100

4 virtual machines spread across the hosts: Database server, Application server, Web server and vCenter Server.

When the issue occurs, vmhba35 goes down on both hosts and they lose their connection to the iSCSI SAN storage. In "Recent Tasks", the virtual machines keep resetting on their own. Although the vSphere Client shows the virtual machines as running, I cannot reach them through Remote Desktop or the console. After I reboot the hosts, the system runs again. Could this be a VMware issue? Please help! Thanks a lot!

nelo_1990
Contributor

Neither 253.66 nor 253.70 responds to vmkping from the host while the iSCSI adapter is down. I tried changing the multipath policy from "Most Recently Used" to "Round Robin", but it doesn't seem to help.
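For context on why the policy change may not help here: a path selection policy only chooses among paths that are still alive. Below is a minimal, simplified sketch of the two policies (not ESXi's actual plugin code). Most Recently Used stays on the current path until it fails; Round Robin rotates to the next path after a number of I/Os (1000 is the VMW_PSP_RR default). The path labels are hypothetical. When both targets stop answering, neither policy has anything to fail over to.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str          # hypothetical label for one iSCSI path
    alive: bool = True


def mru_select(paths, current):
    """Most Recently Used: stay on the current path while it is alive,
    otherwise fail over to the first live path (None if all are dead)."""
    if current is not None and current.alive:
        return current
    for p in paths:
        if p.alive:
            return p
    return None


class RoundRobin:
    """Simplified Round Robin: move to the next live path after
    `iops_per_path` I/Os (1000 is the ESXi VMW_PSP_RR default)."""

    def __init__(self, paths, iops_per_path=1000):
        self.paths = paths
        self.iops_per_path = iops_per_path
        self.index = 0
        self.count = 0

    def select(self):
        live = [p for p in self.paths if p.alive]
        if not live:
            return None  # every path is down: nothing to rotate to
        if self.count >= self.iops_per_path:
            self.index += 1
            self.count = 0
        self.count += 1
        return live[self.index % len(live)]


paths = [Path("to-253.66"), Path("to-253.70")]
rr = RoundRobin(paths, iops_per_path=2)
print([rr.select().name for _ in range(4)])
# ['to-253.66', 'to-253.66', 'to-253.70', 'to-253.70']

for p in paths:          # both targets stop answering, as described above
    p.alive = False
print(rr.select(), mru_select(paths, paths[0]))  # None None
```

Round Robin spreads load across healthy paths; it does not add resilience when every path to the array is down at once.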

gman18480
Enthusiast

If you have already confirmed that the network configuration is correct, I would start looking at the physical layer. Have you tried swapping the patch cable for vmnic3, or connecting something else to its switch port to confirm that the port works correctly? If it is not the cable or the switch port, I would then try configuring your iSCSI VMkernel interface on another vSwitch using a different vmnic, to confirm that the network card is not the issue.

Garret DeWulf Professional Services / VMware Consultant / VCP 4&5 / www.veristor.com
nelo_1990
Contributor

But a physical-layer fault wouldn't explain the iSCSI adapters going down on both hosts; it's unlikely that the network cards in both hosts have problems. Right now I run only the Database server on one host and the Application server on the other. I noticed that the host running the Database server keeps reporting that I/O latency from the SAN storage has increased (see the attached picture). I think that every time the host running the Database server loses its connection to the SAN, the whole system crashes.

gman18480
Enthusiast

OK, so it sounds like you have your answer. I guess what you're asking is how to get more throughput out of your iSCSI fabric. One answer is to add another network card and VMkernel interface, then bind it as another interface to your software iSCSI initiator. Alternatively, you could look at the application side and tune your database server: SQL normally uses less disk I/O when you give it more memory. I'm not a DBA, so I can't help much with that, but it's a thought.

Also, you might want to take a look at this link: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=102927...

nelo_1990
Contributor

I tried changing several more settings in VMware to improve the latency:

1. Switched from the Broadcom iSCSI adapter to the software iSCSI adapter and enabled jumbo frames. However, it seems jumbo frames can be enabled on only one Ethernet port of the SAN, which is 253.66.

2. Disabled virtual interrupt coalescing on both hosts.

3. Switched all virtual machines from E1000 to vmxnet3.

4. Set the power plan to High Performance on the Application Server and Database Server.

5. Enabled RSS (Receive Side Scaling) on the Application Server and Database Server.
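On the jumbo frame point: a common way to check that a 9000-byte MTU really holds end to end is a don't-fragment vmkping with the largest payload that fits, which is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. A minimal sketch of that arithmetic, assuming the usual jumbo MTU of 9000 and a standard 1500 elsewhere:

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def fits_without_fragmenting(payload: int, path_mtu: int) -> bool:
    """Would a don't-fragment ping of `payload` bytes pass a hop with `path_mtu`?"""
    return payload <= max_ping_payload(path_mtu)


jumbo = max_ping_payload(9000)
print(jumbo)                                   # 8972
print(fits_without_fragmenting(jumbo, 9000))   # True
print(fits_without_fragmenting(jumbo, 1500))   # False
```

So an 8972-byte don't-fragment ping would be expected to succeed against 253.66 (if every hop is at 9000) and fail against a port left at 1500, such as 253.70 if it cannot take jumbo frames.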

After these changes, the latency warnings still appear, but less often than before. However, I'm afraid that if the warnings keep occurring, the system will crash again. I tried replacing the cables, but they don't seem to be the cause. Is there anything else I can do to improve the latency? I also wonder why the latency is still so high on the Database server: according to its performance chart, write latency from the SAN sometimes reaches 600 ms. Thanks a lot!
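To see how often the Database server actually crosses a given latency level (rather than only its peaks), a minimal sketch that summarizes write-latency samples exported from the performance chart. The 30 ms threshold is an assumption (it happens to match the default Storage I/O Control congestion threshold, but choose whatever level matters for your workload), and the sample list is hypothetical except for the 600 ms peak mentioned above.

```python
from statistics import mean


def summarize_latency(samples_ms, threshold_ms=30.0):
    """Summarize write-latency samples in milliseconds.

    threshold_ms is an assumed alert level, not an ESXi setting."""
    over = [s for s in samples_ms if s > threshold_ms]
    return {
        "avg_ms": mean(samples_ms),
        "max_ms": max(samples_ms),
        "over_threshold": len(over),
        "pct_over": 100.0 * len(over) / len(samples_ms),
    }


# Hypothetical samples; only the 600 ms peak comes from this post.
samples = [4, 6, 5, 120, 600, 8, 5]
print(summarize_latency(samples))
```

A low percentage of samples over the threshold with a few very high peaks would point at short bursts (such as a backup window) rather than a constantly overloaded array.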

nelo_1990
Contributor

I've now found that the latency warnings only occur while a backup job is running on the Database Server. The backup is only 0.8 GB and takes just 1.5 minutes. The attached picture shows the write speed. Is there anything I can do to improve the latency on the Database Server? Thanks a lot!
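For scale, the backup's average throughput can be worked out from the numbers in this post. A minimal sketch, assuming 1 GB = 1000 MB and a single 1 Gbit/s iSCSI link (raw bit rate, before protocol overhead):

```python
def throughput_mb_s(size_gb: float, duration_min: float) -> float:
    """Average throughput in MB/s (assumes 1 GB = 1000 MB)."""
    return size_gb * 1000 / (duration_min * 60)


def link_utilization(mb_s: float, link_gbit_s: float = 1.0) -> float:
    """Fraction of a link's raw bit rate used by `mb_s` megabytes per second."""
    return (mb_s * 8) / (link_gbit_s * 1000)


rate = throughput_mb_s(0.8, 1.5)
print(f"{rate:.1f} MB/s")                        # 8.9 MB/s
print(f"{link_utilization(rate):.1%} of 1 GbE")  # 7.1% of 1 GbE
```

On average that is a small fraction of one gigabit link, which suggests the average network load is modest; it says nothing about short bursts or how quickly the array can absorb writes, which this figure does not capture.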

gman18480
Enthusiast

Another feature to check: if you have an Enterprise license or higher, you should be able to enable Storage I/O Control. It helps even out the I/O load on your datastore during periods of congestion.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=102209...
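Roughly, Storage I/O Control stays out of the way until datastore latency exceeds a congestion threshold (30 ms by default), and then divides the available queue slots among VMs in proportion to their disk shares (the Low/Normal/High presets are 500/1000/2000 per virtual disk). The sketch below is a deliberately simplified model of that idea, not VMware's algorithm; the queue depth of 32 and the VM share assignments are assumptions.

```python
def allocate_slots(shares, queue_depth, latency_ms, threshold_ms=30.0):
    """Split `queue_depth` outstanding-I/O slots among VMs.

    Below the congestion threshold every VM may use the full queue;
    above it, slots are divided in proportion to disk shares
    (rounded down, minimum 1). Simplified illustration only."""
    if latency_ms <= threshold_ms:
        return {vm: queue_depth for vm in shares}
    total = sum(shares.values())
    return {vm: max(1, queue_depth * s // total) for vm, s in shares.items()}


shares = {"Database": 1000, "Application": 2000}  # Normal vs High (assumed)
print(allocate_slots(shares, 32, latency_ms=12))
# {'Database': 32, 'Application': 32}
print(allocate_slots(shares, 32, latency_ms=600))
# {'Database': 10, 'Application': 21}
```

The point of the model is that throttling only happens under congestion, and only relative to other VMs on the same datastore.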

nelo_1990
Contributor

Oh, but I don't have an Enterprise Plus license, so I'm not able to enable SIOC. The system has now been running for almost 2 weeks, but I still wonder: is there any way to throttle the IOPS of the Database Server? Thanks!

nelo_1990
Contributor

I tried setting the IOPS limit to 20,100 under the Resource Allocation tab, but nothing seems to have changed at all. I applied it to the specific hard disk that stores the backup.
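Conceptually, a per-disk IOPS limit behaves like a token bucket: an I/O is admitted only while a token is available, and tokens refill at the configured rate. A minimal sketch with a simulated clock, illustrating the idea only (not how the ESXi scheduler is implemented); the rates used are hypothetical. One thing it makes visible: a limit only has an effect when the disk actually issues more I/Os per second than the limit, which may be one reason a limit can appear to change nothing.

```python
class TokenBucket:
    """Admit at most `rate` operations per second, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.capacity = burst
        self.tokens = burst
        self.last = 0.0  # simulated clock, seconds

    def allow(self, now: float) -> bool:
        # Refill tokens for the time elapsed since the last call.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: the I/O would have to wait


def admitted(limit_iops: int, offered_iops: int, seconds: int = 10) -> int:
    """Count I/Os admitted when `offered_iops` arrive evenly each second."""
    bucket = TokenBucket(rate=limit_iops, burst=limit_iops)
    step = 1.0 / offered_iops
    return sum(bucket.allow(i * step) for i in range(offered_iops * seconds))


print(admitted(100, 50))   # 500: demand below the limit, nothing throttled
print(admitted(100, 500))  # about 1,100 of 5,000 offered get through
```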

porschenm
Contributor

I've had the same problem since updating from 4.1 to 5.0.

During high-load periods (backups) we get the same "latency" messages.

We already use the software iSCSI adapter with Broadcom NICs, and all cables seem to be OK (Cat6).

We get these messages every night when VMware Data Recovery runs its backups. Once, 2 of 4 paths went down on one of our 4 hosts...

nelo_1990
Contributor

It seems this issue can't be solved yet. By the way, which iSCSI storage array are you using? Mine is a Netgear ReadyNAS 3100. I have another MIS system that is almost identical to this problematic one; the only difference is that it uses an HP MSA2312i iSCSI storage array. That system shows no latency warnings, so I suspect the iSCSI storage might be the cause. However, replacing the storage for testing would be time-consuming and costly, so I'd rather not do that yet. The MIS system is still running, with some latency warnings during the backup job. Hopefully it will be fine.
