VMware Cloud Community
MNKrantz
Enthusiast

Host disconnected and not responding

I have 4 ESXi 4.1 hosts in a DRS/HA cluster. Every few weeks, out of nowhere, 2 of the hosts drop their connections to both the network and the storage. The storage is FC. The only thing that seems to work is rebooting the SAN and the host servers. I have already sent logs to VMware, and they came up with nothing. Immediately after rebooting the hosts, I have seen the error "too many writes" in the activity pane of the vSphere Client when connected directly to the hosts. VMware said they believe that error is common from the storage but had nothing more to offer. Any suggestions on how to track down the problem?

marcelo_soares
Champion

Do the ESXi hosts respond to ping during the outages? Do the VMs stop or keep running? A piece of the logs from an outage (/var/log/messages, filtered to vmkernel entries only) would be useful.
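If the log has to be copied off the host and trimmed before posting, the filtering step can be sketched in Python. This is a minimal sketch, assuming the log has been copied to a workstation with Python installed and that a vmkernel entry can be recognized simply because its line contains the word "vmkernel"; the function name `vmkernel_lines` is hypothetical. Where a shell with `grep` is available, `grep -i vmkernel /var/log/messages` does the equivalent.

```python
import sys
from typing import Iterable, Iterator


def vmkernel_lines(lines: Iterable[str], keyword: str = "vmkernel") -> Iterator[str]:
    """Yield only the lines containing `keyword` (case-insensitive), without trailing newlines.

    Assumption: vmkernel entries are identified purely by the word "vmkernel"
    appearing somewhere in the line; no particular syslog format is parsed.
    """
    needle = keyword.lower()
    for line in lines:
        if needle in line.lower():
            yield line.rstrip("\n")


def main(path: str) -> None:
    # errors="replace" keeps the scan going if the copied log has stray bytes.
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in vmkernel_lines(fh):
            print(line)


if __name__ == "__main__":
    # Defaults to the path mentioned above; pass a copied log's path as the first argument.
    main(sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages")
```

Run it as `python filter_vmkernel.py messages.copy > vmkernel_only.txt` (both file names are placeholders) and post the lines around the time of the outage.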

Also, if you can't collect the logs during an outage, you can have vMA retrieve them so you can analyze them afterwards: http://kb.vmware.com/kb/1024122

Hope this helps.

Marcelo Soares

MNKrantz
Enthusiast

Yes, the hosts respond to ping during the outages. Also, out of six hosts, I was able to connect to five of them via the vCenter Client. The error on the sixth was that the server was taking too long to respond. The VMs do not respond to ping during the outages. The only things I found in our syslogs were references to some of the hosts losing connectivity to some of the LUNs on the storage. We have dual fabric in place, so I don't think it is the fibre switches. My hunch at this point is a storage problem.

I am going to set up the vMA and see if I can pull anything more useful.

Thanks for the suggestions and interest!

MNKrantz
Enthusiast

It turned out the issue was caused by a problem with the SAN firmware. A firmware update resolved it.
