VMware Global Community
toinhostarr
Contributor

Failures with iSCSI datastores | vSphere 5.5

Hello everybody.

We have a vSphere 5.5 environment with 6 hosts (DELL PowerEdge 610) and an iSCSI storage array with 14 datastores (DELL PowerVault MD3600i). This week, for the 3rd time, one of our hosts stopped communicating with some of the datastores. The virtual machines kept responding to ICMP and SNMP, but I wasn't able to connect to them through the console or Remote Desktop.

One hour later, another host seemed to have the same issue. We couldn't even shut down the virtual machines. The hosts' event logs showed many "lost access" / "restored access" messages for the datastores.

The only way to get things working again is to power off all the hosts, the iSCSI switches and the storage. As I said before, this is the 3rd time we have seen this behavior: it happened last year, a month ago, and last Friday.

Our iSCSI storage uses a pair of 10Gb switches, and the datastores are configured with the "Most Recently Used" path selection policy. I have collected the logs from all hosts, if they would help.
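One rough way to go through the collected host logs is to count the lost/restored access events per datastore, to see whether they cluster on particular datastores. This is only a minimal Python sketch: the two message patterns are assumptions about the general shape of the ESXi "lost access to volume" / "restored access to volume" events, and they must be checked against the exact text in your own logs before trusting the counts.

```python
import re
import sys
from collections import Counter

# ASSUMPTION: event lines look roughly like
#   "... Lost access to volume <uuid> (<datastore>) due to connectivity issues ..."
#   "... Successfully restored access to volume <uuid> (<datastore>) ..."
# Adjust these patterns to the actual wording in your collected logs.
LOST = re.compile(r"Lost access to volume \S+ \((?P<ds>[^)]+)\)")
RESTORED = re.compile(r"restored access to volume \S+ \((?P<ds>[^)]+)\)")


def count_access_events(lines):
    """Return (lost, restored) Counters keyed by datastore name."""
    lost, restored = Counter(), Counter()
    for line in lines:
        match = LOST.search(line)
        if match:
            lost[match.group("ds")] += 1
            continue
        match = RESTORED.search(line)
        if match:
            restored[match.group("ds")] += 1
    return lost, restored


if __name__ == "__main__":
    # Usage: python count_events.py host1.log host2.log ...
    all_lines = []
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            all_lines.extend(fh)
    lost, restored = count_access_events(all_lines)
    for ds in sorted(set(lost) | set(restored)):
        print(f"{ds}: lost={lost[ds]} restored={restored[ds]}")
```

If the "lost" counts are spread over all datastores and all hosts at the same time, that points more towards the network or the array than towards a single host.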

Thanks a lot.

--

Antonio Augusto

rcporto
Leadership

How heavily loaded is the storage? Did you notice any failover of the storage controllers or any other event on the storage, such as a RAID rebuild? Are the iSCSI switches dedicated to iSCSI traffic only, or do other types of traffic share the same switches? As for the virtual machines still replying to ICMP and SNMP: that is because those services are already running in memory, but any other process that relies on disk access will not work, since the host has lost access to the storage volumes. Last but not least, check that your storage firmware is up to date.

---

Richardson Porto
Senior Infrastructure Specialist
LinkedIn: http://linkedin.com/in/richardsonporto
SureshKumarMuth
Commander

Since all the hosts are affected, this does not look like an issue with the ESXi hosts themselves.

It could be an intermittent network issue, or the entire storage array might be unresponsive.

1. Did you check the storage array's events/logs for errors? Engage the storage team to check for errors on the storage side.

2. How did you recover the servers? Did you reboot the ESXi hosts?

3. Were there any changes to the network before the issue occurred?

Regards,
Suresh
https://vconnectit.wordpress.com/