VMware Cloud Community
EPL
Contributor

Disconnected Hosts after storage outage

On several occasions our ESX hosts have become disconnected and unresponsive in vCenter after storage-related events. Each host boots from local storage, and all VMs run off our iSCSI Equallogic SAN. Has anyone else seen a loss of connectivity to the iSCSI SAN cause the host itself to disconnect and become unresponsive in vCenter? We could SSH into the boxes, but killing, starting, or restarting services didn't fix the issue. The only way we could get the hosts back was to reboot them through iLO.

Our Setup:

4 Equallogic PS5000 arrays

Dedicated Cisco 3750 switches for iSCSI traffic

5 HP DL380 G5&G6 servers

Each server is configured to use the MPIO configuration per the Yellow-Bricks article, and each volume is configured for Round Robin.

The first two times this happened, we had a problem with one of the 3750 switches in our stack. The last time, we were upgrading the Equallogic firmware; we powered off all the VMs but left the hosts running. When we looked at Virtual Center after the upgrade was done, all 5 of our hosts were in a disconnected state.

Our single ESX 3.5u4 host had none of these issues.

Just wanted to ping everyone to see if anyone has seen something similar...

7 Replies
srinivasvivek
Enthusiast

Are your VMs getting reset too?

EPL
Contributor

Yes. When the storage outage was unplanned, the VMs got hosed. The third time, we shut down all VMs but left the hosts on. The hosts still needed a hard reboot to bring them back.

bulletprooffool
Champion

If your network was down and came back up, try just restarting the management agents rather than rebooting.

One day I will virtualise myself . . .
EPL
Contributor

As per my original post, I tried restarting, killing, and starting the services, to no avail. The only way I was able to get the hosts back into Virtual Center was to reboot them.

bulletprooffool
Champion

Oops,

Were you able to ping the various interfaces (i.e., was the network up but the service simply not responding), or did you lose all connectivity?
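
If it helps to tell those two cases apart quickly, here is a rough sketch of the kind of check I mean. It only tests whether a few TCP ports on the host accept a connection. The port list (22 for SSH, 443 for host management, 902 for vCenter agent traffic) and the hostname are my assumptions for illustration, not something taken from your setup, so adjust them to match your environment.

```python
import socket
import sys

# Ports assumed for illustration: SSH, host management (HTTPS), vCenter agent traffic.
PORTS = {"ssh": 22, "https": 443, "vpxa": 902}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_host(host):
    """Probe every port in PORTS and return {name: reachable}."""
    return {name: port_open(host, port) for name, port in PORTS.items()}


def triage(results):
    """Turn {name: reachable} into a rough one-line diagnosis."""
    if not any(results.values()):
        return "no connectivity: suspect the network path"
    if all(results.values()):
        return "all ports answer: suspect agent state rather than the network"
    closed = sorted(name for name, ok in results.items() if not ok)
    return "network up, not answering: " + ", ".join(closed)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_host.py esx01.example.com  (hypothetical hostname)
    print(triage(check_host(sys.argv[1])))
```

A port that accepts connections does not prove the agent behind it is healthy, so treat the result as a first hint only.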

One day I will virtualise myself . . .
EPL
Contributor

Yeah, I was able to SSH into the box and ping in and out with no issues. It just wouldn't reconnect to Virtual Center, even after killing all services and trying to restart them.

rcustersp
Enthusiast

Hey EPL,

Were you ever able to find a solution to this issue? We are having this happen at random, without any outages that we know of. Once a host shows as disconnected, we can SSH into it but cannot access the VMs, and the host does not show as connected until the system is fully rebooted. This has been a support nightmare for us.
