VMware Cloud Community
hattster
Contributor

How to configure HA or FT to detect iSCSI network failures

Hey Guys,

I have been hunting through numerous threads and blogs for an answer to this question but cannot find one, so hopefully somebody here will be able to help me out.

Basically, my scenario is that I have 2 ESX hosts in separate buildings, building A and building B. Each host is directly connected to an iSCSI switch, and those iSCSI switches are connected to each other by a fiber link between the buildings. The SAN is in building A and is directly connected only to the iSCSI switch in building A.

The issue I am having is that if the iSCSI switch in building B goes down, the VMs do not fail over to the fully functioning host in building A; they simply become unresponsive (obviously) and hang until iSCSI communication is restored. I have been testing this by pulling the cables from the iSCSI switch in building B.

Any suggestions are greatly appreciated and if you need any additional information please let me know.

Thanks,

11 Replies
athlon_crazy
Virtuoso

That makes sense, because HA only acts when there is a host failure or when a host becomes network-isolated, not when storage (iSCSI) fails.
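
To make the point concrete, here is a rough toy model in Python of the trigger conditions described above. It is only an illustration of the reasoning, not how HA is actually implemented; the names `HostState` and `ha_reacts` are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Hypothetical snapshot of one host, for illustration only."""
    host_alive: bool       # host is powered on and running
    mgmt_network_ok: bool  # management-network heartbeats reach the other hosts
    iscsi_ok: bool         # storage path is up (never consulted below)


def ha_reacts(state: HostState) -> str:
    """Return the action this toy model of HA would take."""
    if not state.host_alive:
        return "restart VMs on another host"
    if not state.mgmt_network_ok:
        return "apply host isolation response"
    # A storage-only failure falls through to here: nothing happens.
    return "no action"


# Host B with its iSCSI switch down but everything else healthy:
print(ha_reacts(HostState(host_alive=True, mgmt_network_ok=True, iscsi_ok=False)))
```

In this model the `iscsi_ok` field is never read, which matches what you are seeing: pulling the iSCSI cables alone does not trip either condition.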

http://www.no-x.org

hattster
Contributor

I realise that is how it works, but is there not a way to have it detect this and fail over, maybe by creating an event/task for when both NICs go down?

athlon_crazy
Virtuoso

Even if you could make that work, how would failover be possible when VM B is in building B, on iSCSI B, and cannot be seen by Host A in building A? How can Host A bring up VM B when it cannot access VM B's files?

http://www.no-x.org

hattster
Contributor

If you are referring to the lock file on the VM, does this not time out after a few minutes? If you are referring to actual access to the SAN, that would still be possible, because Host A would still have communication with the SAN while Host B would not.

athlon_crazy
Virtuoso

If you have vCenter, you can try the VMware HA "VM Monitoring" setting. This setting monitors each individual VM's heartbeat. If no heartbeat is received within a specified period of time, the VM is restarted.
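
As a rough sketch of that idea in Python (not the real implementation): the function name `should_restart` and the numbers in the example are assumptions for illustration, and the actual interval depends on the monitoring sensitivity you pick.

```python
def should_restart(last_heartbeat_s: float, now_s: float,
                   failure_interval_s: float) -> bool:
    """Toy heartbeat check: restart once the silence exceeds the allowed interval.

    All arguments are in seconds; failure_interval_s is a hypothetical
    stand-in for whatever the chosen sensitivity level allows.
    """
    return (now_s - last_heartbeat_s) > failure_interval_s


# Last heartbeat seen at t=0 s, checked at t=45 s, with a 30 s allowance:
print(should_restart(last_heartbeat_s=0.0, now_s=45.0, failure_interval_s=30.0))
```

Note that this check only looks at the VM's heartbeat, so it helps only if the guest actually stops sending heartbeats when its storage disappears.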

http://www.no-x.org

hattster
Contributor

Would it be possible to have the vCenter heartbeat go over the iSCSI network, and if that heartbeat is lost the host assumes it lost communication?

hattster
Contributor

Alright, just a bit more information: I have performed some additional testing, and it looks like when the VM is protected by FT and I pull the iSCSI connection, it fails over instantly. However, I still cannot get this working with HA alone. I assume that since the host can detect this failure with FT enabled, it should be smart enough to detect it with HA enabled and restart the VMs on the other host.

Thanks,

jcwuerfl
Hot Shot

I would think HA would start it up on Host A once the lock on the VM times out, which is another thing HA checks, besides network and CPU activity, to see whether the VM is still running. What settings do you have set up for HA? Do you have Host Monitoring turned on? Which VM Monitoring level do you have enabled? VM Monitoring Only? And what monitoring sensitivity have you set: Low, Medium, or High? Also, do you have the VM set to shut down, power off, or stay running?

hattster
Contributor

My HA settings are as follows:

Host Monitoring: Enabled
Admission Control: Enabled
Admission Control Policy: Percentage 25%
VM Restart Priority: High
Host Isolation Response: Shut down
VM Monitoring: Disabled

jcwuerfl
Hot Shot

Have you tried switching VM Monitoring to "VM Monitoring Only" and setting the monitoring sensitivity to High?

hattster
Contributor

Yes and I have seen no change.
