VMware Cloud Community
BobbyJ
Contributor

Clarification on HA

Just a quick question for clarification on how HA works...

If I have 3 hosts (host1, host2, and host3) in a DRS/HA cluster, what would cause the VMs on those hosts to fail over? Does HA only kick in and restart the VMs on the other hosts in the cluster when an entire host fails? Would a failure of a NIC or a fibre HBA on a host cause the VMs to fail over?

So far in my testing, a failed NIC and a failed HBA on a host don't cause the VMs to fail over to a different host. I'm just looking for confirmation that everything is working as it should.

Thanks


Accepted Solutions
oreeh
Immortal

A failover only occurs if the ESX host fails or, to be precise, if the ESX host isn't reachable from the other hosts in the cluster.

11 Replies
MR-T
Immortal

Each node in the HA cluster exchanges heartbeats with the others over the service console network, so if that network goes down, the host will be considered down regardless of the state of its other network cards, and HA will fail its VMs over to the other hosts.
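A minimal sketch of that detection logic, assuming a simplified model in which each host records when it last heard a heartbeat from each peer over the service console network and declares a peer failed once the gap exceeds a timeout. The class, method names and the 15-second default are illustrative assumptions, not VMware's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Toy model of peer failure detection in an HA cluster (not VMware's code)."""

    timeout: float = 15.0  # assumed seconds without a heartbeat before a peer is declared failed
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, host: str, now: float) -> None:
        # Called whenever a heartbeat from `host` arrives on the service console network.
        self.last_seen[host] = now

    def failed_hosts(self, now: float) -> list[str]:
        # A host counts as failed only when its heartbeats stop arriving;
        # a dead VM-network NIC or FC HBA on that host never shows up here.
        return sorted(h for h, t in self.last_seen.items() if now - t > self.timeout)


monitor = HeartbeatMonitor()
for host in ("host1", "host2", "host3"):
    monitor.heartbeat(host, now=0.0)

# host1 and host2 keep heartbeating; host3 stops (crash or service console network down).
monitor.heartbeat("host1", now=10.0)
monitor.heartbeat("host2", now=10.0)
print(monitor.failed_hosts(now=20.0))  # ['host3']
```

Nothing in this model observes storage paths or VM-network links, which is why, in this picture, a failed HBA or a failed NIC that carries only VM traffic never triggers a restart.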

ilatimer
Hot Shot

There is a new option in VI 3.5 and VirtualCenter 2.5 that is referred to as HA for VMs. It monitors each VM, and if it loses the heartbeat from the VM, it automatically restarts it. For example, if the guest OS hung or blue-screened, the VM would simply be reset automatically.
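A hedged sketch of the VM-level idea, assuming the guest sends periodic heartbeats (for example via VMware Tools) and that the monitor resets any VM whose heartbeat has been silent longer than a threshold. The function name, the 30-second threshold and the data layout are hypothetical; the real feature's settings may differ.

```python
def vms_to_reset(last_guest_beat: dict[str, float], now: float, threshold: float = 30.0) -> list[str]:
    """Return the VMs whose guest heartbeat has been silent longer than `threshold` seconds.

    Hypothetical helper: models a hung or blue-screened guest that stops sending
    heartbeats while its host keeps running normally.
    """
    return sorted(vm for vm, t in last_guest_beat.items() if now - t > threshold)


beats = {"web01": 95.0, "db01": 40.0, "app01": 99.0}  # last heartbeat time per VM
print(vms_to_reset(beats, now=100.0))  # ['db01']
```

A hung guest is flagged here even though its host is perfectly healthy, which is exactly the case that host-level HA cannot see.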

oreeh
Immortal

There are still improvements needed in how HA handles host failures.

Ever had a host with failed HBAs?

BobbyJ
Contributor

I haven't had any HBAs fail in the couple of years I've been using ESX, but I assume it happens...

What are your recommendations?

Thanks.

oreeh
Immortal

There are no recommendations besides solid monitoring.

If the HBAs fail and your ESX host is still running (which is usually the case), HA simply doesn't help :(
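Since HA only watches host heartbeats, a dead HBA has to be caught by separate monitoring. Below is a minimal sketch of such a check, assuming you can already collect, per host, whether it is responding and the state of each of its storage paths. The `HostStatus` shape, the state strings and the alert wording are hypothetical; how you gather the data depends on your own tooling.

```python
from typing import NamedTuple


class HostStatus(NamedTuple):
    name: str
    responding: bool        # host still answers, so HA will NOT act on it
    path_states: list[str]  # assumed states per storage path: "active", "standby" or "dead"


def storage_alerts(hosts: list[HostStatus]) -> list[str]:
    """Flag hosts that are up but have lost storage access or path redundancy."""
    alerts = []
    for h in hosts:
        if not h.responding:
            continue  # a down host is HA's job; not flagged here
        live = [s for s in h.path_states if s != "dead"]
        if not live:
            alerts.append(f"{h.name}: CRITICAL, no live storage paths (HA will not restart its VMs)")
        elif len(live) < len(h.path_states):
            alerts.append(f"{h.name}: WARNING, {len(h.path_states) - len(live)} dead storage path(s)")
    return alerts


cluster = [
    HostStatus("host1", True, ["active", "standby"]),
    HostStatus("host2", True, ["dead", "dead"]),
    HostStatus("host3", True, ["active", "dead"]),
]
for line in storage_alerts(cluster):
    print(line)
```

host2 is the scenario described above: the host keeps heartbeating, so HA leaves its VMs where they are, and only an external check like this would notice.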

MR-T
Immortal

Any design worth its salt will always factor redundancy into both storage and network configurations.

You should have multiple paths to storage and, where possible, do the same with the network configuration.

Use dual power supplies where possible.

Cover all bases and you're in a better place.
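To illustrate why multiple storage paths help, here is a hedged sketch of a simple preferred-path failover rule: keep using the preferred path while it is alive, otherwise fall back to the first surviving path. The function and the path names `hba1`/`hba2` are hypothetical and do not reproduce ESX's multipathing code.

```python
from typing import Optional


def select_path(paths: dict[str, bool], preferred: str) -> Optional[str]:
    """Pick the storage path to use.

    `paths` maps a path name to whether it is currently alive. Use the preferred
    path when it is alive; otherwise fall back to the first live alternative in
    name order. Returns None when every path is dead (storage is lost).
    """
    if paths.get(preferred, False):
        return preferred
    for name in sorted(paths):
        if paths[name]:
            return name
    return None


paths = {"hba1": True, "hba2": True}
print(select_path(paths, preferred="hba1"))  # hba1
paths["hba1"] = False  # first HBA fails
print(select_path(paths, preferred="hba1"))  # hba2
paths["hba2"] = False  # second HBA fails too
print(select_path(paths, preferred="hba1"))  # None
```

With a single path, the first failure already returns None, which is the "host up, storage gone" case that HA does not handle.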

oreeh
Immortal

Unfortunately, complete redundancy is too expensive for most environments, and it doesn't always help.

A while back I had an issue with a failed SAN storage controller: instead of failing over to the redundant path, the SAN simply "went nuts".

MR-T
Immortal

I guess it depends on who you're working with, but I'd always push for redundancy where it can be afforded.

You can only plan and test for these things. If something odd still happens during a live failure, at least you know you've done your best.

oreeh
Immortal

"you know you've done your best."

:D

Rodos
Expert

Bobby, others have covered most of this. VMware HA monitors network connectivity between the hosts and, once a host has been unreachable for 15 seconds, restarts its VMs on another host. Always try to add some redundancy to the network. HA does not cover or detect FC HBA failures, which sometimes surprises people.

You mentioned that in your testing the virtual machines did not fail over. Had the VMs' isolation response been changed from the default? You can set it so that VMs are not restarted on isolation (split brain or network loss), but they will still be restarted if the whole ESX server goes down.
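A rough sketch of the outcomes described in this thread, assuming just two isolation-response settings, labelled here "power_off" and "leave_powered_on" (the labels, the `Event` enum and the function are illustrative, not VMware's API). It encodes the point that losing the whole host always leads to a restart elsewhere, isolation leads to one only if the VMs are powered off, and NIC or HBA failures on a host that is still heartbeating lead to none.

```python
from enum import Enum


class Event(Enum):
    HOST_DOWN = "host crashed or lost power"
    ISOLATED = "host lost its service console network"
    VM_NIC_FAILED = "a NIC used only for VM traffic failed"
    HBA_FAILED = "a Fibre Channel HBA failed"


def restarted_elsewhere(event: Event, isolation_response: str) -> bool:
    """Would HA restart this host's VMs on another host? (simplified model)

    `isolation_response` is "power_off" or "leave_powered_on"
    (hypothetical labels for the two behaviours discussed above).
    """
    if isolation_response not in ("power_off", "leave_powered_on"):
        raise ValueError(f"unknown isolation response: {isolation_response!r}")
    if event is Event.HOST_DOWN:
        return True
    if event is Event.ISOLATED:
        # In this model, powered-off VMs can be restarted by the peers,
        # while VMs left powered on keep running on the isolated host.
        return isolation_response == "power_off"
    # Component failures on a host that still heartbeats are invisible to HA.
    return False


print(restarted_elsewhere(Event.ISOLATED, "leave_powered_on"))  # False
print(restarted_elsewhere(Event.HOST_DOWN, "leave_powered_on"))  # True
```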

Rodos
Consider the use of the helpful or correct buttons to award points. Blog: http://rodos.haywood.org/