Solved: Clarification on HA

BobbyJ · ‎01-09-2008

Just a quick question for clarification on how HA works...

If I have 3 hosts (host1, host2, and host3) in a DRS-HA cluster, what would cause the VMs on a host to fail over? Is it only when an entire host fails that HA kicks in and restarts the VMs on the other hosts in the cluster? Would a failure of a NIC or a Fibre Channel HBA on a host cause the VMs to fail over?

So far in my testing, a failed NIC and a failed HBA on a host don't cause the VMs to fail over to a different host. Just looking for confirmation that everything is working as it should.

Thanks

oreeh · ‎01-09-2008

A failover only occurs if the ESX host fails. To be precise, it occurs if the ESX host isn't reachable from the other hosts in the cluster.

MR-T · ‎01-09-2008

Each node within the HA cluster communicates over the service console network, so if that network goes down, regardless of the state of the other network cards, the host will be considered down and HA will fail the VMs over.

ilatimer · ‎01-09-2008

There is a new option in VI3.5 and VC2.5 that is referred to as HA for VMs. It monitors the VM, and if it loses the heartbeat from the VM, the VM is restarted automatically. For example, if the guest OS hung or blue-screened, the VM would be reset automatically.
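To illustrate the idea only, here is a minimal Python sketch of a heartbeat watchdog. It is not VMware's implementation: the class name, method names, and the 30-second timeout are all hypothetical, chosen just to show how "no heartbeat for too long" turns into a reset decision.

```python
from dataclasses import dataclass


@dataclass
class VmHeartbeatMonitor:
    """Hypothetical watchdog; names and values are illustrative, not VMware's."""

    timeout_s: float              # assumed threshold, not a product default
    last_heartbeat: float = 0.0   # time of the most recent heartbeat, in seconds

    def record_heartbeat(self, now: float) -> None:
        # Called whenever the guest reports that it is still alive.
        self.last_heartbeat = now

    def should_reset(self, now: float) -> bool:
        # The VM is treated as hung once the heartbeat is older than the timeout.
        return (now - self.last_heartbeat) > self.timeout_s


monitor = VmHeartbeatMonitor(timeout_s=30.0)
monitor.record_heartbeat(now=100.0)
print(monitor.should_reset(now=120.0))  # False: last heartbeat was 20 s ago
print(monitor.should_reset(now=131.0))  # True: last heartbeat was 31 s ago
```

Note that this watches the guest, not the host, so it covers a different failure than host-level HA does.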

oreeh · ‎01-09-2008

There are still improvements required in case of host failures.

Ever had a host with failed HBAs?

BobbyJ · ‎01-09-2008

I haven't had any HBAs fail in the couple of years I've used ESX, but I assume it happens.

What are your recommendations?

Thanks.

oreeh · ‎01-09-2008

There are no recommendations besides solid monitoring.

If the HBAs fail and your ESX host is still running (which is usually the case), HA simply doesn't help.

MR-T · ‎01-09-2008

Any design worth its salt will always factor redundancy into both storage and network configurations.

You should have multiple paths for storage and where possible do the same with the network configuration.

Make sure power is dual where possible.

Cover all bases and you're in a better place.

oreeh · ‎01-09-2008

Unfortunately complete redundancy is too expensive for most environments and doesn't always help.

A while back I had an issue with a failed SAN storage controller - instead of failing over to the redundant path the SAN simply "went nuts".

MR-T · ‎01-09-2008

I guess it depends who you're working with, but I'd always push for redundancy where it can be afforded.

You can only plan & test these things. When something odd happens during a live failure, you know you've done your best.

oreeh · ‎01-09-2008

"you know you've done your best."

:D

Rodos · ‎01-10-2008

Bobby, the others have covered most of this. VMware HA monitors network connectivity and will restart the VMs on another host once a host has been unreachable for 15 seconds. Always try to add some redundancy to the network. HA does not cover or detect FC HBA failure, which is sometimes a surprise to people.

You mentioned that you did some testing and the virtual machines did not fail over. Did the VMs have their isolation response changed from the default? You can change it so that VMs are not restarted on an isolation response (split brain or network loss), but they will still restart if the whole ESX server goes down.
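The behaviour described in this thread can be summed up in a small Python sketch. It is only a model of what the posters report, not VMware code: the enum, the function name, and the `power_off_on_isolation` flag are hypothetical, and the flag's default assumes (per the post above) that the default isolation response leads to a restart elsewhere.

```python
from enum import Enum


class Failure(Enum):
    HOST_DOWN = "whole ESX host down"
    SERVICE_CONSOLE_NETWORK_LOST = "service console network lost (isolation)"
    OTHER_NIC_FAILED = "NIC not carrying the service console failed"
    FC_HBA_FAILED = "FC HBA failed"


def vms_restart_elsewhere(failure: Failure, power_off_on_isolation: bool = True) -> bool:
    """Model of the thread's answer: does HA restart the VMs on another host?"""
    if failure is Failure.HOST_DOWN:
        # A dead host always triggers a restart on the surviving hosts.
        return True
    if failure is Failure.SERVICE_CONSOLE_NETWORK_LOST:
        # The host looks dead to its peers; the outcome depends on the
        # isolation response configured for the VMs (assumed flag).
        return power_off_on_isolation
    # Other NIC or HBA failures leave the host reachable, so HA does nothing.
    return False


for failure in Failure:
    print(f"{failure.value}: {vms_restart_elsewhere(failure)}")
```

With the isolation response changed so that VMs stay powered on, `vms_restart_elsewhere(Failure.SERVICE_CONSOLE_NETWORK_LOST, power_off_on_isolation=False)` returns `False`, which would match a test where pulling a network cable produced no failover.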

Rodos
Consider the use of the helpful or correct buttons to award points. Blog: http://rodos.haywood.org/
