I'm just curious about HA and how it works.
At the moment, I have 3 ESX 3.5 hosts and VC 2.5. Initially, the people who set them up put both the Service Console and the VMs on the same vSwitch and on the same internal VLAN. I have now created more VLANs, which are routed on our firewall (Checkpoint). HA was enabled and was working, but I disabled it while I was setting up the new networks and assigning new IP addresses to the SC.
My question is now: what happens if our firewall (router) fails? I know that HA pings the default gateway, but doesn't it detect that the other ESX hosts are still running, and so not force a restart of the VMs? I have set up switch redundancy, and all virtual switches have dual or triple physical NICs.
The gateway is only pinged to check whether the host is isolated (= has no network connectivity), and this only starts once the host stops receiving heartbeats from the other hosts in the cluster. Since your service consoles are probably on the same subnet, they would keep talking to each other even with a dead gateway.
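To make the order of checks concrete, here is a minimal sketch of the decision as described in the reply above. It is not VMware's implementation; the `HostView` class and `is_isolated` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class HostView:
    """What one host can observe about its network (illustrative only)."""
    receives_heartbeats: bool  # still hearing the other cluster hosts?
    gateway_pingable: bool     # does the isolation address answer a ping?


def is_isolated(view: HostView) -> bool:
    """True only when heartbeats have stopped AND the gateway is unreachable."""
    if view.receives_heartbeats:
        # The gateway is not consulted while peers are still heard.
        return False
    return not view.gateway_pingable


# Firewall (gateway) down, but the hosts share a subnet and still hear each other.
print(is_isolated(HostView(receives_heartbeats=True, gateway_pingable=False)))
# Host has lost all Service Console connectivity.
print(is_isolated(HostView(receives_heartbeats=False, gateway_pingable=False)))
```

Under these assumptions, the first case (dead firewall, heartbeats intact) is not treated as isolation, which matches the scenario asked about.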
Thanks for the help.
Yes, all my hosts are on the same subnet, so then I know I can survive a downed firewall.
HA uses a 15-second heartbeat timeout by default; if a host is found to be isolated, its VMs are restarted on other ESX hosts in your cluster. HA depends heavily on your DNS infrastructure, so I would add entries for all ESX hosts to the /etc/hosts file on each host, so that the hosts can still resolve each other if DNS fails.
Regards,
Stefan Nguyen
iGeek Systems Inc.
VMware, Citrix, Microsoft Consultant
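For anyone following the /etc/hosts suggestion above, a small sketch like the one below could generate consistent entries instead of typing them by hand on every host. The host names, IP addresses, and the `example.local` domain are made-up placeholders, and `hosts_entries` is a hypothetical helper, not part of any VMware tool.

```python
from typing import Dict, List


def hosts_entries(hosts: Dict[str, str], domain: str) -> List[str]:
    """Build /etc/hosts lines: IP, fully qualified name, then short alias."""
    lines = []
    for short_name, ip in sorted(hosts.items()):
        lines.append(f"{ip}\t{short_name}.{domain}\t{short_name}")
    return lines


# Placeholder values; substitute your own Service Console addresses.
esx_hosts = {
    "esx01": "192.168.10.11",
    "esx02": "192.168.10.12",
    "esx03": "192.168.10.13",
}

print("\n".join(hosts_entries(esx_hosts, "example.local")))
```

Generating the same lines for every host from one list reduces the risk of a typo on a single host.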
Best practice for HA requires redundancy for the Service Console network. This can be accomplished in one of two ways:
1. Single Service Console Network with redundant pNICs connected to different pSwitches
2. Secondary Service Console Network. You can create a second SC portgroup on a new or existing vSwitch, and then configure a second Isolation Address (under HA Advanced Options set: das.isolationaddress2 = SecondIPAddress)
Personally I like Option 1 better. Another advanced option you might want to consider is changing the default timeout value: das.failuredetectiontime = timeinms
I changed this from the default of 15000 (15 seconds) to 60000 (60 seconds). This just gives you a little more time before HA thinks you have an isolated/down ESX server.
The other option is to change the default Isolation Response from Power Off to Leave Powered On. This makes sure VMs do not get powered off by false HA isolation events. It does mean that if the server really is "isolated" the VMs won't be moved, but they should still be up and running, because we are not talking about an ESX server being down, just isolated from the rest of the cluster. IMO it's better to leave the VMs running if they are still up and resolve the problem after business hours.
Don Pomeroy
VMware Communities User Moderator
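Since das.failuredetectiontime is given in milliseconds, it is easy to enter 60 when 60000 is meant. The sketch below only formats the two option names mentioned in the post above into key/value pairs that would then be entered by hand under HA Advanced Options; `ha_advanced_options` is a hypothetical helper and the IP address is a placeholder.

```python
from typing import Dict


def ha_advanced_options(detection_seconds: int,
                        second_isolation_ip: str = "") -> Dict[str, str]:
    """Return HA advanced option names mapped to the values to enter."""
    if detection_seconds <= 0:
        raise ValueError("detection time must be a positive number of seconds")
    # The option expects milliseconds, so convert from seconds.
    options = {"das.failuredetectiontime": str(detection_seconds * 1000)}
    if second_isolation_ip:
        options["das.isolationaddress2"] = second_isolation_ip
    return options


# 60 seconds, plus a placeholder second isolation address.
for name, value in ha_advanced_options(60, "192.168.20.1").items():
    print(f"{name} = {value}")
```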
"I would place entries in /etc/hosts file for all ESX hosts to make sure if your DNS failed it still communicate via host entries."
Here is the text from the doc:
1. Proper DNS & Network settings are needed for initial configuration
After configuration DNS resolutions are cached to /etc/FT_Hosts (minimizing the dependency on DNS server availability during an actual failover)
DNS on each host is preferred (manual editing of /etc/hosts is error prone)
So what is everyone else doing?
This document was generated from the following thread: What happens if default gateway fails in HA?