HA isolationaddress - different default gateway on HA nodes

MKguy · ‎10-27-2008

Greetings gentlemen,

I have a

- HA-Cluster with 2 hosts running ESX 3.5 U2 and VC 2.5 U3

- with one teamed SC Interface on the same subnet for both hosts

- and there are 2 routers in this SC subnet, and each host uses a different one as its default gateway (don't ask why...)

Which router would now be my default isolation address? The router of the primary HA node or the respective default gateway of each host?

If I want to configure the hosts to use both routers as das.isolationaddress, should I set das.usedefaultisolationaddress to false and set das.isolationaddress1=router1 and das.isolationaddress2=router2 on the cluster?

What happens if each host uses its respective default gateway as the default isolation address and I also configure das.isolationaddress1=router1 and das.isolationaddress2=router2? Will each host check one of the router addresses twice in this case?

Enlightenment is appreciated.

-- http://alpacapowered.wordpress.com

depping · ‎10-27-2008

I would just leave it as it is for the moment. HA doesn't care if the two gateway addresses are different; each host only uses its gateway in case of an isolation. You could indeed set das.usedefaultisolationaddress to false and enter both gateways as das.isolationaddress entries. It wouldn't add much, I guess, although an additional check to see whether a host is actually isolated wouldn't hurt. In my opinion, though, this would mean you have to increase the failure detection time (das.failuredetectiontime), especially since you've only got one SC on two NICs. I would suggest at least 60 seconds.

Duncan

Blogging: http://www.yellow-bricks.com

If you find this information useful, please award points for "correct" or "helpful".

SCampbell1 · ‎10-27-2008

The purpose of the isolation address stuff is to decide whether I've lost touch with the host because the host has crashed (I want to restart the VMs) or because the network has crashed (I don't want to restart the VMs).

If you have different hosts pinging different IPs you may run the (small in your case) risk of detecting the wrong condition.

I think you are better off having all hosts determine whether the network is functional from a consistent set of sources. I would consider configuring both gw1 and gw2, in the same order, as das.isolationaddress entries for all ESX servers, or at least configuring the same address for both servers.
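To make the reasoning above concrete, here is a minimal Python sketch. It is only a model of the decision described in this post, not VMware's implementation: the function name `is_isolated`, its parameters, and the 192.0.2.x addresses are made up for illustration, and it assumes a host considers itself isolated only when it receives no heartbeats and none of its isolation addresses answer.

```python
from typing import Mapping


def is_isolated(heartbeats_received: bool, ping_ok: Mapping[str, bool]) -> bool:
    """Assumed rule: isolated = no heartbeats AND no isolation address reachable."""
    if heartbeats_received:
        # Other hosts can still be heard, so the network is considered fine.
        return False
    # No heartbeats: the host is isolated only if every isolation address fails.
    return not any(ping_ok.values())


# Hypothetical example: both hosts check the same two routers in the same order.
checks = {"192.0.2.1": False, "192.0.2.2": True}  # router1 down, router2 up
print(is_isolated(heartbeats_received=False, ping_ok=checks))
```

With a consistent list on every host, a single failed router does not by itself make one host conclude something different from the other.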

This is taken from the Resource Guide

By default, the gateway IP address specified in each ESX Server host service console
network configuration is used as the isolation address. Each service console network
must have one isolation address it can reach. When you set up service console
redundancy, you must specify an additional host isolation response address
(das.isolationaddress2) for the secondary service console network. This isolation
address should have as few network hops as possible. When you specify a secondary
isolation address, VMware recommends that you increase the
das.failuredetectiontime setting to 20000 milliseconds or greater. See “Setting
Advanced HA Options” on page 126.
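As a rough sketch of how the options mentioned in this thread fit together, the Python snippet below builds the set of advanced options and checks it against the guideline quoted above. The option names are the ones used in the thread; `build_ha_options`, `check_ha_options`, and the 192.0.2.x addresses are hypothetical placeholders, and the snippet does not talk to VirtualCenter at all.

```python
def build_ha_options(isolation_addresses, failure_detection_ms=20000,
                     use_default_gateway=False):
    """Return the advanced options as a plain dict of strings (illustrative only)."""
    options = {
        "das.usedefaultisolationaddress": "true" if use_default_gateway else "false",
        "das.failuredetectiontime": str(failure_detection_ms),
    }
    # das.isolationaddress1, das.isolationaddress2, ... in the given order
    for index, address in enumerate(isolation_addresses, start=1):
        options[f"das.isolationaddress{index}"] = address
    return options


def check_ha_options(options):
    """Warn if a secondary isolation address is set but the detection time is under 20000 ms."""
    warnings = []
    # 15000 ms is the default mentioned later in this thread.
    detection_ms = int(options.get("das.failuredetectiontime", "15000"))
    if "das.isolationaddress2" in options and detection_ms < 20000:
        warnings.append("das.failuredetectiontime should be 20000 ms or greater")
    return warnings


# Placeholder router addresses; substitute the real router1 and router2.
opts = build_ha_options(["192.0.2.1", "192.0.2.2"])
for key in sorted(opts):
    print(f"{key}={opts[key]}")
print(check_ha_options(opts))
```

Keeping the addresses in a single ordered list makes it easy to give every host in the cluster the same entries in the same order.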

MKguy · ‎10-28-2008

I guess I'll go with setting das.usedefaultisolationaddress to false and setting das.isolationaddress1=router1 and das.isolationaddress2=router2, because I'd like to have a consistent HA configuration for both hosts. By the way, I'm using "Leave VMs powered on" as the isolation response, so even if I get the isolation address settings wrong, it shouldn't make much of a difference. Of course this is arguable, but I can't help it, since many people don't want their systems shut down by ESX only because a host thinks it is isolated from a purely ESX-host network point of view...

Furthermore, automatic failback is disabled.

Regarding the das.failuredetectiontime increase to 60 seconds:

It's indeed recommended in the HA best practices and the Resource Management Guide, but I really fail to grasp the reasoning behind increasing it from the default 15 seconds to 60 if you use a single teamed SC interface.

Isn't the failover of the physical NICs transparent to the SC (and other) interfaces on vSwitches? If one physical uplink vmnic dies, failover occurs in less than a second. Shouldn't the HA heartbeats from the SC be sent out through the remaining uplink as soon as the failover has completed?

-- http://alpacapowered.wordpress.com

depping · ‎10-28-2008

Yes, one would expect that. But it also means that your MAC address will show up on a different port than the physical switch expects, so packets could possibly be dropped. In other words, I would just set it to 60 anyway to be really sure it works. And test it! Unplug and plug! See whether all VMs keep running or not.

Duncan

Blogging: http://www.yellow-bricks.com

