VMware Cloud Community
jftwp
Enthusiast

HA and 2nd service console

I've had just a single service console in each of my clusters for too long now. I want to set up a second/backup service console, solely to improve the resiliency and accuracy of HA. HA is configured on my clusters right now with 'Leave VM powered on', since I'm paranoid about false positives; they've happened twice in as many years.

The primary/existing service console is on vSwitch0 (typical, of course), whose NICs access the 10.25.13.0 network.

Isn't it best to add the secondary service console/gateway on a different network from the primary? If so, vSwitch1 (used by my VMs) accesses the 10.25.20.0 network, so can't I just do 'Add Networking' / Service Console / Use vSwitch1, add the IP/gateway, etc., and be done with it? Again, I only want this as another path for HA heartbeats, and I don't have (or really even need) a dedicated NIC for just this purpose.
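A quick way to sanity-check the "different network" idea before touching the hosts: a minimal Python sketch that confirms the two SC addresses sit on non-overlapping subnets. The host IPs (10.25.13.10, 10.25.20.10) and the /24 masks are assumptions for illustration; the post only names the 10.25.13.0 and 10.25.20.0 networks.

```python
import ipaddress

# Hypothetical SC addresses. The /24 masks are an assumption: the thread only
# names the 10.25.13.0 (vSwitch0) and 10.25.20.0 (vSwitch1) networks.
primary_sc = ipaddress.ip_interface("10.25.13.10/24")
secondary_sc = ipaddress.ip_interface("10.25.20.10/24")


def on_separate_subnets(a, b):
    """Return True if two SC interfaces sit on non-overlapping networks."""
    return not a.network.overlaps(b.network)


print(primary_sc.network, secondary_sc.network)
print("separate subnets:", on_separate_subnets(primary_sc, secondary_sc))
```

Note this only checks addressing; physically separate NIC-to-switch paths still have to be verified on the hosts themselves.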

Sound good? Any caveats to share with this approach?

Then I can set the HA settings back to doing what HA is REALLY supposed to do: detect what is almost certainly a host FAILURE and restart VMs on other hosts in the cluster. I find it 'interesting' that VMware added a 'Leave VMs powered on' option for host isolation response, because doesn't that more or less defeat the purpose? Isn't it an admission that HA is prone to false positives with undesirable results? Secondary service consoles taking alternate paths (physically, NIC to switch) AND going to alternate subnets would seem to be an absolute requirement for the true/ideal 'Shut down VM' (and restart on alternate hosts) implementation of HA.


Accepted Solutions
jbogardus
Hot Shot

I think what is meant by the tip to keep the network label the same is to name the second service console the same on each of the hosts in the cluster, not to name it the same as the first service console.

Think of the second service console as primarily providing redundant heartbeat connectivity between the hosts, not really as another interface for checking isolation addresses after heartbeats are lost. If the second service console is placed on properly redundant network links and switches, then when a failure is detected it should be a true failure of another host, and the additional isolation check on a single interface should be enough to determine that the local host isn't the issue.
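To make that reasoning concrete, here is a deliberately simplified Python model. It is not VMware's actual HA algorithm; the function names and the boolean inputs are hypothetical. The point it illustrates: with two SC networks, a peer is only treated as failed when heartbeats are missing on every network, and the local host only treats itself as isolated when it also cannot reach any isolation address.

```python
def peer_failed(heartbeats_by_network):
    """A peer counts as failed only if heartbeats are missing on every SC network."""
    return not any(heartbeats_by_network.values())


def local_host_isolated(heartbeats_by_network, isolation_ping_ok):
    """Hypothetical rule: the local host considers itself isolated when it hears
    no heartbeats on any SC network AND cannot ping any isolation address."""
    return not any(heartbeats_by_network.values()) and not any(isolation_ping_ok)


# One SC network down (e.g. a switch failure on 10.25.13.0), the other still up:
hb = {"Service Console": False, "Service Console 2": True}
print(peer_failed(hb))  # False: the second SC still carries heartbeats
```

With a single SC, the same switch failure would flip `peer_failed` to True, which is exactly the false positive the second SC is meant to prevent.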

marcelo_soares
Champion

Just adding another SC on vSwitch1 on this second network (10.25.20.0) will do the job. The only thing you can't do is give this SC port its own gateway, since the service console can have only one default gateway (which is used as the isolation address by default). Just a tip: make your SC ports have exactly the same name (case sensitive).
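The "only one default gateway" point can be pictured with a small hypothetical Python model (not VMware code): the default gateway belongs to the host's service console as a whole, so adding a second SC port changes the list of ports but not the gateway, and therefore not the default isolation address. The class names, IPs and the 10.25.13.1 gateway are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScPort:
    label: str
    vswitch: str
    ip: str


@dataclass
class HostServiceConsole:
    # One default gateway for the whole service console, not one per SC port.
    default_gateway: str
    ports: List[ScPort] = field(default_factory=list)

    def add_port(self, port: ScPort) -> None:
        # Adding a port does not touch the single default gateway.
        self.ports.append(port)

    def default_isolation_address(self) -> str:
        # Modelled on the thread: HA pings the SC default gateway by default.
        return self.default_gateway


host = HostServiceConsole(default_gateway="10.25.13.1")
host.add_port(ScPort("Service Console", "vSwitch0", "10.25.13.10"))
host.add_port(ScPort("Service Console 2", "vSwitch1", "10.25.20.10"))
print(len(host.ports), host.default_isolation_address())
```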

I think HA is still claiming for this second SC port on the cluster Summary tab, isn't it?

Anyway, you will have far fewer false positives with this configuration.

Marcelo Soares

VMware Certified Professional 310

Technical Support Engineer

Linux Server Senior Administrator

jftwp
Enthusiast

So, the 'network label' needs to be exactly the same as the first, so they're both called 'Service Console'? By default, the networking wizard wants to call it 'Service Console 2'; the first one is called simply 'Service Console'. Why do they have to have exactly the same name, and why do you call that a 'tip'?

Also, I think I see what you mean about the 'Service Console' (original, on the original network) and the isolation address. But even though I am adding an SC to the other vSwitch that has access to the other network (hence different gateway involved, hence different isolation address for the backup/second SC), do I specify that second isolation address somewhere else, while leaving the gateway setting (when adding secondary SC) UNchanged from the original SC's network? I'm searching a bit now for best practices, adding another SC, etc. Any details appreciated.

I don't know what you mean about 'HA are still claiming for this second SC port at the cluster summary tab'...

marcelo_soares
Champion


Nope, I didn't explain myself well. About the second Service Console port: all the "second" SC ports must have the same name across hosts, e.g. on every ESX host you will have a "Service Console" attached to one network and a "Service Console 2" attached to another network. HA checks on all ESX hosts that the SC ports have the same names and are attached to the same networks.
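Here is a hypothetical pre-flight check in Python that mirrors the consistency rule described above; it is not what HA itself runs. Each host maps SC port labels to networks, every host is compared against the first one, and labels are compared case-sensitively. The host names and /24 networks are made up for illustration.

```python
def sc_port_mismatches(hosts):
    """Compare every host's SC port labels and networks against the first host.

    Labels are compared exactly, so "Service Console 2" and
    "service console 2" count as different ports.
    """
    reference = next(iter(hosts.values()))
    problems = []
    for host, ports in hosts.items():
        if set(ports) != set(reference):
            problems.append((host, "label set differs: " + ", ".join(sorted(ports))))
            continue
        for label, network in ports.items():
            if network != reference[label]:
                problems.append((host, f"{label!r} is on {network}, expected {reference[label]}"))
    return problems


hosts = {
    "esx01": {"Service Console": "10.25.13.0/24", "Service Console 2": "10.25.20.0/24"},
    "esx02": {"Service Console": "10.25.13.0/24", "Service Console 2": "10.25.20.0/24"},
    "esx03": {"Service Console": "10.25.13.0/24", "service console 2": "10.25.20.0/24"},
}
for host, problem in sc_port_mismatches(hosts):
    print(host, problem)
```

Here only esx03 is flagged, because of the lowercase label.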

About the multiple isolation addresses, you can read this: http://kb.vmware.com/kb/1002117

About the "HA claiming": if you click on your cluster and then go to the "Summary" tab, you will probably see a message saying that you "do not have a secondary SC port etc etc". But I don't remember whether this always occurs.

jftwp
Enthusiast

Okay, thanks mainly to your feedback, and partly to the 'VMware HA Best Practices' pdf I just read, I think I have it straight now. I will:

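1. Using vSwitch1, add a second service console and leave it with the default name of 'Service Console 2'. Assign it its own IP address on the 10.25.20.0 network; a pingable address on that network (such as its gateway, not the new SC's own IP) will be used later as the secondary isolation address.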

2. Go into HA Advanced Options and add 'das.isolationaddress2' with the secondary isolation address as its value. Also raise the failure detection timeout from the default 15 seconds to 30 seconds by adding 'das.failuredetectiontime' with a value of 30000 (milliseconds).

3. Set the default isolation response on the cluster to 'Shut down VM'.

4. Apply all settings. Click OK. Wait for configuration to complete.

5. DISable HA on the cluster, wait for completion, and then RE-enable HA on the cluster so all the above settings truly take effect.
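For step 2, a minimal Python sketch of the option values, mainly to show the seconds-to-milliseconds conversion for das.failuredetectiontime. The option names come from the plan above; the addresses (10.25.20.1 as the secondary isolation address, 10.25.13.1 as the existing default gateway) are hypothetical placeholders, and the isolation_addresses helper is only a model of "default gateway plus extra addresses", not VMware code.

```python
import ipaddress


def build_ha_advanced_options(secondary_isolation_address, detection_seconds=30):
    """Return the step-2 HA advanced options as option name -> value strings."""
    ipaddress.ip_address(secondary_isolation_address)  # ValueError if not an IP
    return {
        "das.isolationaddress2": secondary_isolation_address,
        # das.failuredetectiontime is given in milliseconds (15000 = 15 s default)
        "das.failuredetectiontime": str(int(detection_seconds * 1000)),
    }


def isolation_addresses(default_gateway, options):
    """Hypothetical model: SC default gateway first, then das.isolationaddressN values."""
    extra = [value for name, value in sorted(options.items())
             if name.startswith("das.isolationaddress")]
    return [default_gateway] + extra


# Both addresses below are hypothetical placeholders, not values from the thread.
options = build_ha_advanced_options("10.25.20.1")
print(options)
print(isolation_addresses("10.25.13.1", options))
```

The values are what you would type into the HA Advanced Options dialog by hand; nothing here talks to vCenter.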

That oughta do it. Trying in my dev cluster later today or tomorrow, pending any further feedback. Thank you.
