VMware Cloud Community
Sticky
Enthusiast

Having issues with HA Cluster

We have just loaded 5 VI3 servers, all with identical settings, but 2 of the servers will not configure the HA agent.

One server complains that it "Fails to communicate with remote host"

While the other server simply states that "An Error occurred during configuration"

Any suggestions would be welcome.

Thanks,

Golden

24 Replies
EnsignA
Hot Shot

This is generally a name resolution problem. I would create a proper hosts file that contains both the FQDN and the short name for each ESX host, and put it in your /etc directory.
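A minimal sketch of what that hosts file could look like, using hypothetical names and addresses (substitute your own):

```
# /etc/hosts on each ESX host -- IP, then FQDN, then short name
127.0.0.1   localhost
10.0.0.11   esx1.example.com   esx1
10.0.0.12   esx2.example.com   esx2
10.0.0.13   esx3.example.com   esx3
```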

Sticky
Enthusiast

Shouldn't that only be necessary if DNS isn't working? DNS does appear to be working fine from host to host.

lurpy1
Contributor

There are limitations on FQDN length. I don't have the exact figure handy, but I believe if you are over 29 characters you'll need to make changes.
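If you want to check this quickly, here is a trivial sketch (the 29-character figure is only my recollection, so treat the limit as an assumption):

```python
# Flag FQDNs that exceed a suspected length limit (29 chars is hearsay,
# not a confirmed VMware figure -- adjust if you find the real value).
def fqdn_too_long(fqdn: str, limit: int = 29) -> bool:
    return len(fqdn) > limit

print(fqdn_too_long("esx1.example.com"))                         # well under the limit
print(fqdn_too_long("esx-node-01.datacenter.corp.example.com"))  # over the limit
```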

Also, have you verified IP settings? Maybe a subnet mismatch?

Sticky
Enthusiast

FQDN:

########.###.########.##

Double-checked; the IPs are good.

AlistairS
Hot Shot

Are there any more clues in the HA log files? In the service console, you should find a set of logfiles in the directory /opt/LGTOaam512/

If you find hostname lookup failures, you know it's something to do with DNS. Otherwise it could be something like the service console not having a default gateway.

I normally check /etc/hosts, /etc/resolv.conf and /etc/sysconfig/network to check all is ok.

In particular, even though I use DNS, I ensure there is an entry for the service console in the hosts file both as alias and FQDN; e.g.

1.2.3.4 esx1 esx1.domain.com

Once all good, right click on the esx host in VI Client and choose reconfigure HA.

It would be great to know whether the AAM logfiles help.

good luck!

Al

Sticky
Enthusiast

Well I have managed to get one of the two hosts connected.

First I tried moving them all to a new cluster, but the issue happened again. This time it was a different server that reported "An error occurred during configuration".

While the other server continues to have the same issue.

I have 5 servers total. When the 4th server is added, it fails as above; the last server then fails with "Fails to communicate with remote host".

I got the 4th server to start OK by putting one of the working servers in maintenance mode. Once the 4th server was in, I took the other server out of maintenance mode.

As for the last server, I am rebuilding it and will let you know.

Golden

virtech
Expert

Make sure all your ESX servers are able to ping the default gateway.

Sticky
Enthusiast

Ping is disabled on the default gateway.

AlistairS
Hot Shot

I'm pretty sure the HA component in the service console needs to be able to ping the gateway during configuration.

It uses this to check whether the server has become isolated.

Not sure why some of the servers are working and others not. Are all the servers in the cluster on the same subnet and using the same gateway?

Is there a possibility of enabling ICMP echo reply on the gateway?

Al

virtech
Expert

Hmm, OK. I had the same problem today with 4 ESX servers. The issue was they couldn't ping the default gateway.

Check this link out:

http://www.vmware.com/community/thread.jspa?messageID=510015

Add the following to the HA advanced options to point to your VirtualCenter server:

das.isolationaddress 10.128.160.156

Oh and this resolved my problem.
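For anyone finding this later, the cluster's advanced options would end up looking something like this. The second option is one I believe exists to disable the default-gateway check entirely; that part is my assumption, so verify it against your VirtualCenter version:

```
# Cluster Settings > VMware HA > Advanced Options (values are examples)
das.isolationaddress             10.128.160.156   # must answer ping, e.g. the VirtualCenter server
das.usedefaultisolationaddress   false            # assumption: skips pinging the default gateway
```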


virtech
Expert

Al, I think you are almost certainly correct. I believe the console pings the default gateway to see if the network is up.

AlistairS
Hot Shot

That sounds like an excellent idea.

I'm guessing that parameter instructs the AAM component to check the specified IP address instead of the gateway?

So as long as you can get an ICMP echo reply from the VC server (or whatever you point the isolation address to) then AAM will be happy?

Great find.

Al

virtech
Expert

Yes, this will definitely work. I spent enough time looking at it today!

Henriwithani
Contributor

Can your ESX servers ping each other by short name and by FQDN?

-Henri Twitter: http://twitter.com/henriwithani Blog: http://henriwithani.wordpress.com/
RobMokkink
Expert

Can you post the /etc/hosts from your ESX servers?

It should have entries like the following:

192.168.0.1 esx1.example.local esx1

Sometimes the short name (esx1 in the example above) is missing.

Check that your default gateway is OK and pingable.

Check DNS; make sure the forward zone and the reverse zone are OK.

If all of the above is correct, do the following:

Put the ESX servers in maintenance mode and remove them from the cluster, then recreate the cluster; that will reinstall the Legato client.

Keep us updated how it goes.

Good luck!

Sticky
Enthusiast

So it would appear that we had two issues. The first issue was an IP conflict.

It looks like the second issue was resolved with the das.isolationaddress setting. Not sure why some hosts were fine and others were not...

Thanks so much for all the help!!!

Golden

kastro
Enthusiast

Hello

I am having the same problem, but found something else interesting.

If HA checks for isolation by pinging the default gateway (the SC's, or the one set in VC), why does it not think it is isolated when I disable the gateway? If I do this, everything works normally and the VMs stay up and running. Isolation only occurs when I also disconnect the SC ethernet cable.

So, if I take the gateway down and then the SC NIC down, both ESX hosts in the cluster power off their VMs (the default setting).

But if I only bring the SC NIC down and leave the gateway up, then HA powers on all the VMs that were on the host with the dead SC NIC on the other ESX host.

In this case HA doesn't just check a ping to the gateway for isolation, but also the SC NIC status.

Has anybody had the same problem? Any ideas?

Best Regards

AlistairS
Hot Shot

The gateway IP (or isolation address) is only used if the HA component believes it has become isolated.

The ping of the isolation address is only used to determine whether the lack of contact from the HA components on the other cluster nodes is this ESX server's own problem: if the service console can't even reach its own gateway, we assume a major network problem on this host, and therefore know it is isolation rather than a loss of the other ESX hosts.

So the gateway or isolation address is only used to verify that the host is indeed isolated. If the HA component in the service console is receiving heartbeats OK from the other cluster nodes, then the presence or absence of the gateway is not considered.
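A rough sketch of that decision logic, as I understand it (my own simplification for illustration, not VMware's actual code):

```python
# Simplified model of the HA agent's isolation check (illustrative only).
def isolation_state(heartbeats_ok: bool, isolation_addr_pingable: bool) -> str:
    """What this host's HA agent concludes about its own situation."""
    if heartbeats_ok:
        # Heartbeats from other nodes arrive: the gateway/isolation
        # address is never even consulted.
        return "connected"
    if isolation_addr_pingable:
        # Heartbeats lost, but the isolation address still answers:
        # assume the *other* hosts failed, not this one.
        return "peers-down"
    # No heartbeats and no ping reply: declare isolation and trigger
    # the isolation response (default: power off this host's VMs).
    return "isolated"
```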

Al

pedromm
Contributor

Try adding entries for the ESX hosts to the hosts file on the VirtualCenter server's machine.

Windows hosts file:

10.0.0.1 ESX1.CORP.COM ESX1

10.0.0.2 ESX2.CORP.COM ESX2

10.0.0.3 ESX3.CORP.COM ESX3

10.0.0.4 ESX4.CORP.COM ESX4

10.0.0.5 ESX5.CORP.COM ESX5
