We have just loaded 5 VI3 servers. All settings are the same, but 2 of the servers will not / cannot configure the HA agent.
One server complains that it "Fails to communicate with remote host"
While the other server simply states that "An Error occurred during configuration"
Any suggestions would be welcome.
Thanks,
Golden
Hmm, OK. I had the same problem today with 4 ESX servers. The issue was that they couldn't ping the default gateway.
Check this link out:
http://www.vmware.com/community/thread.jspa?messageID=510015
Add the following to your HA advanced options to point to your VirtualCenter server:
das.isolationaddress 10.128.160.156
Oh and this resolved my problem.
Message was edited by:
Mooihoek
This is generally a name resolution problem. I would create a proper hosts file that contains both the FQDN and the short name for each ESX host and put it in your /etc directory.
Shouldn't that only be necessary if DNS is not working? DNS does appear to be working fine from host to host.
There are limitations to FQDN length - don't have it handy, but I believe if you are over 29 characters you'll need to make changes.
Also, have you verified IP settings? Maybe a subnet mismatch?
FQDN:
########.###.########.##
Double checked IPs are good
Are there any more clues in HA log files? In the service console, you should find a set of logfiles in the directory /opt/LGTOaam512/
If you find there are hostname lookup failures, you know it's something to do with DNS. Otherwise it could be something like the service console not having a default gateway.
I normally check /etc/hosts, /etc/resolv.conf and /etc/sysconfig/network to check all is ok.
In particular, even though I use DNS, I ensure there is an entry for the service console in the hosts file both as alias and FQDN; e.g.
1.2.3.4 esx1 esx1.domain.com
Once all is good, right-click the ESX host in VI Client and choose reconfigure HA.
It would be great to know if the aam logfiles help,
good luck!
Al
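For anyone who wants to automate the hosts-file check described above, here is a minimal sketch in Python 3. It is an illustration only, not a VMware tool: the function names `parse_hosts` and `missing_names` are made up for this example, and it assumes you copy the contents of /etc/hosts to a machine that has Python 3 rather than running it in the service console.

```python
def parse_hosts(text):
    """Return a list of (ip, names) tuples parsed from hosts-file text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:  # need an address plus at least one name
            entries.append((fields[0], fields[1:]))
    return entries


def missing_names(text, short_name, fqdn):
    """Return whichever of short_name / fqdn has no entry in the hosts text."""
    known = {name.lower() for _, names in parse_hosts(text) for name in names}
    return [n for n in (short_name, fqdn) if n.lower() not in known]


if __name__ == "__main__":
    # Hypothetical sample: the short name is missing from the entry.
    sample = "127.0.0.1 localhost\n1.2.3.4 esx1.domain.com\n"
    print(missing_names(sample, "esx1", "esx1.domain.com"))
```

An empty list means both the alias and the FQDN are present; anything else names the entry that still needs to be added.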
Well I have managed to get one of the two hosts connected.
First I tried moving them all to a new cluster, but the issue happened again. This time a different server reported "An Error occurred during configuration"
While the other server continues to have the same issue.
I have 5 servers total. When the 4th server is added, it fails with the error above. The last server then fails with "Fails to communicate with remote host"
I got the 4th server to start OK by putting one of the working servers in maintenance mode. Once the 4th server was in, I took the other server out of maintenance mode.
As for the last server I am rebuilding and will let you know.
Golden
Make sure all your ESX servers are able to ping the default gateway.
Ping is disabled on the Default gateway.
I'm pretty sure the HA component in service console needs to be able to ping the gateway during configuration.
It uses this to check if the server has become isolated.
Not sure why some of the servers are working and others not. Are all the servers in the cluster on the same subnet and using the same gateway?
Is there a possibility of enabling ICMP echo reply on the gateway?
Al
Al, I think you are almost certainly correct with this. I believe the console will ping the default gateway to see if the network is up.
That sounds like an excellent idea.
I'm guessing that parameter instructs the AAM component to check the IP address specified instead of the GW?
So as long as you can get an ICMP echo reply from the VC server (or whatever you point the isolation address to) then AAM will be happy?
Great find.
Al
Yes, this will definitely work. I spent enough time today looking at it!
Can your ESX servers ping each other by both short name and FQDN?
Can you post the /etc/hosts of your ESX servers?
It should have entries like the following:
192.168.0.1 esx1.example.local esx1
Sometimes the short name (esx1 in the example above) is missing.
Check that your default gateway is OK and is pingable.
Check DNS, and make sure the forward zone and the reverse zone are OK.
If all of the above is correct, do the following:
Put the ESX servers in maintenance mode and remove them from the cluster, then recreate the cluster. That will reinstall the Legato client.
Keep us updated how it goes.
Good luck!
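The forward and reverse zone check suggested above can be roughly scripted. This is only a sketch under a few assumptions: `check_dns` is a hypothetical helper name, it relies on Python 3's standard `socket.gethostbyname` and `socket.gethostbyaddr`, and it uses whatever resolver the machine running it is configured with (which may consult a local hosts file as well as DNS).

```python
import socket


def check_dns(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of problems found for one host name (empty if none).

    The forward/reverse parameters exist only so the lookups can be
    replaced with stand-ins; by default the system resolver is used.
    """
    try:
        ip = forward(name)  # forward lookup: name -> IPv4 address
    except OSError as exc:
        return ["forward lookup of %s failed: %s" % (name, exc)]
    try:
        primary, aliases, _ = reverse(ip)  # reverse lookup: address -> names
    except OSError as exc:
        return ["reverse lookup of %s failed: %s" % (ip, exc)]
    names = {n.lower() for n in [primary] + list(aliases)}
    if name.lower() not in names:
        return ["%s resolves to %s, which maps back to %s" % (name, ip, primary)]
    return []
```

Calling `check_dns` once with the short name and once with the FQDN of every ESX host, from each host's point of view if possible, should show quickly whether a missing short name or a broken reverse zone is involved.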
So it would appear that we had two issues. The first issue was an IP conflict.
It looks like the second issue was resolved with the das.isolationaddress setting. Not sure why some were fine and others were not....
Thanks so much for all the help!!!
Golden
Hello
I am having the same problem but found something else interesting.
If HA detects isolation by pinging the default GW (the SC one or the one set in VC), why does it not think it is isolated when I disable the GW? If I do this, everything works normally and the VMs stay up and running. Isolation occurs only when I also disconnect the SC Ethernet cable.
So, if I have the GW and then the SC NIC down, both ESX hosts in the cluster power off their VMs (default setting).
But if I only bring the SC NIC down and leave the GW up, then HA powers on, on the other ESX host, all the VMs that were running on the host whose SC NIC is now down.
So in this case HA doesn't just check the ping to the GW for isolation but also the SC NIC status.
Has anybody had the same problem? Any ideas?
Best Regards
The gateway IP (or isolation address) is only used if the HA component believes it has become isolated.
The ping of the isolation address is only used to determine whether the lack of contact from the HA components on the other cluster nodes is this ESX server's problem, i.e. if the service console can't get to its own gateway, then we assume big network problems and therefore know it is isolation and not a loss of the other ESX hosts.
So, the GW or isolation address is only used to verify that the host is indeed isolated. If the HA component in the service console is receiving heartbeats OK from other cluster nodes, then the presence or absence of the gateway is not considered.
Al
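To make the order of checks in the explanation above concrete, here is a small Python sketch of that decision as described in this thread. It is not the actual AAM/HA code: the function name `host_state` and the three state labels are invented for illustration, and the ping is passed in as a callable so it is only attempted once heartbeats have already been lost.

```python
def host_state(heartbeats_ok, ping_isolation_address):
    """Classify a host following the logic described in the post above.

    heartbeats_ok: True if heartbeats from other cluster nodes are arriving.
    ping_isolation_address: callable returning True if the gateway (or the
        das.isolationaddress target) answers a ping.
    """
    if heartbeats_ok:
        # Heartbeats are fine, so the isolation address is never consulted.
        return "normal"
    if ping_isolation_address():
        # Our own network path works, so the other hosts are the ones missing.
        return "other-hosts-lost"
    # No heartbeats and no reply from the isolation address.
    return "isolated"


if __name__ == "__main__":
    # Gateway switched off while heartbeats still arrive: nothing happens.
    print(host_state(True, lambda: False))
```

That matches the observation earlier in the thread: disabling the gateway alone changes nothing as long as the service console still receives heartbeats.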
Try adding an entry for each ESX host to the hosts file on the VirtualCenter server machine.
Windows hosts file:
10.0.0.1 ESX1.CORP.COM ESX1
10.0.0.2 ESX2.CORP.COM ESX2
10.0.0.3 ESX3.CORP.COM ESX3
10.0.0.4 ESX4.CORP.COM ESX4
10.0.0.5 ESX5.CORP.COM ESX5