Hi all,
I have been unable to get HA restarted since I patched my ESX 3.0.2 machines today. I am getting the error "getshortnamefailed:cmd remove failed:hostname -s fails hostname: Unknown host".
I have verified that DNS and all hosts files are correct. What should I check next?
Thanks!
HOSTNAME=rsfpesx2.rsfpesx2.es.ad.adp.com
That looks like it could be part of your problem if not all of it.
I've had many HA problems, and I found that nearly all of them come down to DNS, hostname, or network problems.
1. Make sure your networking is sound (it used to work, so it should be OK). Check that Virtual Center is pingable and that all hosts can see each other.
2. Check your hosts' hostnames (use the hosts file if necessary). No duplicate names.
3. Check your DNS tables (short and long names). Virtual Center must also be resolvable. Flush all DNS tables.
4. Turn off HA and test VMotion. HA is based on VMotion technology.
5. If everything is set up properly, turn off HA and re-enable it. Hopefully it will get back on track.
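For the "no duplicate names" check in step 2, here is a minimal sketch that could be run on each host. The function name duplicate_hostnames is made up for illustration, and it only looks at a hosts-format file, not at DNS.

```shell
# Print every host name or alias that appears on more than one
# non-comment line of a hosts-format file (default: /etc/hosts).
# duplicate_hostnames is a hypothetical helper name, not a VMware tool.
duplicate_hostnames() {
    awk '
        $1 ~ /^#/ { next }                 # skip comment lines
        NF < 2    { next }                 # skip blank or malformed lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break       # stop at a trailing comment
                name = tolower($i)         # compare names case-insensitively
                if (seen[name] && seen[name] != NR) dup[name] = 1
                else seen[name] = NR
            }
        }
        END { for (n in dup) print n }
    ' "${1:-/etc/hosts}" | sort
}
```

Running duplicate_hostnames with no argument checks /etc/hosts. No output means no name is claimed by two different lines.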
What you may want to try is;
From Virtual Center, click on your cluster to bring up the HA and DRS settings. Uncheck both the HA and DRS settings and click OK. Now go back into the settings, check both boxes, and click OK. This should fix your problem. If it doesn't, do the same but remove the cluster and build a new one. That's the only way I have found to fix that problem. I had it once.
Hope that helped.
The following should help resolve this:
From each of your ESX hosts, check:
1) cat /etc/hosts
Make sure there are no CAPITAL letters in the hosts file. Each entry should also include the short name.
2) cat /etc/sysconfig/network
Again, make sure the FQDN is listed, with no capitals.
If there are caps or an incorrect host name, enter
hostname fullhost.name.com
3) Check for consistency with
hostname -s
hostname -i
hostname
Also do a ping -a from the VC server and make sure the hostname resolves correctly.
Once this has been checked and all caps corrected, uncheck HA at the cluster, let HA unconfigure, and then re-enable HA.
Cheers!
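The capital-letter checks in steps 1 and 2 can be done by eye, but a rough sketch like the one below might save time across several hosts. Both function names are invented for this example. The network file check only looks at the HOSTNAME value, since the keys themselves (NETWORKING, HOSTNAME) are upper case.

```shell
# Hosts file: print non-comment lines that contain a capital letter.
# (hypothetical helper name)
hosts_lines_with_capitals() {
    grep -v '^[[:space:]]*#' "$1" | LC_ALL=C grep '[A-Z]'
}

# Network file: print the HOSTNAME value if it contains a capital letter.
# (hypothetical helper name)
network_hostname_with_capitals() {
    sed -n 's/^HOSTNAME=//p' "$1" | LC_ALL=C grep '[A-Z]'
}

# Example use on an ESX host (paths as given in the steps above):
#   hosts_lines_with_capitals /etc/hosts
#   network_hostname_with_capitals /etc/sysconfig/network
```

No output from either call means no capitals were found.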
I am still having a problem after trying all the recommendations so far. I do believe this is hostname related, but I am not sure where to go. See below:
hostname: Unknown host
NETWORKING=yes
HOSTNAME=rsfpesx2.rsfpesx2.es.ad.adp.com
GATEWAY=10.19.49.1
GATEWAYDEV=vswif0
My actual hostname FQDN is rsfpesx2.es.ad.adp.com.
The contents of the hosts file on all my ESX servers are:
login as: root
root@rsfpesx2's password:
Last login: Fri Oct 26 09:02:46 2007 from rslvxpwsmuddk.nj.adp.com
hostname: Unknown host
NETWORKING=yes
HOSTNAME=rsfpesx2.rsfpesx2.es.ad.adp.com
GATEWAY=10.19.49.1
GATEWAYDEV=vswif0
-bash: cd: etc: No such file or directory
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
10.24.18.52 rslvesx1.es.ad.adp.com rslvesx1
10.24.18.228 rslvesx2.es.ad.adp.com rslvesx2
10.19.49.192 rsfpesx1.es.ad.adp.com rsfpesx1
10.19.49.197 rsfpesx2.es.ad.adp.com rsfpesx2
I can ping all of these by shortname or fqdn including VC server.
I have removed both hosts from the cluster and recreated the cluster.
I am sure this is something simple but I am still overlooking it apparently!!!
HOSTNAME=rsfpesx2.rsfpesx2.es.ad.adp.com
That looks like it could be part of your problem if not all of it.
I agree but I am not sure where this is coming from???
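The doubled label in that HOSTNAME value (rsfpesx2.rsfpesx2...) is easy to miss when scanning several hosts. As a small sketch, the test below flags any name whose first two dot-separated labels are identical. The function name is hypothetical, and it does not explain where the doubling came from.

```shell
# Succeed (exit 0) when the first two dot-separated labels of a name
# are identical, as in rsfpesx2.rsfpesx2.es.ad.adp.com.
# has_repeated_first_label is a made-up helper name.
has_repeated_first_label() {
    first=${1%%.*}                      # text before the first dot
    rest=${1#*.}                        # text after the first dot
    [ "$rest" != "$1" ] || return 1     # no dot at all: nothing to compare
    second=${rest%%.*}                  # second label
    [ "$first" = "$second" ]
}

# Example use on an ESX host:
#   name=$(sed -n 's/^HOSTNAME=//p' /etc/sysconfig/network)
#   has_repeated_first_label "$name" && echo "doubled label in: $name"
```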
Try flushing the FT_HOSTS file; it holds the DNS names of the HA partners.
Look in these files on all hosts and check for problems.
Try rebuilding the cluster one host at a time and see whether the problem follows a specific host.
I am sorry to be so ignorant but how would I go about flushing FT_HOSTS?
Make a backup of the file and then remove all entries from it.
FT_HOSTS will be rebuilt when HA is set up. It contains the host names of all HA partners, if I'm not mistaken.
Google around for info on FT_HOSTS.
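The "back up, then remove all entries" step could look roughly like this. It is only a sketch: the function name is invented, and the location of FT_HOSTS is deliberately left as an argument because it is not given in this thread.

```shell
# Copy FILE to FILE.bak, then truncate FILE to zero length.
# The path is supplied by the caller; no FT_HOSTS location is assumed.
# backup_and_empty is a hypothetical helper name.
backup_and_empty() {
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
    cp -p "$1" "$1.bak" || return 1     # keep a copy with original timestamps
    : > "$1"                            # empty the file without deleting it
}

# Example use, once you have located the file on your host:
#   backup_and_empty /path/to/FT_HOSTS
```

Truncating rather than deleting keeps the file's ownership and permissions in place for when HA rewrites it.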
I had the same issues. It turned out that Virtual Center wasn't in DNS. I added Virtual Center to DNS, reconfigured each host, and everything has been fine since.
You might be able to find some things here...
cat /opt/LGTOaam512/log/aam_config_util_addnode.log |more
cat /var/log/vmware/vpx/vpxa.log |more
/usr/bin/perl /opt/LGTOaam512/vmware/aam_config_util.pl -z -cmd=listnodes -domain=vmware
Please do this in this EXACT ORDER
Edit the network file to correct your hostname
nano /etc/sysconfig/network
Set the host name to the true FQDN of your host
hostname full.hostname.com
Check for consistency with
hostname -i
hostname -s
hostname
Now remove the ESX host from VC and re-add it by this CORRECT FQDN. If necessary, edit the hosts file on your VC Windows server.
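If you would rather not edit the network file by hand in nano, the first step can be sketched as a small function. The name set_network_hostname is made up, and it assumes the file already has a HOSTNAME= line; if it does not, the file is left unchanged apart from the .bak copy.

```shell
# Rewrite the HOSTNAME= line of a network config file to the given FQDN,
# keeping a .bak copy of the original next to it.
# set_network_hostname is a hypothetical helper name.
set_network_hostname() {
    file=$1
    fqdn=$2
    cp -p "$file" "$file.bak" || return 1
    sed "s/^HOSTNAME=.*/HOSTNAME=$fqdn/" "$file.bak" > "$file"
}

# Example use on the host from this thread, in the order given above:
#   set_network_hostname /etc/sysconfig/network rsfpesx2.es.ad.adp.com
#   hostname rsfpesx2.es.ad.adp.com
#   hostname -i; hostname -s; hostname
```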