gotts
Enthusiast

HA issue after patching.

Hi all,

I have been unable to restart HA since I patched my ESX 3.0.2 hosts today. The error I get is "getshortnamefailed:cmd remove failed:hostname -s fails hostname: Unknown host".

I have verified that DNS and all hosts files are correct. What should I check next?

Thanks!

1 Solution

Accepted Solutions
mstahl75
Virtuoso

HOSTNAME=rsfpesx2.rsfpesx2.es.ad.adp.com

That looks like it could be part of your problem if not all of it.

11 Replies
WillemB
Enthusiast

I've had many HA problems, and nearly all of them came down to DNS, hostname, or network issues.

1. Make sure your networking is sound (it used to work, so it should be OK). Check that VirtualCenter is pingable and that all hosts can see each other.

2. Check your hosts' hostnames (use the hosts file if necessary). No duplicate names.

3. Check your DNS tables (short and long names). VirtualCenter must also be resolvable. Flush all DNS tables.

4. Turn off HA and test VMotion. HA is based on VMotion technology.

5. If everything is set up properly, turn HA off and re-enable it. Hopefully it will get back on track.
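
The "no duplicate names" check in step 2 can be scripted instead of eyeballed. This is only a minimal sketch, assuming a standard hosts-format file; the function name `find_duplicate_names` is made up for illustration and is not a VMware tool.

```shell
#!/bin/sh
# Hypothetical helper: print every host name that is mapped to more than
# one IP address in a hosts-format file (default: /etc/hosts).
# Names are compared case-insensitively; comments and blank lines are skipped.
find_duplicate_names() {
    hosts_file="${1:-/etc/hosts}"
    awk '
        { sub(/#.*/, "") }            # strip comments
        NF < 2 { next }               # skip blank or incomplete lines
        {
            for (i = 2; i <= NF; i++) {
                name = tolower($i)
                if (name in ip && ip[name] != $1) dup[name] = 1
                ip[name] = $1
            }
        }
        END { for (n in dup) print n }
    ' "$hosts_file" | sort
}

# Usage on a host: find_duplicate_names /etc/hosts
```

No output means no name is claimed by two different addresses. Note that a name listed for both an IPv4 and an IPv6 address (for example localhost) will also be reported.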

williamarrata
Expert

What you may want to try is:

From VirtualCenter, click on your cluster to bring up the HA and DRS settings. Uncheck both HA and DRS and click OK. Then go back into the settings, check both boxes again, and click OK. This should fix your problem. If it doesn't, do the same, but remove the cluster and build a new one. That's the only way I have found to fix this problem; I had it once.

Hope that helped. 🙂
Dewy
Contributor

The following should help resolve this.

On each of your ESX hosts, check:

1) cat /etc/hosts

Make sure there are no CAPITAL letters in the hosts file. This includes the short name.

2) cat /etc/sysconfig/network

Again, make sure the FQDN is listed and contains no capitals.

If there are capitals or the host name is incorrect, enter:

hostname fullhost.name.com

3) Check for consistency with

  • hostname -s

  • hostname -i

  • hostname

Also do a ping -a from the VC server and make sure the hostname resolves correctly.

Once this has been checked and all capitals corrected, uncheck HA at the cluster, let HA un-configure, and then re-enable HA.
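
The capital-letter checks from steps 1 and 2 can be done with grep and sed rather than by reading the files. A rough sketch, assuming the usual file layouts; both function names are invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: print non-comment lines of a hosts-format file
# that contain at least one capital letter.
hosts_lines_with_capitals() {
    grep -v '^[[:space:]]*#' "${1:-/etc/hosts}" | grep '[[:upper:]]'
}

# Hypothetical helper: print the value of the HOSTNAME= line from a
# sysconfig-style network file (the key itself is always upper case,
# so only the value is worth inspecting).
configured_hostname() {
    sed -n 's/^HOSTNAME=//p' "${1:-/etc/sysconfig/network}"
}

# Usage on a host:
#   hosts_lines_with_capitals /etc/hosts
#   configured_hostname /etc/sysconfig/network | grep '[[:upper:]]'
#   [ "$(configured_hostname)" = "$(hostname)" ] || echo "hostname mismatch"
```

If the first two commands print nothing and the last prints no mismatch, the files pass the checks described above.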

Cheers!

gotts
Enthusiast

I am still having the problem after trying all the recommendations so far. I do believe this is hostname related, but I am not sure where to go from here. See below:

# hostname -s

hostname: Unknown host

# cat /etc/sysconfig/network

NETWORKING=yes

HOSTNAME=rsfpesx2.rsfpesx2.es.ad.adp.com

GATEWAY=10.19.49.1

GATEWAYDEV=vswif0

#

My actual hostname fqdn is rsfpesx2.es.ad.adp.com.

The contents of the hosts file on all my ESX servers are:

login as: root

root@rsfpesx2's password:

Last login: Fri Oct 26 09:02:46 2007 from rslvxpwsmuddk.nj.adp.com

# hostname -s

hostname: Unknown host

# cat /etc/sysconfig/network

NETWORKING=yes

HOSTNAME=rsfpesx2.rsfpesx2.es.ad.adp.com

GATEWAY=10.19.49.1

GATEWAYDEV=vswif0

# cd etc

-bash: cd: etc: No such file or directory

# cd /etc

# ls -la

#

# vi hosts

# Do not remove the following line, or various programs

# that require network functionality will fail.

127.0.0.1 localhost.localdomain localhost

10.24.18.52 rslvesx1.es.ad.adp.com rslvesx1

10.24.18.228 rslvesx2.es.ad.adp.com rslvesx2

10.19.49.192 rsfpesx1.es.ad.adp.com rsfpesx1

10.19.49.197 rsfpesx2.es.ad.adp.com rsfpesx2

I can ping all of these by shortname or fqdn including VC server.

I have removed both hosts from the cluster and recreated the cluster.

I am sure this is something simple but I am still overlooking it apparently!!!

mstahl75
Virtuoso

HOSTNAME=rsfpesx2.rsfpesx2.es.ad.adp.com

That looks like it could be part of your problem if not all of it.
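
A doubled first label like this is easy to miss by eye. Here is a small sketch for spotting it; the function name `has_doubled_first_label` is hypothetical, and the check only compares the first two dot-separated labels of whatever name it is given.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when the first two
# dot-separated labels of the given name are identical, as in
# "host.host.example.com". Names without a dot never match.
has_doubled_first_label() {
    case "$1" in
        *.*) ;;
        *) return 1 ;;
    esac
    first=${1%%.*}        # text before the first dot
    rest=${1#*.}          # everything after the first dot
    second=${rest%%.*}    # second label
    [ -n "$first" ] && [ "$first" = "$second" ]
}

# Usage on a host:
#   if has_doubled_first_label "$(hostname)"; then echo "hostname label is doubled"; fi
```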

gotts
Enthusiast

I agree but I am not sure where this is coming from???

WillemB
Enthusiast

Try flushing the FT_HOSTS file. It holds the DNS names of the HA partners.

Look in these files on all hosts and check for problems.

Try rebuilding the cluster 1 host at a time and see if you get the problem with a specific host.

gotts
Enthusiast

I am sorry to be so ignorant but how would I go about flushing FT_HOSTS?

WillemB
Enthusiast

Make a backup of the file and then remove all entries from it.

FT_HOSTS will be rebuilt when HA is set up again. It contains the host names of all HA partners, if I'm not mistaken.

Google around for info on FT_HOSTS.
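
The backup-then-empty step could look roughly like this. It is a sketch only: the thread does not say where FT_HOSTS lives, so the path is left as a parameter, and the function name `backup_and_empty` is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: copy a file to <name>.bak, then truncate the
# original to zero length. Fails without touching anything if the file
# does not exist or the backup cannot be written.
backup_and_empty() {
    target="$1"
    [ -f "$target" ] || { echo "no such file: $target" >&2; return 1; }
    cp -p "$target" "$target.bak" || return 1
    : > "$target"
}

# Usage on a host (locate the file first; its path is not given here):
#   find / -name FT_HOSTS 2>/dev/null
#   backup_and_empty /path/to/FT_HOSTS
```

Truncating rather than deleting keeps the file's owner and permissions in place, and the `.bak` copy can be restored if HA does not rebuild the entries.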

j_d_vmware
Enthusiast

I had the same issue. It turned out that VirtualCenter wasn't in DNS. I added VirtualCenter to DNS, reconfigured each host, and everything has been fine since.

You might be able to find some things here...

cat /opt/LGTOaam512/log/aam_config_util_addnode.log |more

cat /var/log/vmware/vpx/vpxa.log |more

/usr/bin/perl /opt/LGTOaam512/vmware/aam_config_util.pl -z -cmd=listnodes -domain=vmware

James Dougherty
Dewy
Contributor

Please do this in this EXACT ORDER.

Edit the network file to correct your hostname:

# nano /etc/sysconfig/network

Set the host name to the true FQDN of your host:

# hostname full.hostname.com

Check for consistency with:

# hostname -i

# hostname -s

# hostname

Now remove the ESX host from VC and re-add it by this CORRECT FQDN. If necessary, edit the hosts file on your Windows VC server.
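
If you would rather not hand-edit the file, the first step can be done with sed. This is a minimal sketch under a few assumptions: the file has a single HOSTNAME= line, the new name contains only ordinary hostname characters, and the function name `set_configured_hostname` is made up for illustration. It changes only the file, not the running hostname.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the HOSTNAME= line of a sysconfig-style
# network file, keeping a backup copy as <file>.bak.
# Arguments: 1 = path to the network file, 2 = the correct FQDN.
set_configured_hostname() {
    net_file="$1"
    fqdn="$2"
    cp -p "$net_file" "$net_file.bak" || return 1
    # Read from the backup and write the edited result over the original.
    sed "s/^HOSTNAME=.*/HOSTNAME=$fqdn/" "$net_file.bak" > "$net_file"
}

# Usage on a host, followed by the remaining steps from the post above:
#   set_configured_hostname /etc/sysconfig/network rsfpesx2.es.ad.adp.com
#   hostname rsfpesx2.es.ad.adp.com
#   hostname -i; hostname -s; hostname
```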
