VMware Cloud Community
sergeda
Contributor

Can't enable HA on host "ft_gethostbyname |grep FAILED"

Hi all.

I have two ESX 3.0.1 servers working in a cluster, both successfully configured for HA. Now I'm trying to add a third server and configure it for HA, but I always get the errors "HA agent has an error" and "internalerror: Vmap_esx1 process failed to stop" when configuring it. I have added all the needed entries to the hosts files and checked that the /opt/LGTOaam512/bin/ft_gethostbyname command returns the right results on all servers. Everything seems to be OK. The only difference I found is that the output of /opt/LGTOaam512/bin/ft_gethostbyname is doubled on the servers that already work. Then I checked /opt/LGTOaam512/log/aam_config_util_addnode.log on the failing server and found this line:

CMD: /opt/LGTOaam512/bin/ft_gethostbyname esx1 |grep FAILED

The same command without the grep part successfully returns the right IP of the server when I run it from the console. I'm new to Linux, so I don't know what the grep part does. Please help me figure out what's wrong.

5 Replies
enDemand
Enthusiast

"grep" is a command that matches lines against a pattern defined by a regular expression. In your case, "grep FAILED" will only output the lines that contain "FAILED".

I haven't seen ft_gethostbyname return doubled output before, but if it returns the IP and name of the server without "FAILED" in the line, that simply means name resolution is working for the host's EMC AAM agent (the HA agent).
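To make the grep part concrete, here is a small sketch. The helper name and the sample text are made up for illustration (this is not real ft_gethostbyname output); only the filtering behaviour matters.

```shell
# Hypothetical helper (not part of the AAM tools): apply the same filter
# the HA script uses to whatever text arrives on stdin.
only_failed() {
    grep FAILED
}

# Sample text is invented; real ft_gethostbyname output may look different.
if printf 'esx1 looks fine\n' | only_failed; then
    echo "a FAILED line was found"
else
    echo "no FAILED line, so the filter printed nothing"
fi
```

An empty RESULT after a "|grep FAILED" command in the log is therefore most likely the good case: nothing in the output contained the word FAILED.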

Can you reply with the output of the following two commands?

ps -ef | grep /opt/LGTO

and

cat /opt/LGTOaam512/log/aam_config_util_listnodes.log

sergeda
Contributor

Yes, on the servers already configured for HA, ft_gethostbyname returns doubled output, but without "FAILED".

Here is the output of your commands:

[root@esx1 etc]# ps -ef | grep /opt/LGTO

root 6233 1408 0 16:56 pts/0 00:00:00 grep /opt/LGTO

[root@esx1 etc]# cat /opt/LGTOaam512/log/aam_config_util_listnodes.log

KEY: -z VAL: 1

KEY: domain VAL: vmware

KEY: cmd VAL: listnodes

CMD: hostname -s

RESULT:

----

esx1

CMD: /opt/LGTOaam512/bin/ft_gethostbyname esx1 |grep FAILED

RESULT:

----

list_nodes

CMD: /opt/LGTOaam512/bin/ftcli -domain vmware -connect esx3 -port 8042 -timeout 60 -cmd listnodes

RESULT:

----

[Err:10035] User Not Found

[Err:8001] Access Denied

Copying /opt/LGTOaam512/config/vmware-sites to /opt/LGTOaam512/log/aam_config_util_listnodes.log

FULLTIME_SITES_TID 00000001

+ 1:8042,8042,8043 esx3 vmware #FT_Agent_Port=8045

Total time for script to complete: 0 minute(s) and 0 second(s)
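The two "[Err:...]" lines are the part of this log that stands out. For longer logs, a sketch like the following pulls out only those lines. The helper name is hypothetical, and it assumes error lines always contain the literal text "[Err:" as they do above.

```shell
# Hypothetical helper: print only the "[Err:NNNN] ..." lines from a log file.
# -F treats the pattern as a fixed string, so "[" is not read as the start
# of a regular-expression bracket list.
show_errors() {
    grep -F '[Err:' "$1"
}

# Example call (log path taken from this thread):
# show_errors /opt/LGTOaam512/log/aam_config_util_listnodes.log
```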

sergeda
Contributor

After disabling HA on the cluster and enabling it again, the problem was gone.

jaygriffin
Enthusiast

Has it come back?

If I disable and re-enable, a different host has the same problem.

jaygriffin
Enthusiast

I renamed /etc/opt/vmware/aam/FT_HOSTS with a .old suffix, then ran "Reconfigure for VMware HA", and it worked.
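For anyone repeating this from the service console, a cautious sketch of the rename step follows. The helper name is hypothetical, and it assumes "renamed to .old" means appending .old to the existing file name.

```shell
# Hypothetical helper: move a file aside by appending .old, refusing to
# overwrite a backup that already exists.
move_aside() {
    file="$1"
    if [ ! -e "$file" ]; then
        echo "nothing to rename: $file" >&2
        return 1
    fi
    if [ -e "$file.old" ]; then
        echo "backup already exists: $file.old" >&2
        return 1
    fi
    mv "$file" "$file.old"
}

# On the affected host (path taken from the post above):
# move_aside /etc/opt/vmware/aam/FT_HOSTS
```

Keeping the old file rather than deleting it means it can be put back if "Reconfigure for VMware HA" does not help.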
