Hi all.
I have two ESX 3.0.1 servers that work in a cluster and have been successfully configured for HA. Now I'm trying to add a third server and configure it for HA, but I always get the errors "HA agent has an error" and "internalerror: Vmap_esx1 process failed to stop" when configuring it. I have added all the needed entries to the hosts files and checked that the /opt/LGTOaam512/bin/ft_gethostbyname command returns the right results on all servers. Everything seems to be OK. The only difference I found is that the output of /opt/LGTOaam512/bin/ft_gethostbyname is doubled on the servers that already work. Then I checked /opt/LGTOaam512/log/aam_config_util_addnode.log on the failing server and found this line:
CMD: /opt/LGTOaam512/bin/ft_gethostbyname esx1 |grep FAILED
The same command without the grep part successfully returns the right IP of the server when I run it from the console. I'm new to Linux, so I don't know what the grep part does. Please help me figure out what's wrong.
"grep" is a command that matches a pattern defined by a regular expression. In your case, "grep FAILED" displays only those output lines that contain "FAILED".
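To make that concrete, here is a minimal sketch of how a script can use "grep FAILED" as a failure check. The sample lines and the check_lookup helper are made up for illustration; the real output format of ft_gethostbyname is not shown in this thread, so treat them as assumptions.

```shell
# Hypothetical sample output lines (not real ft_gethostbyname output).
sample_ok='esx1 192.168.0.11'
sample_bad='esx1 FAILED'

# check_lookup is a hypothetical helper.
# grep exits with status 0 if at least one line matched, 1 if none did.
check_lookup() {
    if printf '%s\n' "$1" | grep FAILED >/dev/null; then
        echo "lookup failed"
    else
        echo "lookup ok"
    fi
}

check_lookup "$sample_ok"    # prints "lookup ok"
check_lookup "$sample_bad"   # prints "lookup failed"
```

Read this way, an empty result after "grep FAILED" would mean that no failure line was found, not that the lookup returned nothing.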
I haven't seen ft_gethostbyname return doubled output before, but if it returns the IP and name of the server without "FAILED" in the line, then that simply means that the host's EMC AAM agent (HA Agent) is working OK.
Can you reply and post the output of the following two commands?
ps -ef | grep /opt/LGTO
and
cat /opt/LGTOaam512/log/aam_config_util_listnodes.log
Yes, ft_gethostbyname returned doubled output, but without "FAILED", on the servers already configured for HA.
Here is the output of your commands:
[root@esx1 etc]# ps -ef | grep /opt/LGTO
root 6233 1408 0 16:56 pts/0 00:00:00 grep /opt/LGTO
[root@esx1 etc]# cat /opt/LGTOaam512/log/aam_config_util_listnodes.log
KEY: -z VAL: 1
KEY: domain VAL: vmware
KEY: cmd VAL: listnodes
CMD: hostname -s
RESULT:
----
esx1
CMD: /opt/LGTOaam512/bin/ft_gethostbyname esx1 |grep FAILED
RESULT:
----
list_nodes
CMD: /opt/LGTOaam512/bin/ftcli -domain vmware -connect esx3 -port 8042 -timeout 60 -cmd listnodes
RESULT:
----
[Err:10035] User Not Found
[Err:8001] Access Denied
Copying /opt/LGTOaam512/config/vmware-sites to /opt/LGTOaam512/log/aam_config_util_listnodes.log
FULLTIME_SITES_TID 00000001
+ 1:8042,8042,8043 esx3 vmware #FT_Agent_Port=8045
Total time for script to complete: 0 minute(s) and 0 second(s)
After disabling HA on the cluster and enabling it again, the problem was gone.
Has it come back?
If I disable and re-enable, a different host has the same problem.
I renamed /etc/opt/vmware/aam/FT_HOSTS to .old, ran "Reconfigure for VMware HA", and it worked.
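For anyone who wants to try the same thing, the rename step might look roughly like the sketch below. It assumes "to .old" means appending a .old suffix to the file name, and move_aside is a hypothetical helper, not part of any VMware tooling.

```shell
# Hypothetical helper: move a file aside by appending ".old" to its name.
move_aside() {
    f="$1"
    if [ -f "$f" ]; then
        mv "$f" "$f.old" && echo "renamed $f to $f.old"
    else
        echo "no such file: $f" >&2
        return 1
    fi
}

# On the affected host (path taken from the post above):
#   move_aside /etc/opt/vmware/aam/FT_HOSTS
# Then run "Reconfigure for VMware HA" on that host.
```

Renaming instead of deleting keeps the old file around, so it can be moved back if the reconfigure does not help.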