The cluster consists of 4 hosts. Two of the hosts get "An error occurred during configuration of the HA agent on the host". Both hosts can ping the other hosts in the cluster by FQDN and short name.
What is the error you are receiving? How many hosts do you have set to allow for failure in HA?
An error occurred during configuration of the HA agent on the host
1
Hello,
What do you have in the host's aam log?
Do a cat of /var/log/vmware/aam/aam_config_util_addnode.log
What happens if you uncheck HA, click OK, then go back and check HA and click OK? This should reset HA on all hosts in the cluster.
HA seems to have problems if it does not find exactly what it likes. There are a few basic troubleshooting things you can do.
1) On the host with the error, try to run the "Reconfigure For VMware HA" task.
2) Disable HA on the entire cluster, wait a few hours, then re-enable HA on the entire cluster.
:: cmd status was 0
CMD: Wed Feb 13 16:26:22 2008 /opt/vmware/aam/bin/ft_gethostbyname esx-brmc01 |grep FAILED
RESULT:
-
:: cmd status was 1
add_aam_node
CMD: Wed Feb 13 16:26:22 2008 cp /opt/vmware/aam/ha/store_nic_info.pl /opt/vmware/aam/bin/run
RESULT:
-
main::write_run_script:1101: cmd status was 0
add_aam_node: this is the primary agent -- 1st node in cluster.
add_aam_node: primary agent: esx-brmc01
CMD: Wed Feb 13 16:26:22 2008 cp /opt/vmware/aam/ha/vmware_first_node.pl /opt/vmware/aam/bin/runInit
RESULT:
-
main::write_run_script:1114: cmd status was 0
CMD: Wed Feb 13 16:26:22 2008 /opt/vmware/aam/bin/ft_setup -domain=vmware -upgrade=n -noprompt=y -hostname=esx-brmc01 -port1=8042 -licensekey=AMCFNEET-4YRDDN53CTHMBDSJ -mailserver=none -primaryagent=esx-brmc01
RESULT:
-
Legato Automated Availability Manager setup script.
Setting environment from /opt/vmware/aam/bin/agent_env.Linux
Setting up the Legato Automated Availability Manager agent for domain vmware
Welcome to VMware HA Agent. (Release 5.1 )
Configuring Agent for current node: esx-brmc01
Enter the name of your domain :
Using comand line argument domain of : vmware
Configuration requires the node name of a primary agent. If you
are configuring the first node in the domain, enter the name
of this node. (i.e. esx-brmc01) If this is a subsequent installation
enter the name of an existing primary agent node.
Enter the name of a Primary Agent Node:
Using input argument of esx-brmc01 for Primary Agent
Performing a primary node configuration.
Agents require the use of 4 network ports through which to
communicate. These port numbers must be available and consistent
across each of the nodes in the domain. If you are unsure about
specifying port numbers or defining primary nodes please read the
appropriate sections of the user documentation provided with this
product.
Specify the first of the 4 port numbers:
Using argument for port1: 8042
Ports 8042, 8043, 8044 and 8045 will be used.
Enter the name of your SMTP mail server (optional):
Installation for this node is complete.
To start the Agent run the "ft_startup" command.
main::add_aam_node:212: cmd status was 0
CMD: Wed Feb 13 16:26:22 2008 cp -f /etc/opt/vmware/aam/ftbb.prm /etc/opt/vmware/aam/ftbb.prm.bck
RESULT:
-
main::edit_ftbb_prm_file:1280: cmd status was 0
ft_startup_monitor: waiting for /opt/vmware/aam/bin/ft_startup to complete
CMD: Wed Feb 13 16:26:40 2008 /opt/vmware/aam/bin/ft_startup
RESULT:
-
Legato Automated Availability Manager startup script.
Setting environment from /opt/vmware/aam/bin/agent_env.Linux
Starting agent for domain vmware
vmware
Starting Backbone...
..
Backbone started successfully.
Starting Agent...
Agent started successfully.
main::ft_startup_thread:1285: cmd status was 0
ft_startup_monitor: elapsed time 0 minute(s) and 18 second(s)
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "listnodes"'
CMD: Wed Feb 13 16:26:44 2008 /opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "listnodes"
RESULT:
-
Node Type State
-
-
-
esx-brmc01 Primary Agent Running
main::issue_cli_cmd:1429: cmd status was 0
wait_agent_startup: waiting for agent 'esx-brmc01' to come alive, status is : 'running'
wait_agent_startup: elapsed time 0 minute(s) and 1 second(s)
CMD: Wed Feb 13 16:26:44 2008 /bin/ping -c 1 147.206.235.1
RESULT:
-
PING 147.206.235.1 (147.206.235.1) 56(84) bytes of data.
64 bytes from 147.206.235.1: icmp_seq=0 ttl=255 time=0.730 ms
--- 147.206.235.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.730/0.730/0.730/0.000 ms, pipe 2
main::configure_ips:1349: cmd status was 0
active_primary_ftcli: active primary is 'esx-brmc01'
active_primary_ftcli: command is 'import /var/log/vmware/aam/aam_config_util.def -skipfail=false'
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "import /var/log/vmware/aam/aam_config_util.def -skipfail=false"'
CMD: Wed Feb 13 16:26:44 2008 /opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "import /var/log/vmware/aam/aam_config_util.def -skipfail=false"
RESULT:
-
Node Fd Settings "esx-brmc01" Modified
OK
main::issue_cli_cmd:2268: cmd status was 0
active_primary_ftcli: command ran successfully on 'esx-brmc01'.
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -timeout 60 -cmd "listrules"'
CMD: Wed Feb 13 16:26:45 2008 /opt/vmware/aam/bin/ftcli -domain vmware -timeout 60 -cmd "listrules"
RESULT:
-
-
-
-
VMWareClusterManager enabled esx-brmc01
main::issue_cli_cmd:1004: cmd status was 0
CMD: Wed Feb 13 16:26:45 2008 /bin/echo " -
-
-
VMWareClusterManager enabled esx-brmc01
" | grep -i "VMWareClusterManager" | grep -i "enable" >> /dev/null
RESULT:
-
main::configure_ips:1349: cmd status was 0
active_primary_ftcli: active primary is 'esx-brmc01'
active_primary_ftcli: command is 'fireOnDemandTrigger SetP2P on 0 ADD#esx-brmc01'
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "fireOnDemandTrigger SetP2P on 0 ADD#esx-brmc01"'
CMD: Wed Feb 13 16:26:45 2008 /opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "fireOnDemandTrigger SetP2P on 0 ADD#esx-brmc01"
RESULT:
-
OK
main::issue_cli_cmd:2268: cmd status was 0
active_primary_ftcli: command ran successfully on 'esx-brmc01'.
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "listnodes"'
CMD: Wed Feb 13 16:26:46 2008 /opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "listnodes"
RESULT:
-
Node Type State
-
-
-
esx-brmc01 Primary Agent Running
main::issue_cli_cmd:1429: cmd status was 0
wait_agent_startup: waiting for agent 'esx-brmc01' to come alive, status is : 'running'
CMD: Wed Feb 13 16:26:56 2008 ps -ef | grep -v grep | /bin/egrep -i "ftAgent|ft_startup"
RESULT:
-
root 9569 1 0 16:26 ? 00:00:00 /opt/vmware/aam/bin/ftAgent -d vmware
main::wait_agent_startup:1352: cmd status was 0
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "listnodes"'
CMD: Wed Feb 13 16:26:56 2008 /opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "listnodes"
RESULT:
-
Node Type State
-
-
-
esx-brmc01 Primary Agent Running
main::issue_cli_cmd:1429: cmd status was 0
wait_agent_startup: waiting for agent 'esx-brmc01' to come alive, status is : 'running'
wait_agent_startup: elapsed time 0 minute(s) and 11 second(s)
CMD: Wed Feb 13 16:26:56 2008 cat /etc/init.d/VMWAREAAM51_vmware | sed '/FT_DOMAIN=/i\ FT_NO_CONSOLE_TRACE=1; export FT_NO_CONSOLE_TRACE; FT_ISOLATION_TIME=1; export FT_ISOLATION_TIME; \' > /opt/vmware/aam/ha/output
RESULT:
-
main::edit_startup_script:213: cmd status was 0
CMD: Wed Feb 13 16:26:56 2008 mv /opt/vmware/aam/ha/output /etc/init.d/VMWAREAAM51_vmware
RESULT:
-
main::edit_startup_script:213: cmd status was 0
CMD: Wed Feb 13 16:26:56 2008 chmod 755 /etc/init.d/VMWAREAAM51_vmware
RESULT:
-
main::edit_startup_script:213: cmd status was 0
Total time for script to complete: 0 minute(s) and 34 second(s)
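A log like the one above is long, and most of it is "cmd status was 0". As a rough aid for reading it, here is a minimal Python sketch that pairs each CMD: line with the status line that follows it and lists only the non-zero ones. The function name and the regular expressions are my own assumptions based on the lines pasted in this thread, not anything shipped with the HA agent. Note that a non-zero status is not always a failure: the "ft_gethostbyname ... |grep FAILED" line ends in status 1 simply because grep exits 1 when it finds no match.

```python
import re

# Matches lines such as "main::add_aam_node:212: cmd status was 0"
STATUS_RE = re.compile(r"cmd status was (\d+)")
# Matches lines such as "CMD: Wed Feb 13 16:26:22 2008 /bin/ping -c 1 ..."
CMD_RE = re.compile(r"^CMD:\s+\w{3} \w{3}\s+\d+ \d\d:\d\d:\d\d \d{4}\s+(.*)$")


def nonzero_statuses(lines):
    """Return (command, status) pairs for every status line that is non-zero.

    The command is the most recent CMD: line seen before the status line,
    or None if the log excerpt starts after the command was logged.
    """
    failures = []
    last_cmd = None
    for raw in lines:
        line = raw.strip()
        cmd_match = CMD_RE.match(line)
        if cmd_match:
            last_cmd = cmd_match.group(1)
            continue
        status_match = STATUS_RE.search(line)
        if status_match:
            status = int(status_match.group(1))
            if status != 0:
                failures.append((last_cmd, status))
            last_cmd = None  # the next status belongs to the next command
    return failures


if __name__ == "__main__":
    # Path taken from earlier in this thread; run on the ESX host or on a copy.
    with open("/var/log/vmware/aam/aam_config_util_addnode.log") as log:
        for command, status in nonzero_statuses(log):
            print(status, command)
```

On the excerpt posted here this would report only the grep line, which fits the later observation that the primary agent configured successfully.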
same issue IB_IT
What version/build are your VC server and ESX hosts?
3.5.0 build 70356
Hmm...yeah, that's a weird one...I have had my share of HA problems and usually disabling/enabling HA at the cluster level fixes it. Have you seen this?
http://communities.vmware.com/message/524023
An interesting post on case sensitivity in DNS and also WINS-related issues (in case your infrastructure still uses this).
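If you want to rule out naming problems quickly, something like the following Python sketch could be run from a machine that uses the same DNS servers as the hosts. It flags names that contain uppercase characters and names that do not resolve. The function name is made up for illustration, and treating uppercase as a problem is only an assumption drawn from the case-sensitivity discussion linked above, not a documented HA requirement. Put in your own short names and FQDNs.

```python
import socket


def check_host_names(names, resolver=socket.gethostbyname):
    """Return a dict mapping each name to a list of problems (empty if none).

    `resolver` defaults to a real DNS/hosts lookup; it can be swapped out
    for a stub when testing without network access.
    """
    report = {}
    for name in names:
        problems = []
        if name != name.lower():
            problems.append("contains uppercase characters")
        try:
            resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            problems.append("does not resolve")
        report[name] = problems
    return report


if __name__ == "__main__":
    # esx-brmc01 is the host name from the log in this thread;
    # add the other hosts' short names and FQDNs here.
    for host, issues in check_host_names(["esx-brmc01"]).items():
        print(host, "OK" if not issues else ", ".join(issues))
```

Run it with both the short name and the FQDN of every host in the cluster, since HA needs each host to resolve all the others.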
Your primary agent is successfully configured. The error is not an install issue but more likely an invalid policy for HA.
For example, you will need to create SC network redundancy or you will have errors in the agent configuration.
Click on the "Tasks & Events" tab for the host and check if there are any related events for the failed configure HA task.
mike
So create a second service console?
All my other blades are set up the same way with no issue.
It will work with or without the redundant SC management network, provided that the single path does not get disconnected or fail somehow.
It is better to have a team across two physical paths, but this is not always an option for some installs.
Just to clarify that you are not seeing some other issue, what I am discussing is specifically as follows.
"Host <name> currently has no management network redundancy"
Which is a warning state more than an error.