VMware Cloud Community
sfont3n
Enthusiast

HA issue

The cluster consists of 4 hosts. Two of the hosts get "An error occurred during configuration of the HA agent on the host." Both hosts can ping the other hosts in the cluster by FQDN and by short name.
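Pinging by name is a good sign, but it is worth checking every member's FQDN and short name from each of the four hosts, because the HA agent does its own name lookups (the add-node log posted later in this thread runs /opt/vmware/aam/bin/ft_gethostbyname). Below is a minimal shell sketch for the service console. It assumes `getent` is available there; `check_names` is a hypothetical helper, not a VMware tool.

```shell
#!/bin/sh
# Hypothetical helper: report how each name resolves on this host.
# getent follows the /etc/nsswitch.conf order (e.g. /etc/hosts, then DNS).
check_names() {
    for name in "$@"; do
        if addr=$(getent hosts "$name"); then
            # print only the first address returned for the name
            echo "OK $name -> $(echo "$addr" | awk '{ print $1; exit }')"
        else
            echo "FAIL $name"
        fi
    done
}
```

Example with placeholder names, run on every host in the cluster: `check_names esx01.example.com esx01 esx02.example.com esx02`. Any FAIL line, or an address that differs between hosts, is worth chasing before looking at HA itself.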

IB_IT
Expert

What is the error you are receiving? How many host failures do you have HA set to allow?

sfont3n
Enthusiast

An error occurred during configuration of the HA agent on the host.


mike_laspina
Champion

Hello,

What do you have in the host's AAM log?

Do a cat of /var/log/vmware/aam/aam_config_util_addnode.log
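If the log is long, a small function can show the end of it and pull out the lines that mention errors. This is only a sketch: the default path is the one above, `show_aam_log` is a hypothetical helper, and matching on "error" or "fail" is a rough filter, so read each hit in context rather than treating it as the cause.

```shell
#!/bin/sh
# Hypothetical helper: tail the HA add-node log and list lines mentioning
# "error" or "fail" (case-insensitive), with their line numbers.
show_aam_log() {
    log=${1:-/var/log/vmware/aam/aam_config_util_addnode.log}
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    echo "== last 50 lines of $log =="
    tail -n 50 "$log"
    echo "== lines mentioning error or fail =="
    # grep exits 1 when nothing matches; that is not a failure here
    grep -n -i -E 'error|fail' "$log"
    return 0
}
```

Run `show_aam_log` with no argument to read the default path, or pass another log file as the first argument.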

http://blog.laspina.ca/ vExpert 2009
IB_IT
Expert

What happens if you uncheck HA and click OK, then go back, check HA again and click OK? This should reset HA on all hosts in the cluster.

bretti
Expert

HA seems to have problems if it does not find exactly what it expects. There are a few basic troubleshooting steps you can try.

1) On the host with the error, try to run the "Reconfigure For VMware HA" task.

2) Disable HA on the entire cluster, wait a few hours, then re-enable HA on the entire cluster.

sfont3n
Enthusiast

:: cmd status was 0

CMD: Wed Feb 13 16:26:22 2008 /opt/vmware/aam/bin/ft_gethostbyname esx-brmc01 |grep FAILED

RESULT:

-


:: cmd status was 1

add_aam_node

CMD: Wed Feb 13 16:26:22 2008 cp /opt/vmware/aam/ha/store_nic_info.pl /opt/vmware/aam/bin/run

RESULT:

-


main::write_run_script:1101: cmd status was 0

add_aam_node: this is the primary agent -- 1st node in cluster.

add_aam_node: primary agent: esx-brmc01

CMD: Wed Feb 13 16:26:22 2008 cp /opt/vmware/aam/ha/vmware_first_node.pl /opt/vmware/aam/bin/runInit

RESULT:

-


main::write_run_script:1114: cmd status was 0

CMD: Wed Feb 13 16:26:22 2008 /opt/vmware/aam/bin/ft_setup -domain=vmware -upgrade=n -noprompt=y -hostname=esx-brmc01 -port1=8042 -licensekey=AMCFNEET-4YRDDN53CTHMBDSJ -mailserver=none -primaryagent=esx-brmc01

RESULT:

-


Legato Automated Availability Manager setup script.

Setting environment from /opt/vmware/aam/bin/agent_env.Linux

Setting up the Legato Automated Availability Manager agent for domain vmware

Welcome to VMware HA Agent. (Release 5.1 )

Configuring Agent for current node: esx-brmc01

Enter the name of your domain :

Using comand line argument domain of : vmware

Configuration requires the node name of a primary agent. If you

are configuring the first node in the domain, enter the name

of this node. (i.e. esx-brmc01) If this is a subsequent installation

enter the name of an existing primary agent node.

Enter the name of a Primary Agent Node:

Using input argument of esx-brmc01 for Primary Agent

Performing a primary node configuration.

Agents require the use of 4 network ports through which to

communicate. These port numbers must be available and consistent

across each of the nodes in the domain. If you are unsure about

specifying port numbers or defining primary nodes please read the

appropriate sections of the user documentation provided with this

product.

Specify the first of the 4 port numbers:

Using argument for port1: 8042

Ports 8042, 8043, 8044 and 8045 will be used.

Enter the name of your SMTP mail server (optional):

Installation for this node is complete.

To start the Agent run the "ft_startup" command.

main::add_aam_node:212: cmd status was 0

CMD: Wed Feb 13 16:26:22 2008 cp -f /etc/opt/vmware/aam/ftbb.prm /etc/opt/vmware/aam/ftbb.prm.bck

RESULT:

-


main::edit_ftbb_prm_file:1280: cmd status was 0

ft_startup_monitor: waiting for /opt/vmware/aam/bin/ft_startup to complete

CMD: Wed Feb 13 16:26:40 2008 /opt/vmware/aam/bin/ft_startup

RESULT:

-


Legato Automated Availability Manager startup script.

Setting environment from /opt/vmware/aam/bin/agent_env.Linux

Starting agent for domain vmware

vmware

Starting Backbone...

..

Backbone started successfully.

Starting Agent...

Agent started successfully.

main::ft_startup_thread:1285: cmd status was 0

ft_startup_monitor: elapsed time 0 minute(s) and 18 second(s)

issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "listnodes"'

CMD: Wed Feb 13 16:26:44 2008 /opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "listnodes"

RESULT:

-


Node        Type           State
----------  -------------  -------
esx-brmc01  Primary Agent  Running

main::issue_cli_cmd:1429: cmd status was 0

wait_agent_startup: waiting for agent 'esx-brmc01' to come alive, status is : 'running'

wait_agent_startup: elapsed time 0 minute(s) and 1 second(s)

CMD: Wed Feb 13 16:26:44 2008 /bin/ping -c 1 147.206.235.1

RESULT:

-


PING 147.206.235.1 (147.206.235.1) 56(84) bytes of data.

64 bytes from 147.206.235.1: icmp_seq=0 ttl=255 time=0.730 ms

--- 147.206.235.1 ping statistics ---

1 packets transmitted, 1 received, 0% packet loss, time 0ms

rtt min/avg/max/mdev = 0.730/0.730/0.730/0.000 ms, pipe 2

main::configure_ips:1349: cmd status was 0

active_primary_ftcli: active primary is 'esx-brmc01'

active_primary_ftcli: command is 'import /var/log/vmware/aam/aam_config_util.def -skipfail=false'

issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "import /var/log/vmware/aam/aam_config_util.def -skipfail=false"'

CMD: Wed Feb 13 16:26:44 2008 /opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "import /var/log/vmware/aam/aam_config_util.def -skipfail=false"

RESULT:

-


Node Fd Settings "esx-brmc01" Modified

OK

main::issue_cli_cmd:2268: cmd status was 0

active_primary_ftcli: command ran successfully on 'esx-brmc01'.

issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -timeout 60 -cmd "listrules"'

CMD: Wed Feb 13 16:26:45 2008 /opt/vmware/aam/bin/ftcli -domain vmware -timeout 60 -cmd "listrules"

RESULT:

-


-


-


-


VMWareClusterManager enabled esx-brmc01

main::issue_cli_cmd:1004: cmd status was 0

CMD: Wed Feb 13 16:26:45 2008 /bin/echo " -


-


-


VMWareClusterManager enabled esx-brmc01

" | grep -i "VMWareClusterManager" | grep -i "enable" >> /dev/null

RESULT:

-


main::configure_ips:1349: cmd status was 0

active_primary_ftcli: active primary is 'esx-brmc01'

active_primary_ftcli: command is 'fireOnDemandTrigger SetP2P on 0 ADD#esx-brmc01'

issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "fireOnDemandTrigger SetP2P on 0 ADD#esx-brmc01"'

CMD: Wed Feb 13 16:26:45 2008 /opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "fireOnDemandTrigger SetP2P on 0 ADD#esx-brmc01"

RESULT:

-


OK

main::issue_cli_cmd:2268: cmd status was 0

active_primary_ftcli: command ran successfully on 'esx-brmc01'.

issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "listnodes"'

CMD: Wed Feb 13 16:26:46 2008 /opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "listnodes"

RESULT:

-


Node        Type           State
----------  -------------  -------
esx-brmc01  Primary Agent  Running

main::issue_cli_cmd:1429: cmd status was 0

wait_agent_startup: waiting for agent 'esx-brmc01' to come alive, status is : 'running'

CMD: Wed Feb 13 16:26:56 2008 ps -ef | grep -v grep | /bin/egrep -i "ftAgent|ft_startup"

RESULT:

-


root 9569 1 0 16:26 ? 00:00:00 /opt/vmware/aam/bin/ftAgent -d vmware

main::wait_agent_startup:1352: cmd status was 0

issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "listnodes"'

CMD: Wed Feb 13 16:26:56 2008 /opt/vmware/aam/bin/ftcli -domain vmware -connect esx-brmc01 -port 8042 -timeout 60 -cmd "listnodes"

RESULT:

-


Node        Type           State
----------  -------------  -------
esx-brmc01  Primary Agent  Running

main::issue_cli_cmd:1429: cmd status was 0

wait_agent_startup: waiting for agent 'esx-brmc01' to come alive, status is : 'running'

wait_agent_startup: elapsed time 0 minute(s) and 11 second(s)

CMD: Wed Feb 13 16:26:56 2008 cat /etc/init.d/VMWAREAAM51_vmware | sed '/FT_DOMAIN=/i\ FT_NO_CONSOLE_TRACE=1; export FT_NO_CONSOLE_TRACE; FT_ISOLATION_TIME=1; export FT_ISOLATION_TIME; \' > /opt/vmware/aam/ha/output

RESULT:

-


main::edit_startup_script:213: cmd status was 0

CMD: Wed Feb 13 16:26:56 2008 mv /opt/vmware/aam/ha/output /etc/init.d/VMWAREAAM51_vmware

RESULT:

-


main::edit_startup_script:213: cmd status was 0

CMD: Wed Feb 13 16:26:56 2008 chmod 755 /etc/init.d/VMWAREAAM51_vmware

RESULT:

-


main::edit_startup_script:213: cmd status was 0

Total time for script to complete: 0 minute(s) and 34 second(s)

#
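When reading a log like this, the thing to look for is a non-zero "cmd status". The awk filter below pairs each status line with the most recent CMD line above it and prints only the non-zero ones. It is a sketch that assumes the CMD/status layout seen in this log, and `failed_cmds` is a hypothetical helper. Note that a status of 1 on a `... | grep FAILED` check only means grep found no FAILED line, so the one non-zero status in this log (after ft_gethostbyname) reads as a pass, not a failure.

```shell
#!/bin/sh
# Hypothetical helper: list commands in an aam_config_util log whose
# reported status was non-zero, each paired with its preceding CMD line.
failed_cmds() {
    awk '
        /^CMD:/ { cmd = $0 }                  # remember the last command seen
        /cmd status was/ {
            status = $NF                      # the status is the last field
            if (status != 0) print "status " status ": " cmd
        }
    ' "$1"
}
```

Usage: `failed_cmds /var/log/vmware/aam/aam_config_util_addnode.log`. No output means every logged command returned 0.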

sfont3n
Enthusiast

Same issue, IB_IT.

IB_IT
Expert

What versions/builds are your VC server and ESX hosts?

sfont3n
Enthusiast

3.5.0 build 70356

IB_IT
Expert

Hmm, yeah, that's a weird one. I have had my share of HA problems, and disabling and re-enabling HA at the cluster level usually fixes them. Have you seen this?

http://communities.vmware.com/message/524023

It's an interesting post on case sensitivity in DNS, and also on WINS-related issues (in case your infrastructure still uses WINS).
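Since case sensitivity is one of the suspects, one quick local check is whether /etc/hosts on each host lists a cluster member under a name that differs only in case (for example ESX-BRMC01 versus esx-brmc01). A sketch, assuming the names live in /etc/hosts; DNS records would need to be checked separately on the DNS server, and `case_mismatches` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print hosts-file lines that contain NAME only when
# case is ignored, i.e. the same name spelled with different capitalisation.
case_mismatches() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v want="$name" '
        /^[ \t]*#/ { next }                   # skip comment lines
        {
            for (i = 2; i <= NF; i++)         # field 1 is the IP address
                if ($i != want && tolower($i) == tolower(want)) {
                    print NR ": " $0
                    break
                }
        }
    ' "$file"
}
```

Run it once for each member's short name and once for its FQDN, e.g. `case_mismatches esx-brmc01`. Empty output means no case-only mismatches in that file.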

mike_laspina
Champion

Your primary agent is configured successfully. The error is not an install issue; it is more likely an invalid HA policy.

For example, you will need service console (SC) network redundancy, or you will get errors in the agent configuration.

admin
Immortal

Click the "Tasks & Events" tab for the host and check whether there are any events related to the failed Configure HA task.

sfont3n
Enthusiast

Mike,

So should I create a second service console?

All my other blades are set up the same way with no issue.

mike_laspina
Champion

It will work with or without a redundant SC management network, provided that the single path does not get disconnected or fail somehow.

It is better to have a team across two physical paths, but that is not always an option for some installs.

Just to make sure you are not seeing some other issue, what I am discussing is specifically this message:

"Host <name> currently has no management network redundancy"

This is more of a warning state than an error.
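To see whether a host has only one uplink behind its service console (the situation that warning describes), you can count the NICs listed for the Service Console port group in the output of `esxcfg-vswitch -l`. This is a sketch under assumptions: it expects each port-group row to end with a comma-separated uplink list such as `vmnic0,vmnic1` and the port group to be named "Service Console", so check the column layout on your build first. `sc_uplink_count` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read "esxcfg-vswitch -l" output on stdin and print
# how many vmnic uplinks the named port group lists (0 if not found).
sc_uplink_count() {
    awk -v pg="${1:-Service Console}" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop leading indentation
            if (index(line, pg) == 1) {       # port-group row found
                k = split($NF, nics, ",")     # last column: uplink list
                for (i = 1; i <= k; i++)
                    if (nics[i] ~ /^vmnic/) n++
                exit                          # END block still runs
            }
        }
        END { print n + 0 }
    '
}
```

Usage: `esxcfg-vswitch -l | sc_uplink_count`. A result of 1 means a single physical path. Linking a second NIC to the same vSwitch, e.g. `esxcfg-vswitch -L vmnic1 vSwitch0` (NIC and vSwitch names are placeholders), is one way to build the two-path team described above.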
