dylanebner
Contributor
Contributor

ESX 3.5 U4 HA not working

I have two ESX servers that I cannot get HA working on. We have just upgraded from ESX 3.5 U1 thinking this might help, but it hasn't. We have always had trouble getting HA running, even before the upgrade. I have looked at the DNS, and everything looks OK. I also added the host names to each server's hosts file.

When we enable HA, the primary server completes, and then the second server sits at 60% for a while, moves up to 83% and then 9x%, and then fails. Shortly after the second server fails, the primary also reports an error.

Here are some logs:

ADDNODE log from second server:

# cat aam_config_util_addnode.log

KEY: shortname VAL: VRC-E31-25-D1955-B01-VI3-VM02

KEY: cmd VAL: addnode

KEY: primarynet VAL: 10.5.5.130/255.255.255.0:

KEY: hostnet VAL: 10.5.5.131/255.255.255.0

KEY: domain VAL: vmware

KEY: primaryagent VAL: vrc-e30-17-d1955-b09-vi3-vm01

KEY: iso VAL: 10.5.5.1

KEY: -z VAL: 1

add_aam_node

CMD: Thu Apr 9 12:35:06 2009 cp /opt/vmware/aam/ha/store_nic_info.pl /opt/vmware/aam/bin/run

RESULT:

-


main::write_run_script:1129: cmd status was 0

CMD: Thu Apr 9 12:35:06 2009 cp /opt/vmware/aam/bin/generateConfigBackup.pl /opt/vmware/aam/bin/runOnce

RESULT:

-


main::write_run_script:1133: cmd status was 0

setup_ft_hosts adding host vrc-e30-17-d1955-b09-vi3-vm01 IP info: 10.5.5.130/255.255.255.0:CMD: Thu Apr 9 12:35:06 2009 hostname -s

RESULT:

-


VRC-E31-25-D1955-B01-VI3-VM02

main::verify_network_configuration:1154: cmd status was 0

CMD: Thu Apr 9 12:35:06 2009 /opt/vmware/aam/bin/ft_gethostbyname VRC-E31-25-D1955-B01-VI3-VM02 |grep FAILED

RESULT:

-


main::verify_network_configuration:1154: cmd status was 1

CMD: Thu Apr 9 12:35:06 2009 /opt/vmware/aam/bin/ft_gethostbyname vrc-e30-17-d1955-b09-vi3-vm01 |grep FAILED

RESULT:

-


main::verify_network_configuration:1154: cmd status was 1

CMD: Thu Apr 9 12:35:06 2009 /usr/sbin/esxcfg-vswif -l

RESULT:

-


Name     Port Group        IP Address   Netmask         Broadcast    Enabled   DHCP
vswif0   Service Console   10.5.5.131   255.255.255.0   10.5.5.255   true      false

main::verify_network_configuration:1154: cmd status was 0

CMD: Thu Apr 9 12:35:06 2009 cp /opt/vmware/aam/ha/vmware_subsequent_node.pl /opt/vmware/aam/bin/runOnce

RESULT:

-


main::write_run_script:1155: cmd status was 0

ports 8042-8045 are free for use.

CMD: Thu Apr 9 12:35:06 2009 /opt/vmware/aam/bin/ft_setup -domain=vmware -upgrade=n -noprompt=y -hostname=vrc-e31-25-d1955-b01-vi3-vm02 -port1=8042 -licensekey=AMCFNEET-4YRDDN53CTHMBDSJ -mailserver=none -primaryagent=vrc-e30-17-d1955-b09-vi3-vm01

RESULT:

-


AAM setup script.

Setting environment from /opt/vmware/aam/config/agent_env.Linux

Setting up the AAM agent for domain vmware

Welcome to VMware HA Agent. (Release 5.1 )

Configuring Agent for current node: vrc-e31-25-d1955-b01-vi3-vm02

Configuration requires the node name of a primary agent. If you

are configuring the first node in the domain, enter the name

of this node. (i.e. vrc-e31-25-d1955-b01-vi3-vm02) If this is a subsequent installation

enter the name of an existing primary agent node.

Enter the name of a Primary Agent Node :

Using input argument of vrc-e30-17-d1955-b09-vi3-vm01 for Primary Agent

Agents require the use of 4 network ports through which to

communicate. These port numbers must be available and consistent

across each of the nodes in the domain. If you are unsure about

specifying port numbers or defining primary nodes please read the

appropriate sections of the user documentation provided with this

product.

Specify the first of the 4 port numbers:

Using argument for port1: 8042

Ports 8042, 8043, 8044 and 8045 will be used.

Installation for this node is complete.

To start the Agent run the "ft_startup" command.

main::add_aam_node:145: cmd status was 0

CMD: Thu Apr 9 12:35:06 2009 cp -f /etc/opt/vmware/aam/ftbb.prm /etc/opt/vmware/aam/ftbb.prm.bck

RESULT:

-


main::edit_ftbb_prm_file:1252: cmd status was 0

CMD: Thu Apr 9 12:35:06 2009 /bin/rm -f /var/log/vmware/aam/startAam.txt /var/log/vmware/aam/startAam.out

RESULT:

-


main::ft_startup_monitor:1256: cmd status was 0

Waiting for /opt/vmware/aam/bin/ft_startup to complete

ft_startup_monitor: elapsed time 0 minute(s) and 3 second(s)

CMD: Thu Apr 9 12:35:12 2009 /bin/rm -f /var/log/vmware/aam/startAam.txt /var/log/vmware/aam/startAam.out

RESULT:

-


main::ft_startup_monitor:1269: cmd status was 0

Waiting for /opt/vmware/aam/bin/ft_startup to complete

ft_startup_monitor: elapsed time 0 minute(s) and 19 second(s)

active_primary_ftcli: active primary is ''

active_primary_ftcli: command is 'stop VMap_vrc-e31-25-d1955-b01-vi3-vm02'

find_active_primary: attempting to find an active primary.

issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -cmd "la -l"'

CMD: Thu Apr 9 12:35:34 2009 /opt/vmware/aam/bin/ftcli -domain vmware -cmd "la -l"

RESULT:

-


Node                            Agent           Process Mon     Rule Int
----                            -----           -----------     --------
vrc-e30-17-d1955-b09-vi3-vm01   (1/16:6001.0)   (1/16:6046.0)   (1/16:6056.0)

Node vrc-e30-17-d1955-b09-vi3-vm01 (1/16:6056.0) is the active interpreter

main::issue_cli_cmd:2144: cmd status was 0

find_active_primary: active primary has been found: 'vrc-e30-17-d1955-b09-vi3-vm01'

issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "stop VMap_vrc-e31-25-d1955-b01-vi3-vm02"'

CMD: Thu Apr 9 12:35:34 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "stop VMap_vrc-e31-25-d1955-b01-vi3-vm02"

RESULT:

-


Error : Process Not Found

main::issue_cli_cmd:2241: cmd status was 1

active_primary_ftcli: command did not run successfully on 'vrc-e30-17-d1955-b09-vi3-vm01'.

active_primary_ftcli: active primary is 'vrc-e30-17-d1955-b09-vi3-vm01'

active_primary_ftcli: command is 'forceProcState VMap_vrc-e31-25-d1955-b01-vi3-vm02 stopped'

issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "forceProcState VMap_vrc-e31-25-d1955-b01-vi3-vm02 stopped"'

CMD: Thu Apr 9 12:35:35 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "forceProcState VMap_vrc-e31-25-d1955-b01-vi3-vm02 stopped"

RESULT:

-


Error : Process Not Found

main::issue_cli_cmd:2241: cmd status was 1

active_primary_ftcli: command did not run successfully on 'vrc-e30-17-d1955-b09-vi3-vm01'.

active_primary_ftcli: active primary is 'vrc-e30-17-d1955-b09-vi3-vm01'

active_primary_ftcli: command is 'deleteproc VMap_vrc-e31-25-d1955-b01-vi3-vm02'

issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "deleteproc VMap_vrc-e31-25-d1955-b01-vi3-vm02"'

CMD: Thu Apr 9 12:35:36 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "deleteproc VMap_vrc-e31-25-d1955-b01-vi3-vm02"

RESULT:

-


Error : Process Not Found

main::issue_cli_cmd:2241: cmd status was 1

active_primary_ftcli: command did not run successfully on 'vrc-e30-17-d1955-b09-vi3-vm01'.

CMD: Thu Apr 9 12:35:38 2009 /opt/vmware/aam/bin/ft_shutdown -b -ppid=5611

RESULT:

-


AAM setup script.

Setting environment from /opt/vmware/aam/config/agent_env.Linux

Shutting down the AAM agent for domain vmware

Shutting down the Agent...

Shutting down the Backbone...

Skipping /usr/bin/perl (5611) because it is a parent script.

Skipping /opt/vmware/aam/bin/ftPerl (5835) because it's this script.

Killing PIDs:

After kills issued:

PID: PID PROC: CMD

PID: 1 PROC: init

PID: 2 PROC:

PID: 3 PROC:

PID: 6 PROC:

PID: 4 PROC:

PID: 5 PROC:

PID: 7 PROC:

PID: 19 PROC:

PID: 20 PROC:

PID: 24 PROC:

PID: 25 PROC:

PID: 487 PROC:

PID: 488 PROC:

PID: 514 PROC:

PID: 542 PROC:

PID: 593 PROC:

PID: 752 PROC:

PID: 753 PROC:

PID: 1309 PROC: syslogd

PID: 1313 PROC: klogd

PID: 1336 PROC:

PID: 1470 PROC: /opt/Navisphere/bin/naviagent

PID: 1619 PROC: /usr/sbin/snmpd

PID: 1628 PROC: /usr/sbin/sshd

PID: 1682 PROC: /usr/sbin/vmklogger

PID: 1729 PROC: xinetd

PID: 1744 PROC: ntpd

PID: 1753 PROC: gpm

PID: 1776 PROC: /bin/sh

PID: 1783 PROC: /usr/lib/vmware/webAccess/java/jre1.5.0_16/bin/webAccess

PID: 1792 PROC: crond

PID: 1808 PROC: /opt/dell/srvadmin/oma/bin/dsm_om_shrsvc32d

PID: 2462 PROC: /opt/dell/srvadmin/dataeng/bin/dsm_sa_datamgr32d

PID: 2667 PROC: /opt/dell/srvadmin/dataeng/bin/dsm_sa_datamgr32d

PID: 2688 PROC: /opt/dell/srvadmin/dataeng/bin/dsm_sa_eventmgr32d

PID: 2699 PROC: /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d

PID: 2728 PROC: /opt/dell/srvadmin/iws/bin/linux/dsm_om_connsvc32d

PID: 2729 PROC: /opt/dell/srvadmin/iws/bin/linux/dsm_om_connsvc32d

PID: 2741 PROC: /usr/lib/vmware/bin/vmkload_app

PID: 2761 PROC: /bin/sh

PID: 2765 PROC: logger

PID: 2785 PROC: /usr/lib/vmware/hostd/vmware-hostd

PID: 3406 PROC: /bin/sh

PID: 3412 PROC: /var/pegasus/bin/cimserver

PID: 3447 PROC: /bin/sh

PID: 3456 PROC: /opt/vmware/vpxa/vpx/vpxa

PID: 3484 PROC: /bin/sh

PID: 3490 PROC: /sbin/openwsmand

PID: 3494 PROC: /sbin/mingetty

PID: 3495 PROC: /sbin/mingetty

PID: 3496 PROC: /sbin/mingetty

PID: 3497 PROC: /sbin/mingetty

PID: 3498 PROC: /sbin/mingetty

PID: 3499 PROC: /sbin/mingetty

PID: 3662 PROC: /var/pegasus/bin/cimserver

PID: 3669 PROC: /usr/lib/vmware/bin/vmkload_app

PID: 3671 PROC: /usr/lib/vmware/bin/vmkload_app

PID: 3676 PROC: /usr/lib/vmware/bin/vmkload_app

PID: 3684 PROC: /usr/lib/vmware/bin/vmkload_app

PID: 3691 PROC: /usr/lib/vmware/bin/vmkload_app

PID: 3695 PROC: /usr/lib/vmware/bin/vmkload_app

PID: 3704 PROC: /usr/lib/vmware/bin/vmkload_app

PID: 3712 PROC: /usr/lib/vmware/bin/vmkload_app

PID: 3723 PROC: /usr/lib/vmware/bin/vmkload_app

PID: 3725 PROC: /usr/lib/vmware/bin/vmkload_app

PID: 3730 PROC: /usr/lib/vmware/bin/vmkload_app

PID: 5033 PROC: sshd:

PID: 5035 PROC: sshd:

PID: 5036 PROC: -bash

PID: 5070 PROC: su

PID: 5071 PROC: -bash

PID: 5611 PROC: /usr/bin/perl

PID: 5835 PROC: /opt/vmware/aam/bin/ftPerl

PID: 5858 PROC: ps

main::stop_aam:2077: cmd status was 0

kill_aam: copying /etc/opt/vmware/aam/vmware-sites to /var/log/vmware/aam/aam_config_util_addnode.log

FULLTIME_SITES_TID 00000002

+ 1:8042,8042,8043 vrc-e30-17-d1955-b09-vi3-vm01 vmware #FT_Agent_Port=8045

+ 2:8042,8042,8043 vrc-e31-25-d1955-b01-vi3-vm02 vmware

myexit: copying /etc/opt/vmware/aam/vmware-sites to /var/log/vmware/aam/aam_config_util_addnode.log

FULLTIME_SITES_TID 00000002

+ 1:8042,8042,8043 vrc-e30-17-d1955-b09-vi3-vm01 vmware #FT_Agent_Port=8045

+ 2:8042,8042,8043 vrc-e31-25-d1955-b01-vi3-vm02 vmware

VMwareresult=failure

Total time for script to complete: 0 minute(s) and 32 second(s)

#
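For anyone skimming logs like the one above, a small Python sketch can pull out the commands that returned a non-zero status, the `Error :` lines, and the final `VMwareresult`. It assumes the log layout shown in this post (`CMD:` lines followed later by `cmd status was N`); the function name `summarize_addnode_log` is made up for illustration. Commands piped through `grep FAILED` are skipped, on the assumption that exit status 1 there only means grep found no match.

```python
import re

# Matches the trailing "cmd status was N" that the script logs after each command.
STATUS_RE = re.compile(r"cmd status was (\d+)$")


def summarize_addnode_log(text):
    """Return (result, nonzero, errors) for an aam_config_util_addnode.log dump.

    result  -- value of the VMwareresult= line, or None if absent
    nonzero -- list of (command, status) pairs whose status was not 0
    errors  -- list of lines starting with "Error :"
    """
    result = None
    nonzero = []
    errors = []
    last_cmd = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CMD:"):
            # Remember the most recent command so a later status can be tied to it.
            last_cmd = line[len("CMD:"):].strip()
        elif line.startswith("Error :"):
            errors.append(line)
        elif line.startswith("VMwareresult="):
            result = line.split("=", 1)[1]
        else:
            match = STATUS_RE.search(line)
            if match and match.group(1) != "0":
                # Assumption: status 1 from "... | grep FAILED" means "no match".
                if last_cmd and "grep FAILED" in last_cmd:
                    continue
                nonzero.append((last_cmd, int(match.group(1))))
    return result, nonzero, errors
```

Run against the second server's log, a filter like this should leave only the three `ftcli` calls that came back with `Error : Process Not Found`, plus `VMwareresult=failure`.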

ADDNODE from first server:

# cat aam_config_util_addnode.log

KEY: shortname VAL: VRC-E30-17-D1955-B09-VI3-VM01

KEY: domain VAL: vmware

KEY: cmd VAL: addnode

KEY: iso VAL: 10.5.5.1

KEY: -z VAL: 1

KEY: hostnet VAL: 10.5.5.130/255.255.255.0

add_aam_node

CMD: Thu Apr 9 12:34:25 2009 cp /opt/vmware/aam/ha/store_nic_info.pl /opt/vmware/aam/bin/run

RESULT:

-


main::write_run_script:1129: cmd status was 0

CMD: Thu Apr 9 12:34:25 2009 cp /opt/vmware/aam/bin/generateConfigBackup.pl /opt/vmware/aam/bin/runOnce

RESULT:

-


main::write_run_script:1133: cmd status was 0

add_aam_node: this is the primary agent -- 1st node in cluster.

add_aam_node: primary agent: vrc-e30-17-d1955-b09-vi3-vm01

CMD: Thu Apr 9 12:34:25 2009 rm -f /etc/opt/vmware/aam/FT_HOSTS

RESULT:

-


main::add_aam_node:145: cmd status was 0

CMD: Thu Apr 9 12:34:25 2009 cp /opt/vmware/aam/ha/vmware_first_node.pl /opt/vmware/aam/bin/runOnce

RESULT:

-


main::write_run_script:1148: cmd status was 0

ports 8042-8045 are free for use.

CMD: Thu Apr 9 12:34:25 2009 /opt/vmware/aam/bin/ft_setup -domain=vmware -upgrade=n -noprompt=y -hostname=vrc-e30-17-d1955-b09-vi3-vm01 -port1=8042 -licensekey=AMCFNEET-4YRDDN53CTHMBDSJ -mailserver=none -primaryagent=vrc-e30-17-d1955-b09-vi3-vm01

RESULT:

-


AAM setup script.

Setting environment from /opt/vmware/aam/config/agent_env.Linux

Setting up the AAM agent for domain vmware

Welcome to VMware HA Agent. (Release 5.1 )

Configuring Agent for current node: vrc-e30-17-d1955-b09-vi3-vm01

Configuration requires the node name of a primary agent. If you

are configuring the first node in the domain, enter the name

of this node. (i.e. vrc-e30-17-d1955-b09-vi3-vm01) If this is a subsequent installation

enter the name of an existing primary agent node.

Enter the name of a Primary Agent Node :

Using input argument of vrc-e30-17-d1955-b09-vi3-vm01 for Primary Agent

Performing a primary node configuration.

Agents require the use of 4 network ports through which to

communicate. These port numbers must be available and consistent

across each of the nodes in the domain. If you are unsure about

specifying port numbers or defining primary nodes please read the

appropriate sections of the user documentation provided with this

product.

Specify the first of the 4 port numbers:

Using argument for port1: 8042

Ports 8042, 8043, 8044 and 8045 will be used.

Database from previous installation has been renamed.

Installation for this node is complete.

To start the Agent run the "ft_startup" command.

main::add_aam_node:145: cmd status was 0

CMD: Thu Apr 9 12:34:25 2009 cp -f /etc/opt/vmware/aam/ftbb.prm /etc/opt/vmware/aam/ftbb.prm.bck

RESULT:

-


main::edit_ftbb_prm_file:1252: cmd status was 0

CMD: Thu Apr 9 12:34:25 2009 /bin/rm -f /var/log/vmware/aam/startAam.txt /var/log/vmware/aam/startAam.out

RESULT:

-


main::ft_startup_monitor:1256: cmd status was 0

Waiting for /opt/vmware/aam/bin/ft_startup to complete

ft_startup_monitor: elapsed time 0 minute(s) and 18 second(s)

issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "listnodes"'

CMD: Thu Apr 9 12:34:46 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "listnodes"

RESULT:

-


Node                            Type            State
----                            ----            -----
vrc-e30-17-d1955-b09-vi3-vm01   Primary Agent   Running

main::issue_cli_cmd:1397: cmd status was 0

wait_agent_startup: waiting for agent 'vrc-e30-17-d1955-b09-vi3-vm01' to come alive, status is : 'running'

wait_agent_startup: elapsed time 0 minute(s) and 0 second(s)

CMD: Thu Apr 9 12:34:46 2009 /bin/ping -c 1 10.5.5.1

RESULT:

-


PING 10.5.5.1 (10.5.5.1) 56(84) bytes of data.

64 bytes from 10.5.5.1: icmp_seq=0 ttl=255 time=0.593 ms

--- 10.5.5.1 ping statistics ---

1 packets transmitted, 1 received, 0% packet loss, time 0ms

rtt min/avg/max/mdev = 0.593/0.593/0.593/0.000 ms, pipe 2

main::configure_ips:1306: cmd status was 0

active_primary_ftcli: active primary is 'vrc-e30-17-d1955-b09-vi3-vm01'

active_primary_ftcli: command is 'import /var/log/vmware/aam/aam_config_util.def -skipfail=false'

issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "import /var/log/vmware/aam/aam_config_util.def -skipfail=false"'

CMD: Thu Apr 9 12:34:47 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "import /var/log/vmware/aam/aam_config_util.def -skipfail=false"

RESULT:

-


Node Fd Settings "vrc-e30-17-d1955-b09-vi3-vm01" Modified

OK

main::issue_cli_cmd:2241: cmd status was 0

active_primary_ftcli: command ran successfully on 'vrc-e30-17-d1955-b09-vi3-vm01'.

issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "listnodes"'

CMD: Thu Apr 9 12:34:48 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "listnodes"

RESULT:

-


Node                            Type            State
----                            ----            -----
vrc-e30-17-d1955-b09-vi3-vm01   Primary Agent   Running

main::issue_cli_cmd:1397: cmd status was 0

wait_agent_startup: waiting for agent 'vrc-e30-17-d1955-b09-vi3-vm01' to come alive, status is : 'running'

wait_agent_startup: elapsed time 0 minute(s) and 1 second(s)

issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -timeout 60 -cmd "listrules"'

CMD: Thu Apr 9 12:34:48 2009 /opt/vmware/aam/bin/ftcli -domain vmware -timeout 60 -cmd "listrules"

RESULT:

-


-


-


-


RuleMonitor enabled vrc-e30-17-d1955-b09-vi3-vm01

VMWareClusterManager enabled vrc-e30-17-d1955-b09-vi3-vm01

main::issue_cli_cmd:992: cmd status was 0

CMD: Thu Apr 9 12:34:48 2009 /bin/echo " -


-


-


RuleMonitor enabled vrc-e30-17-d1955-b09-vi3-vm01

VMWareClusterManager enabled vrc-e30-17-d1955-b09-vi3-vm01

" | grep -i "VMWareClusterManager" | grep -i "enable" >> /dev/null

RESULT:

-


main::configure_ips:1306: cmd status was 0

active_primary_ftcli: active primary is 'vrc-e30-17-d1955-b09-vi3-vm01'

active_primary_ftcli: command is 'fireOnDemandTrigger SetP2P on 0 ADD#vrc-e30-17-d1955-b09-vi3-vm01'

issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "fireOnDemandTrigger SetP2P on 0 ADD#vrc-e30-17-d1955-b09-vi3-vm01"'

CMD: Thu Apr 9 12:34:49 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "fireOnDemandTrigger SetP2P on 0 ADD#vrc-e30-17-d1955-b09-vi3-vm01"

RESULT:

-


OK

main::issue_cli_cmd:2241: cmd status was 0

active_primary_ftcli: command ran successfully on 'vrc-e30-17-d1955-b09-vi3-vm01'.

issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "listnodes"'

CMD: Thu Apr 9 12:34:49 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "listnodes"

RESULT:

-


Node                            Type            State
----                            ----            -----
vrc-e30-17-d1955-b09-vi3-vm01   Primary Agent   Running

main::issue_cli_cmd:1397: cmd status was 0

wait_agent_startup: waiting for agent 'vrc-e30-17-d1955-b09-vi3-vm01' to come alive, status is : 'running'

CMD: Thu Apr 9 12:34:59 2009 ps -ef | grep -v grep | /bin/egrep -i "ftAgent|ft_startup"

RESULT:

-


root 6001 1 0 12:34 ? 00:00:00 /opt/vmware/aam/bin/ftAgent -d vmware

main::wait_agent_startup:1309: cmd status was 0

issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "listnodes"'

CMD: Thu Apr 9 12:34:59 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "listnodes"

RESULT:

-


Node                            Type            State
----                            ----            -----
vrc-e30-17-d1955-b09-vi3-vm01   Primary Agent   Running

main::issue_cli_cmd:1397: cmd status was 0

wait_agent_startup: waiting for agent 'vrc-e30-17-d1955-b09-vi3-vm01' to come alive, status is : 'running'

wait_agent_startup: elapsed time 0 minute(s) and 10 second(s)

VMwareresult=success

Total time for script to complete: 0 minute(s) and 34 second(s)

#

0 Kudos
15 Replies
dominic7
Virtuoso
Virtuoso

What version of VC are you running? All of the HA bits are distributed by VC, and you don't get the U4 bits unless you upgrade to VC U4. ESXi 3.5.0 U4 / VC U4 is the first version on which HA appears to be working for us.

0 Kudos
dylanebner
Contributor
Contributor

VC is U4 as well, build 147633

0 Kudos
dominic7
Virtuoso
Virtuoso

Are you running embedded, or installable?

0 Kudos
dylanebner
Contributor
Contributor

Just ESX Enterprise 3.5.0, build 153875

0 Kudos
depping
Leadership
Leadership

I would start with removing each host from the cluster and add it again. This fixes most issues 95% of the time.

Duncan

VMware Communities User Moderator

-


Blogging:

Twitter:

If you find this information useful, please award points for "correct" or "helpful".

0 Kudos
dylanebner
Contributor
Contributor

That didn't work. I removed both and then re-added them. I used the computer name to add the servers, not the IP or FQDN.

Also, I am now getting a DRS error (General Server Error)

0 Kudos
depping
Leadership
Leadership

I had this once with 3.0.2, and I needed to recreate my cluster. I created a new one and moved my hosts. I still don't know what went wrong or what caused it, by the way.

Duncan


0 Kudos
ratkinson
Contributor
Contributor

Try This

http://communities.vmware.com/thread/197334

Looks like he got the problem fixed.

Another thing you could check is that the hosts are listed in both the forward and reverse lookup zones.

vmkping and nslookup should both work before enabling HA.
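The forward/reverse lookup part of that check can be sketched in Python. This is only an illustration, not a VMware tool: `check_forward_reverse` is a hypothetical helper, it compares short host names case-insensitively, and it says nothing about vmkping connectivity. The resolver functions are passed in as parameters so the logic can be tried without touching real DNS.

```python
import socket


def check_forward_reverse(hostname,
                          resolve=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return a list of problems for one host name (an empty list means OK)."""
    try:
        # Forward lookup: name -> IP address.
        ip = resolve(hostname)
    except OSError as exc:
        return ["forward lookup failed for %s: %s" % (hostname, exc)]
    try:
        # Reverse lookup: IP address -> (primary name, aliases, addresses).
        name, aliases, _addresses = reverse(ip)
    except OSError as exc:
        return ["reverse lookup failed for %s (%s): %s" % (hostname, ip, exc)]

    # Compare only the short host name, ignoring case and domain suffix.
    short = hostname.split(".")[0].lower()
    known = [name] + list(aliases)
    if not any(n.split(".")[0].lower() == short for n in known):
        return ["reverse lookup of %s returned %s, not %s" % (ip, name, hostname)]
    return []
```

Calling `check_forward_reverse("vrc-e30-17-d1955-b09-vi3-vm01")` from a machine that uses the same DNS servers as the hosts would return an empty list when both zones agree.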

VM support is great also :)

0 Kudos
dylanebner
Contributor
Contributor

OK, I got the DRS error to go away by running service mgmt-vmware restart.

I then moved the VMs to a new cluster and enabled HA and DRS. HA worked for a minute, and now both hosts are reporting an error.

The cluster is now reporting:

Insufficient Resources To Satisfy Configured Failover Level For HA

and

Unable to contact primary HA.

0 Kudos
ratkinson
Contributor
Contributor

If you only have 2 hosts, set the number of host failures the cluster can tolerate to "1".

How many VMs are you running, and will 1 host be sufficient to run them all?

I've never tested it, but my understanding of HA is that it will try to restart the VMs from the failed host on another physical host.

That host needs enough resources to handle the extra VMs, or HA will fail.

0 Kudos
dylanebner
Contributor
Contributor

Yeah, I think that is my problem. Each host has 12 GB of memory and 12 GHz of CPU. I have 19 guests, and one of them has 2 GB of memory. I don't think I have enough memory.
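A back-of-the-envelope way to test that suspicion is to add up the guests' memory and compare it with what is left after one host fails. The sketch below does only that; it is not how HA admission control actually decides (as far as I understand, that works from reservations and slots, and it ignores overhead here). The per-guest sizes in the usage note are hypothetical, since only the 2 GB guest is known from the post.

```python
def fits_after_failures(host_mem_gb, vm_mem_gb, host_failures=1):
    """Return True if all VM memory fits on the hosts left after failures.

    Pessimistic assumption: the largest hosts are the ones that fail.
    """
    keep = max(len(host_mem_gb) - host_failures, 0)
    # Sort ascending and keep the smallest hosts as the survivors.
    survivors = sorted(host_mem_gb)[:keep]
    return sum(vm_mem_gb) <= sum(survivors)
```

With two 12 GB hosts, one 2 GB guest and eighteen guests at a hypothetical 0.5 GB each (11 GB total), `fits_after_failures([12, 12], [2.0] + [0.5] * 18)` is true. At a hypothetical 1 GB each (20 GB total) it is false.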

0 Kudos
Chamon
Commander
Commander

You may want to change the option under HA to... allow VMs to start even if it violates constraint .....

then under Virtual Machine options set the correct restart priority level for each VM.

If you do have a host failure, you want to be sure that your higher-priority VMs are restarted and the lesser ones remain off if there are not enough resources for all of the VMs.
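The idea behind restart priorities can be illustrated with a short Python sketch. It is a simplified model, not HA's real placement logic: `plan_restarts` is a hypothetical function, it only looks at memory, and the VM names and sizes in the usage note are invented. VMs are tried in priority order, and anything that does not fit in the remaining memory stays off.

```python
# Lower number = restarted earlier. "disabled" VMs are never restarted.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def plan_restarts(vms, free_mem_gb):
    """vms is a list of (name, priority, mem_gb) tuples.

    Returns (restarted, left_off) as two lists of VM names.
    """
    restarted = []
    left_off = [name for name, priority, _mem in vms if priority == "disabled"]
    candidates = [vm for vm in vms if vm[1] != "disabled"]
    # sorted() is stable, so VMs with equal priority keep their input order.
    for name, _priority, mem in sorted(candidates,
                                       key=lambda vm: PRIORITY_ORDER[vm[1]]):
        if mem <= free_mem_gb:
            restarted.append(name)
            free_mem_gb -= mem
        else:
            left_off.append(name)
    return restarted, left_off
```

For example, with 12 GB free and the VMs `("web", "low", 4)`, `("db", "high", 6)`, `("mail", "medium", 4)` and `("test", "disabled", 1)`, the high and medium VMs are restarted and the low and disabled ones stay off.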

Message was edited by: Chamon

0 Kudos
chilow
Enthusiast
Enthusiast

Check this out, this may help: http://communities.vmware.com/message/1204244#1204244

-Chi

0 Kudos
dylanebner
Contributor
Contributor

I set the default restart priority on all the VMs to disabled. I am trying to see whether I have a capacity issue or a config issue. HA still will not start. I get an insufficient resources error and an "unable to contact primary HA agent in the cluster" error.

I renamed /etc/opt/aam and then disabled/re-enabled HA and watched the FT_HOSTS file. On the second VM I saw the local host's information added to the file, then the remote host's info, and then the local host's information was removed from the file, leaving only the remote host's info. On the first VM, I saw the remote host's info added, then the remote host's info removed and the local host's info added, and then the file deleted altogether.
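Watching a file change like that can be scripted rather than done by eye. The Python sketch below polls a file and records each change, including deletion. It is a generic illustration: `watch_file` and its parameters are invented here, the path to pass in is whatever FT_HOSTS location applies on the host (the logs earlier in this thread show `/etc/opt/vmware/aam/FT_HOSTS`), and whether a suitable Python is available on the service console is an assumption.

```python
import time


def read_or_none(path):
    """Return the file's text, or None if it does not exist or cannot be read."""
    try:
        with open(path) as handle:
            return handle.read()
    except (IOError, OSError):
        return None


def watch_file(path, interval=0.5, rounds=None, sleep=time.sleep, report=print):
    """Poll path and report each change; rounds=None means poll forever.

    Returns the list of observed contents (None marks a missing file).
    """
    previous = read_or_none(path)
    changes = []
    polled = 0
    while rounds is None or polled < rounds:
        sleep(interval)
        current = read_or_none(path)
        if current != previous:
            changes.append(current)
            stamp = time.strftime("%H:%M:%S")
            report(stamp, "file missing" if current is None else current)
            previous = current
        polled += 1
    return changes
```

Starting `watch_file("/etc/opt/vmware/aam/FT_HOSTS")` just before re-enabling HA would print a timestamped copy of the file each time an entry is added or removed. Changes that happen between two polls are not seen, so a short interval helps.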

Any ideas?

If I set the default restart priority to disabled on all the VMs, shouldn't that mean that I am not using any of my slots?

0 Kudos
harunsahiner
Contributor
Contributor

You should not need to restart the host. Try disabling and re-enabling HA at the cluster level.

http://harunsahiner.blogspot.com
0 Kudos