I have two ESX servers that I cannot get HA working on. We have just upgraded from ESX 3.5 U1 thinking this might help, but it hasn't. We have always had trouble getting HA running, even before the upgrade. I have looked at the DNS, and everything looks OK. I also added the host names to each server's hosts file.
When we enable HA, the primary server completes, and then the second server sits at 60% for a while, moves up to 83%, then 9x%, and then fails. Shortly after the second server fails, the primary also reports an error.
Here are some logs:
ADDNODE log from second server:
# cat aam_config_util_addnode.log
KEY: shortname VAL: VRC-E31-25-D1955-B01-VI3-VM02
KEY: cmd VAL: addnode
KEY: primarynet VAL: 10.5.5.130/255.255.255.0:
KEY: hostnet VAL: 10.5.5.131/255.255.255.0
KEY: domain VAL: vmware
KEY: primaryagent VAL: vrc-e30-17-d1955-b09-vi3-vm01
KEY: iso VAL: 10.5.5.1
KEY: -z VAL: 1
add_aam_node
CMD: Thu Apr 9 12:35:06 2009 cp /opt/vmware/aam/ha/store_nic_info.pl /opt/vmware/aam/bin/run
RESULT:
-
main::write_run_script:1129: cmd status was 0
CMD: Thu Apr 9 12:35:06 2009 cp /opt/vmware/aam/bin/generateConfigBackup.pl /opt/vmware/aam/bin/runOnce
RESULT:
-
main::write_run_script:1133: cmd status was 0
setup_ft_hosts adding host vrc-e30-17-d1955-b09-vi3-vm01 IP info: 10.5.5.130/255.255.255.0:CMD: Thu Apr 9 12:35:06 2009 hostname -s
RESULT:
-
VRC-E31-25-D1955-B01-VI3-VM02
main::verify_network_configuration:1154: cmd status was 0
CMD: Thu Apr 9 12:35:06 2009 /opt/vmware/aam/bin/ft_gethostbyname VRC-E31-25-D1955-B01-VI3-VM02 |grep FAILED
RESULT:
-
main::verify_network_configuration:1154: cmd status was 1
CMD: Thu Apr 9 12:35:06 2009 /opt/vmware/aam/bin/ft_gethostbyname vrc-e30-17-d1955-b09-vi3-vm01 |grep FAILED
RESULT:
-
main::verify_network_configuration:1154: cmd status was 1
CMD: Thu Apr 9 12:35:06 2009 /usr/sbin/esxcfg-vswif -l
RESULT:
-
Name Port Group IP Address Netmask Broadcast Enabled DHCP
vswif0 Service Console 10.5.5.131 255.255.255.0 10.5.5.255 true false
main::verify_network_configuration:1154: cmd status was 0
CMD: Thu Apr 9 12:35:06 2009 cp /opt/vmware/aam/ha/vmware_subsequent_node.pl /opt/vmware/aam/bin/runOnce
RESULT:
-
main::write_run_script:1155: cmd status was 0
ports 8042-8045 are free for use.
CMD: Thu Apr 9 12:35:06 2009 /opt/vmware/aam/bin/ft_setup -domain=vmware -upgrade=n -noprompt=y -hostname=vrc-e31-25-d1955-b01-vi3-vm02 -port1=8042 -licensekey=AMCFNEET-4YRDDN53CTHMBDSJ -mailserver=none -primaryagent=vrc-e30-17-d1955-b09-vi3-vm01
RESULT:
-
AAM setup script.
Setting environment from /opt/vmware/aam/config/agent_env.Linux
Setting up the AAM agent for domain vmware
Welcome to VMware HA Agent. (Release 5.1 )
Configuring Agent for current node: vrc-e31-25-d1955-b01-vi3-vm02
Configuration requires the node name of a primary agent. If you
are configuring the first node in the domain, enter the name
of this node. (i.e. vrc-e31-25-d1955-b01-vi3-vm02) If this is a subsequent installation
enter the name of an existing primary agent node.
Enter the name of a Primary Agent Node :
Using input argument of vrc-e30-17-d1955-b09-vi3-vm01 for Primary Agent
Agents require the use of 4 network ports through which to
communicate. These port numbers must be available and consistent
across each of the nodes in the domain. If you are unsure about
specifying port numbers or defining primary nodes please read the
appropriate sections of the user documentation provided with this
product.
Specify the first of the 4 port numbers:
Using argument for port1: 8042
Ports 8042, 8043, 8044 and 8045 will be used.
Installation for this node is complete.
To start the Agent run the "ft_startup" command.
main::add_aam_node:145: cmd status was 0
CMD: Thu Apr 9 12:35:06 2009 cp -f /etc/opt/vmware/aam/ftbb.prm /etc/opt/vmware/aam/ftbb.prm.bck
RESULT:
-
main::edit_ftbb_prm_file:1252: cmd status was 0
CMD: Thu Apr 9 12:35:06 2009 /bin/rm -f /var/log/vmware/aam/startAam.txt /var/log/vmware/aam/startAam.out
RESULT:
-
main::ft_startup_monitor:1256: cmd status was 0
Waiting for /opt/vmware/aam/bin/ft_startup to complete
ft_startup_monitor: elapsed time 0 minute(s) and 3 second(s)
CMD: Thu Apr 9 12:35:12 2009 /bin/rm -f /var/log/vmware/aam/startAam.txt /var/log/vmware/aam/startAam.out
RESULT:
-
main::ft_startup_monitor:1269: cmd status was 0
Waiting for /opt/vmware/aam/bin/ft_startup to complete
ft_startup_monitor: elapsed time 0 minute(s) and 19 second(s)
active_primary_ftcli: active primary is ''
active_primary_ftcli: command is 'stop VMap_vrc-e31-25-d1955-b01-vi3-vm02'
find_active_primary: attempting to find an active primary.
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -cmd "la -l"'
CMD: Thu Apr 9 12:35:34 2009 /opt/vmware/aam/bin/ftcli -domain vmware -cmd "la -l"
RESULT:
-
Node Agent Process Mon Rule Int
-
-
-
-
vrc-e30-17-d1955-b09-vi3-vm01 (1/16:6001.0) (1/16:6046.0) (1/16:6056.0)
Node vrc-e30-17-d1955-b09-vi3-vm01 (1/16:6056.0) is the active interpreter
main::issue_cli_cmd:2144: cmd status was 0
find_active_primary: active primary has been found: 'vrc-e30-17-d1955-b09-vi3-vm01'
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "stop VMap_vrc-e31-25-d1955-b01-vi3-vm02"'
CMD: Thu Apr 9 12:35:34 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "stop VMap_vrc-e31-25-d1955-b01-vi3-vm02"
RESULT:
-
main::issue_cli_cmd:2241: cmd status was 1
active_primary_ftcli: command did not run successfully on 'vrc-e30-17-d1955-b09-vi3-vm01'.
active_primary_ftcli: active primary is 'vrc-e30-17-d1955-b09-vi3-vm01'
active_primary_ftcli: command is 'forceProcState VMap_vrc-e31-25-d1955-b01-vi3-vm02 stopped'
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "forceProcState VMap_vrc-e31-25-d1955-b01-vi3-vm02 stopped"'
CMD: Thu Apr 9 12:35:35 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "forceProcState VMap_vrc-e31-25-d1955-b01-vi3-vm02 stopped"
RESULT:
-
main::issue_cli_cmd:2241: cmd status was 1
active_primary_ftcli: command did not run successfully on 'vrc-e30-17-d1955-b09-vi3-vm01'.
active_primary_ftcli: active primary is 'vrc-e30-17-d1955-b09-vi3-vm01'
active_primary_ftcli: command is 'deleteproc VMap_vrc-e31-25-d1955-b01-vi3-vm02'
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "deleteproc VMap_vrc-e31-25-d1955-b01-vi3-vm02"'
CMD: Thu Apr 9 12:35:36 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "deleteproc VMap_vrc-e31-25-d1955-b01-vi3-vm02"
RESULT:
-
main::issue_cli_cmd:2241: cmd status was 1
active_primary_ftcli: command did not run successfully on 'vrc-e30-17-d1955-b09-vi3-vm01'.
CMD: Thu Apr 9 12:35:38 2009 /opt/vmware/aam/bin/ft_shutdown -b -ppid=5611
RESULT:
-
AAM setup script.
Setting environment from /opt/vmware/aam/config/agent_env.Linux
Shutting down the AAM agent for domain vmware
Shutting down the Agent...
Shutting down the Backbone...
Skipping /usr/bin/perl (5611) because it is a parent script.
Skipping /opt/vmware/aam/bin/ftPerl (5835) because it's this script.
Killing PIDs:
After kills issued:
PID: PID PROC: CMD
PID: 1 PROC: init
PID: 1309 PROC: syslogd
PID: 1313 PROC: klogd
PID: 1470 PROC: /opt/Navisphere/bin/naviagent
PID: 1619 PROC: /usr/sbin/snmpd
PID: 1628 PROC: /usr/sbin/sshd
PID: 1682 PROC: /usr/sbin/vmklogger
PID: 1729 PROC: xinetd
PID: 1744 PROC: ntpd
PID: 1753 PROC: gpm
PID: 1776 PROC: /bin/sh
PID: 1783 PROC: /usr/lib/vmware/webAccess/java/jre1.5.0_16/bin/webAccess
PID: 1792 PROC: crond
PID: 1808 PROC: /opt/dell/srvadmin/oma/bin/dsm_om_shrsvc32d
PID: 2462 PROC: /opt/dell/srvadmin/dataeng/bin/dsm_sa_datamgr32d
PID: 2667 PROC: /opt/dell/srvadmin/dataeng/bin/dsm_sa_datamgr32d
PID: 2688 PROC: /opt/dell/srvadmin/dataeng/bin/dsm_sa_eventmgr32d
PID: 2699 PROC: /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d
PID: 2728 PROC: /opt/dell/srvadmin/iws/bin/linux/dsm_om_connsvc32d
PID: 2729 PROC: /opt/dell/srvadmin/iws/bin/linux/dsm_om_connsvc32d
PID: 2741 PROC: /usr/lib/vmware/bin/vmkload_app
PID: 2761 PROC: /bin/sh
PID: 2765 PROC: logger
PID: 2785 PROC: /usr/lib/vmware/hostd/vmware-hostd
PID: 3406 PROC: /bin/sh
PID: 3412 PROC: /var/pegasus/bin/cimserver
PID: 3447 PROC: /bin/sh
PID: 3456 PROC: /opt/vmware/vpxa/vpx/vpxa
PID: 3484 PROC: /bin/sh
PID: 3490 PROC: /sbin/openwsmand
PID: 3494 PROC: /sbin/mingetty
PID: 3495 PROC: /sbin/mingetty
PID: 3496 PROC: /sbin/mingetty
PID: 3497 PROC: /sbin/mingetty
PID: 3498 PROC: /sbin/mingetty
PID: 3499 PROC: /sbin/mingetty
PID: 3662 PROC: /var/pegasus/bin/cimserver
PID: 3669 PROC: /usr/lib/vmware/bin/vmkload_app
PID: 3671 PROC: /usr/lib/vmware/bin/vmkload_app
PID: 3676 PROC: /usr/lib/vmware/bin/vmkload_app
PID: 3684 PROC: /usr/lib/vmware/bin/vmkload_app
PID: 3691 PROC: /usr/lib/vmware/bin/vmkload_app
PID: 3695 PROC: /usr/lib/vmware/bin/vmkload_app
PID: 3704 PROC: /usr/lib/vmware/bin/vmkload_app
PID: 3712 PROC: /usr/lib/vmware/bin/vmkload_app
PID: 3723 PROC: /usr/lib/vmware/bin/vmkload_app
PID: 3725 PROC: /usr/lib/vmware/bin/vmkload_app
PID: 3730 PROC: /usr/lib/vmware/bin/vmkload_app
PID: 5033 PROC: sshd:
PID: 5035 PROC: sshd:
PID: 5036 PROC: -bash
PID: 5070 PROC: su
PID: 5071 PROC: -bash
PID: 5611 PROC: /usr/bin/perl
PID: 5835 PROC: /opt/vmware/aam/bin/ftPerl
PID: 5858 PROC: ps
main::stop_aam:2077: cmd status was 0
kill_aam: copying /etc/opt/vmware/aam/vmware-sites to /var/log/vmware/aam/aam_config_util_addnode.log
FULLTIME_SITES_TID 00000002
+ 1:8042,8042,8043 vrc-e30-17-d1955-b09-vi3-vm01 vmware #FT_Agent_Port=8045
+ 2:8042,8042,8043 vrc-e31-25-d1955-b01-vi3-vm02 vmware
myexit: copying /etc/opt/vmware/aam/vmware-sites to /var/log/vmware/aam/aam_config_util_addnode.log
FULLTIME_SITES_TID 00000002
+ 1:8042,8042,8043 vrc-e30-17-d1955-b09-vi3-vm01 vmware #FT_Agent_Port=8045
+ 2:8042,8042,8043 vrc-e31-25-d1955-b01-vi3-vm02 vmware
VMwareresult=failure
Total time for script to complete: 0 minute(s) and 32 second(s)
ADDNODE from first server:
# cat aam_config_util_addnode.log
KEY: shortname VAL: VRC-E30-17-D1955-B09-VI3-VM01
KEY: domain VAL: vmware
KEY: cmd VAL: addnode
KEY: iso VAL: 10.5.5.1
KEY: -z VAL: 1
KEY: hostnet VAL: 10.5.5.130/255.255.255.0
add_aam_node
CMD: Thu Apr 9 12:34:25 2009 cp /opt/vmware/aam/ha/store_nic_info.pl /opt/vmware/aam/bin/run
RESULT:
-
main::write_run_script:1129: cmd status was 0
CMD: Thu Apr 9 12:34:25 2009 cp /opt/vmware/aam/bin/generateConfigBackup.pl /opt/vmware/aam/bin/runOnce
RESULT:
-
main::write_run_script:1133: cmd status was 0
add_aam_node: this is the primary agent -- 1st node in cluster.
add_aam_node: primary agent: vrc-e30-17-d1955-b09-vi3-vm01
CMD: Thu Apr 9 12:34:25 2009 rm -f /etc/opt/vmware/aam/FT_HOSTS
RESULT:
-
main::add_aam_node:145: cmd status was 0
CMD: Thu Apr 9 12:34:25 2009 cp /opt/vmware/aam/ha/vmware_first_node.pl /opt/vmware/aam/bin/runOnce
RESULT:
-
main::write_run_script:1148: cmd status was 0
ports 8042-8045 are free for use.
CMD: Thu Apr 9 12:34:25 2009 /opt/vmware/aam/bin/ft_setup -domain=vmware -upgrade=n -noprompt=y -hostname=vrc-e30-17-d1955-b09-vi3-vm01 -port1=8042 -licensekey=AMCFNEET-4YRDDN53CTHMBDSJ -mailserver=none -primaryagent=vrc-e30-17-d1955-b09-vi3-vm01
RESULT:
-
AAM setup script.
Setting environment from /opt/vmware/aam/config/agent_env.Linux
Setting up the AAM agent for domain vmware
Welcome to VMware HA Agent. (Release 5.1 )
Configuring Agent for current node: vrc-e30-17-d1955-b09-vi3-vm01
Configuration requires the node name of a primary agent. If you
are configuring the first node in the domain, enter the name
of this node. (i.e. vrc-e30-17-d1955-b09-vi3-vm01) If this is a subsequent installation
enter the name of an existing primary agent node.
Enter the name of a Primary Agent Node :
Using input argument of vrc-e30-17-d1955-b09-vi3-vm01 for Primary Agent
Performing a primary node configuration.
Agents require the use of 4 network ports through which to
communicate. These port numbers must be available and consistent
across each of the nodes in the domain. If you are unsure about
specifying port numbers or defining primary nodes please read the
appropriate sections of the user documentation provided with this
product.
Specify the first of the 4 port numbers:
Using argument for port1: 8042
Ports 8042, 8043, 8044 and 8045 will be used.
Database from previous installation has been renamed.
Installation for this node is complete.
To start the Agent run the "ft_startup" command.
main::add_aam_node:145: cmd status was 0
CMD: Thu Apr 9 12:34:25 2009 cp -f /etc/opt/vmware/aam/ftbb.prm /etc/opt/vmware/aam/ftbb.prm.bck
RESULT:
-
main::edit_ftbb_prm_file:1252: cmd status was 0
CMD: Thu Apr 9 12:34:25 2009 /bin/rm -f /var/log/vmware/aam/startAam.txt /var/log/vmware/aam/startAam.out
RESULT:
-
main::ft_startup_monitor:1256: cmd status was 0
Waiting for /opt/vmware/aam/bin/ft_startup to complete
ft_startup_monitor: elapsed time 0 minute(s) and 18 second(s)
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "listnodes"'
CMD: Thu Apr 9 12:34:46 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "listnodes"
RESULT:
-
Node Type State
-
-
-
vrc-e30-17-d1955-b09-vi3-vm01 Primary Agent Running
main::issue_cli_cmd:1397: cmd status was 0
wait_agent_startup: waiting for agent 'vrc-e30-17-d1955-b09-vi3-vm01' to come alive, status is : 'running'
wait_agent_startup: elapsed time 0 minute(s) and 0 second(s)
CMD: Thu Apr 9 12:34:46 2009 /bin/ping -c 1 10.5.5.1
RESULT:
-
PING 10.5.5.1 (10.5.5.1) 56(84) bytes of data.
64 bytes from 10.5.5.1: icmp_seq=0 ttl=255 time=0.593 ms
--- 10.5.5.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.593/0.593/0.593/0.000 ms, pipe 2
main::configure_ips:1306: cmd status was 0
active_primary_ftcli: active primary is 'vrc-e30-17-d1955-b09-vi3-vm01'
active_primary_ftcli: command is 'import /var/log/vmware/aam/aam_config_util.def -skipfail=false'
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "import /var/log/vmware/aam/aam_config_util.def -skipfail=false"'
CMD: Thu Apr 9 12:34:47 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "import /var/log/vmware/aam/aam_config_util.def -skipfail=false"
RESULT:
-
Node Fd Settings "vrc-e30-17-d1955-b09-vi3-vm01" Modified
OK
main::issue_cli_cmd:2241: cmd status was 0
active_primary_ftcli: command ran successfully on 'vrc-e30-17-d1955-b09-vi3-vm01'.
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "listnodes"'
CMD: Thu Apr 9 12:34:48 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "listnodes"
RESULT:
-
Node Type State
-
-
-
vrc-e30-17-d1955-b09-vi3-vm01 Primary Agent Running
main::issue_cli_cmd:1397: cmd status was 0
wait_agent_startup: waiting for agent 'vrc-e30-17-d1955-b09-vi3-vm01' to come alive, status is : 'running'
wait_agent_startup: elapsed time 0 minute(s) and 1 second(s)
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -timeout 60 -cmd "listrules"'
CMD: Thu Apr 9 12:34:48 2009 /opt/vmware/aam/bin/ftcli -domain vmware -timeout 60 -cmd "listrules"
RESULT:
-
-
-
-
RuleMonitor enabled vrc-e30-17-d1955-b09-vi3-vm01
VMWareClusterManager enabled vrc-e30-17-d1955-b09-vi3-vm01
main::issue_cli_cmd:992: cmd status was 0
CMD: Thu Apr 9 12:34:48 2009 /bin/echo " -
-
-
RuleMonitor enabled vrc-e30-17-d1955-b09-vi3-vm01
VMWareClusterManager enabled vrc-e30-17-d1955-b09-vi3-vm01
" | grep -i "VMWareClusterManager" | grep -i "enable" >> /dev/null
RESULT:
-
main::configure_ips:1306: cmd status was 0
active_primary_ftcli: active primary is 'vrc-e30-17-d1955-b09-vi3-vm01'
active_primary_ftcli: command is 'fireOnDemandTrigger SetP2P on 0 ADD#vrc-e30-17-d1955-b09-vi3-vm01'
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "fireOnDemandTrigger SetP2P on 0 ADD#vrc-e30-17-d1955-b09-vi3-vm01"'
CMD: Thu Apr 9 12:34:49 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "fireOnDemandTrigger SetP2P on 0 ADD#vrc-e30-17-d1955-b09-vi3-vm01"
RESULT:
-
OK
main::issue_cli_cmd:2241: cmd status was 0
active_primary_ftcli: command ran successfully on 'vrc-e30-17-d1955-b09-vi3-vm01'.
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "listnodes"'
CMD: Thu Apr 9 12:34:49 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "listnodes"
RESULT:
-
Node Type State
-
-
-
vrc-e30-17-d1955-b09-vi3-vm01 Primary Agent Running
main::issue_cli_cmd:1397: cmd status was 0
wait_agent_startup: waiting for agent 'vrc-e30-17-d1955-b09-vi3-vm01' to come alive, status is : 'running'
CMD: Thu Apr 9 12:34:59 2009 ps -ef | grep -v grep | /bin/egrep -i "ftAgent|ft_startup"
RESULT:
-
root 6001 1 0 12:34 ? 00:00:00 /opt/vmware/aam/bin/ftAgent -d vmware
main::wait_agent_startup:1309: cmd status was 0
issue_cli_cmd: command is '/opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "listnodes"'
CMD: Thu Apr 9 12:34:59 2009 /opt/vmware/aam/bin/ftcli -domain vmware -connect vrc-e30-17-d1955-b09-vi3-vm01 -port 8042 -timeout 60 -cmd "listnodes"
RESULT:
-
Node Type State
-
-
-
vrc-e30-17-d1955-b09-vi3-vm01 Primary Agent Running
main::issue_cli_cmd:1397: cmd status was 0
wait_agent_startup: waiting for agent 'vrc-e30-17-d1955-b09-vi3-vm01' to come alive, status is : 'running'
wait_agent_startup: elapsed time 0 minute(s) and 10 second(s)
VMwareresult=success
Total time for script to complete: 0 minute(s) and 34 second(s)
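Both logs above pair each "CMD:" line with a later "cmd status was N" line, so the failing steps can be pulled out mechanically. The following is a rough sketch, not a VMware tool: the function name failed_commands is made up for illustration, and it assumes the log layout seen in this thread. It skips pipelines ending in "grep FAILED", because grep exits with 1 when it finds no match, which here is the healthy result.

```python
import re

# A command line may be fused onto the end of another log line,
# so search for "CMD: " anywhere rather than only at the start.
CMD_RE = re.compile(r"CMD: (.+)$")
STATUS_RE = re.compile(r"^(\S+): cmd status was (\d+)$")


def failed_commands(log_text):
    """Return (command, location, status) for each command with a nonzero status.

    Assumes every "CMD:" line is eventually followed by one
    "<location>: cmd status was N" line, as in the logs in this thread.
    """
    failures = []
    current_cmd = None
    for raw in log_text.splitlines():
        line = raw.strip()
        status_match = STATUS_RE.match(line)
        if status_match and current_cmd is not None:
            status = int(status_match.group(2))
            # grep exits 1 when nothing matched; for "grep FAILED" that is good news.
            benign = current_cmd.rstrip().endswith("grep FAILED")
            if status != 0 and not benign:
                failures.append((current_cmd, status_match.group(1), status))
            current_cmd = None
            continue
        cmd_match = CMD_RE.search(line)
        if cmd_match:
            current_cmd = cmd_match.group(1)
    return failures
```

Run against a copy of the second server's log, for example with failed_commands(open("aam_config_util_addnode.log").read()), this should leave only the ftcli calls sent to the primary (stop, forceProcState, deleteproc), which is where the add-node attempt starts to go wrong.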
What version of VC are you running? All of the HA bits are distributed by VC, and you don't get the U4 bits unless you upgrade to VC U4. ESXi 3.5.0 U4 / VC U4 is the first version on which HA appears to be working for us.
VC is U4 as well, build 147633
Are you running embedded, or installable?
Just ESX Enterprise 3.5.0, build 153875.
That didn't work. I removed both hosts and then re-added them. I used the computer name to add the servers, not the IP or FQDN.
Also, I am now getting a DRS error (General Server Error)
I had this once with 3.0.2, and I needed to recreate my cluster. I created a new one and moved my hosts. I still don't know what went wrong or what caused it, by the way.
Duncan
VMware Communities User Moderator
Try This
http://communities.vmware.com/thread/197334
Looks like he got the problem fixed.
Another thing you could check is that the hosts are listed in both the forward and reverse lookup zones.
vmkping and nslookup should both work before enabling HA.
VM support is great also
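The forward and reverse lookup check suggested above can be scripted so it is easy to repeat for each host. This is only a sketch: the function name check_forward_reverse is made up, it needs a Python 3 interpreter on a machine that uses the same DNS servers as the hosts, and it compares short names case-insensitively, which is an assumption (the logs show the short name in upper case and the agent name in lower case).

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve the IP back, and compare short names.

    Returns (ok, message). The resolver functions are parameters so the
    check can be exercised without touching a real DNS server.
    """
    try:
        ip = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return (False, "forward lookup of %s failed: %s" % (hostname, exc))
    try:
        name, aliases, _addresses = reverse(ip)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return (False, "reverse lookup of %s failed: %s" % (ip, exc))
    wanted = hostname.split(".")[0].lower()
    found = {n.split(".")[0].lower() for n in [name] + list(aliases)}
    if wanted not in found:
        return (False, "%s resolves to %s, which maps back to %s" % (hostname, ip, name))
    return (True, "%s <-> %s" % (hostname, ip))
```

Calling it once per host name from the logs, for example check_forward_reverse("vrc-e30-17-d1955-b09-vi3-vm01"), gives a quick pass/fail with the reason. It does not replace vmkping, which tests reachability rather than name resolution.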
OK, I got the DRS error to go away by running service mgmt-vmware restart.
I then moved the VMs to a new cluster and enabled HA and DRS. HA worked for a minute and now both hosts are reporting an error.
The cluster is now reporting:
Insufficient Resources To Satisfy Configured Failover Level For HA
and
Unable to contact primary HA.
If you only have 2 hosts, set the number of host failures the cluster can tolerate to "1".
How many VMs are you running, and will 1 host be sufficient to run them all?
I've never tested it before, but my understanding of HA is that it will try to restart the VMs from the failed host on another physical host.
That host should have enough resources to handle the extra VMs, or HA would fail.
Ya, I think that is my problem. Each host has 12 GB of memory and 12 GHz of CPU. I have 19 guests, and one of them has 2 GB of memory. I don't think I have enough memory.
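A back-of-the-envelope version of that question, "will one host hold everything?", can be written down as below. This is a deliberately crude sketch and not how HA admission control actually computes failover capacity: it only sums configured guest memory against the hosts that would remain. The host sizes (12 GB each), the guest count (19) and the one 2 GB guest come from the post above; the sizes of the other 18 guests are hypothetical placeholders.

```python
def fits_after_failures(host_memory_gb, vm_memory_gb, failures_to_tolerate=1):
    """True if all VM memory fits on the hosts left after the largest ones fail.

    Worst case is assumed: the hosts with the most memory are the ones lost.
    """
    keep = max(len(host_memory_gb) - failures_to_tolerate, 0)
    survivors = sorted(host_memory_gb)[:keep]
    return sum(vm_memory_gb) <= sum(survivors)


hosts = [12, 12]                      # GB per host, from the thread
guests_1gb = [2] + [1] * 18           # hypothetical: other 18 guests at 1 GB
guests_half_gb = [2] + [0.5] * 18     # hypothetical: other 18 guests at 0.5 GB

print(fits_after_failures(hosts, guests_1gb))      # 20 GB needed vs 12 GB left
print(fits_after_failures(hosts, guests_half_gb))  # 11 GB needed vs 12 GB left
```

Plugging in the real configured memory of each guest shows quickly whether the cluster is anywhere near fitting on a single host, before digging into HA's own calculation.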
You may want to change the option under HA to allow VMs to start even if doing so violates the availability constraints.
Then, under Virtual Machine Options, set the correct restart priority level for each VM.
If you do have a host failure, you want to be sure that your higher-priority VMs are restarted and the lesser ones remain off if there are not enough resources for all of the VMs.
Message was edited by: Chamon
Check this out, it may help: http://communities.vmware.com/message/1204244#1204244
-Chi
I set the default restart priority on all the VMs to disabled. I am trying to see if I have a capacity issue or a config issue. HA still will not start. I get an insufficient resources error and an "unable to contact primary HA agent in the cluster" error.
I renamed /etc/opt/aam, then disabled and re-enabled HA and watched the FT_HOSTS file. On the second server I saw the local host's information added to the file, then the remote host's info, and then the local host's information was removed, leaving only the remote host's info. On the first server, I saw the remote host's info added, then the remote host's info removed and the local host's info added, and then the file deleted altogether.
Any ideas?
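Watching FT_HOSTS by hand makes it easy to miss a change, so the sequence described above could be captured with a small polling loop. This is only a sketch under assumptions: the names read_or_none and watch are made up, the path /etc/opt/vmware/aam/FT_HOSTS is taken from the rm command in the first server's log, and a Python 3 interpreter has to be available wherever the file is visible (the ESX 3.5 service console may not provide one).

```python
import time


def read_or_none(path):
    """Return the file's text, or None if the file does not exist."""
    try:
        with open(path) as handle:
            return handle.read()
    except FileNotFoundError:
        return None


def watch(path, interval=0.5, polls=None):
    """Yield (time, content) each time the file's content changes.

    content is None while the file is missing. polls limits the number of
    checks; None means keep polling until interrupted.
    """
    last = object()  # sentinel, so the first observed state is always reported
    count = 0
    while polls is None or count < polls:
        current = read_or_none(path)
        if current != last:
            yield time.strftime("%H:%M:%S"), current
            last = current
        count += 1
        time.sleep(interval)
```

A loop such as "for stamp, content in watch('/etc/opt/vmware/aam/FT_HOSTS'): print(stamp, repr(content))", started on each server before re-enabling HA, records the order of additions and removals with timestamps that can be lined up against the addnode logs. Changes that happen faster than the polling interval can still be missed.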
If I set the default restart priority to disabled on all the VMs, shouldn't that mean that I am not using any of my slots?
You should not need to restart the host. Try disabling and re-enabling HA at the cluster level.