VMware Cloud Community
Gustavo_Souza_L
Contributor

Configuring VMware HA

Hi,

I have been struggling to get my HA cluster to work properly.

I deployed an environment with two VMware ESX 3 servers, one FC storage array, and one VirtualCenter Server.

I created a cluster with only the HA option enabled and then added the two hosts to it without any problems.

After that I created a VM on one of the hosts, installed the guest OS, and left the VM running.

To test the HA cluster, I powered off the host that was running the VM, and nothing happened: the host went down and the VM went down with it.

Can anyone help me, please?

Has anyone been able to configure a cluster using only the VMware HA option?

18 Replies
wbednarzyk
Enthusiast

Hi Seniord,

On your HA cluster, choose Edit Settings > VMware HA > Virtual Machine Options.

In this dialog box you should find a list of the VMs in the cluster and what will happen to their power state if their host becomes isolated from the HA cluster. I believe the default response is "Power Off"; if you would like HA to handle a VM differently, you can change it with the drop-down menu.

Please make sure to give me points if I was helpful :)

Bill

Cloneranger
Hot Shot

By default, HA will power off the VM if the host it's on fails.

It's not like MSCS, where you have two live machines.

You can configure HA to leave the VM on in the event of a host failure.

It's in the properties of the cluster.

esiebert7625
Immortal

Read through these guides on HA...

Automating High Availability (HA) Services with Vmware HA - http://www.vmware.com/pdf/vmware_ha_wp.pdf

Effective DRS and HA in Production - http://download3.vmware.com/vmworld/2006/tac9413.pdf

Knocking Out Downtime with Two Punches: VMotion & VMware HA - http://www.vmware-tsx.com/download.php?asset_id=45

A Practical Guide to HA - http://www.vmware-tsx.com/download.php?asset_id=29

Gustavo_Souza_L
Contributor

I tried all of these solutions and nothing has worked yet. I read all of these documents and others too, but none gave me a solution.

Is it necessary to enable DRS as well? I'm beginning to believe it is.

Has any of you been able to configure an HA cluster without DRS?

esiebert7625
Immortal

No it is not necessary to enable DRS for HA to work. You can definitely use HA without DRS. Is your cluster admission control set to allow constraint violations? What is your current failover capacity?

Also see this thread...

Vmware HA with 2 ESX hosts - http://www.vmware.com/community/thread.jspa?messageID=605107&#605107

Gustavo_Souza_L
Contributor

No, my cluster admission control does not allow constraint violations, and the number of host failures allowed is 1.

I read that thread and it didn't solve my problem.

When I disconnect the host running the VM from the network, or when I power it off, the second host doesn't restart the VM, and I don't know why.

All the settings look right.

esiebert7625
Immortal

Try allowing constraint violations and see if that works. You might also check the hostd and vmkernel log files on the ESX server for any errors.

You can check several log files on the ESX server based on the problem you are experiencing, these include:

o Vmkernel - /var/log/vmkernel – records activities related to the virtual machines and ESX server

o Vmkernel Warnings - /var/log/vmkwarning – records activities with the virtual machines

o Vmkernel Summary - /var/log/vmksummary - Used to determine uptime and availability statistics for ESX Server; human-readable summary found in /var/log/vmksummary.txt

o ESX Server host agent log - /var/log/vmware/hostd.log - Contains information on the agent that manages and configures the ESX Server host and its virtual machines (Search the file date/time stamps to find the log file it is currently outputting to.)

o Service Console - /var/log/messages - Contains all general log messages used to troubleshoot virtual machines or ESX Server

o Web Access - /var/log/vmware/webAccess - Records information on Web-based access to ESX Server

o Authentication log - /var/log/secure - Contains records of connections that require authentication, such as VMware daemons and actions initiated by the xinetd daemon.

o VirtualCenter agent - /var/log/vmware/vpx - Contains information on the agent that communicates with VirtualCenter

o Virtual Machines - The same directory as the affected virtual machine’s configuration files; named vmware.log - Contains information on virtual machine crashes and abnormal terminations
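As a rough illustration of checking the logs listed above, here is a minimal Python sketch that scans a few of those files for lines mentioning errors, failures, or warnings. The search pattern, the 20-line limit, and the function names `find_suspect_lines` and `scan_file` are assumptions made for this example and are not part of any VMware tool; it also assumes a Python 3 interpreter is available wherever the log files can be read.

```python
import re

# Paths taken from the list above; adjust to the logs you care about.
LOG_PATHS = [
    "/var/log/vmkernel",
    "/var/log/vmkwarning",
    "/var/log/vmware/hostd.log",
    "/var/log/messages",
]

# Assumed search pattern: any line mentioning an error, failure, or warning.
PATTERN = re.compile(r"error|fail|warning", re.IGNORECASE)


def find_suspect_lines(lines, pattern=PATTERN, limit=20):
    """Return up to `limit` (line_number, text) pairs whose text matches `pattern`."""
    matches = []
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            matches.append((number, line.rstrip("\n")))
            if len(matches) >= limit:
                break
    return matches


def scan_file(path, **kwargs):
    """Scan one log file; return an empty list if it is missing or unreadable."""
    try:
        with open(path, errors="replace") as handle:
            return find_suspect_lines(handle, **kwargs)
    except OSError:
        return []


if __name__ == "__main__":
    for path in LOG_PATHS:
        for number, text in scan_file(path):
            print(f"{path}:{number}: {text}")
```

The scan only narrows down where to look; the matching lines still need to be read in context around the time of the failed failover.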

Gustavo_Souza_L
Contributor

I ran another test, disconnecting one host from the network. The HA agent powered off the VM that was running on that host, but the VM was not started on the other host. I found these log entries in /var/log/vmware/vpx:

RESULT:

----


vaspsrvexs01

CMD: /opt/LGTOaam512/bin/ft_gethostbyname vaspsrvexs01 |grep FAILED

RESULT:

----


list_nodes

CMD: /opt/LGTOaam512/bin/ftcli -domain vmware -connect vaspsrvexs01 -port 8042 -timeout 60 -cmd listnodes

RESULT:

----


[Err:2] gethostbyname error: 2

[Err:13004] Unable to convert proxy name into address.

CMD: /opt/LGTOaam512/bin/ftcli -domain vmware -connect vaspsrvesx02 -port 8042 -timeout 60 -cmd listnodes

RESULT:

----


[Err:2] gethostbyname error: 2

[Err:13004] Unable to convert proxy name into address.

VMwarenodestatus=

Copying /opt/LGTOaam512/config/vmware-sites to /opt/LGTOaam512/log/aam_config_util_listnodes.log

VMwareresult=failure

Total time for script to complete: 0 minute(s) and 40 second(s)

[2007-08-01 16:57:59.713 'App' 5098416 error] [VpxaVMAP::SendNodeResourceInfo] Request to VMAP failed or timed out!!

[2007-08-01 16:59:02.109 'App' 5098416 error] [VpxaVMAP::Invoke] Command /usr/bin/perl /opt/LGTOaam512/vmware/aam_config_util.pl -z -cmd=listnodes -domain=vmware failed with error 1

[2007-08-01 16:59:02.109 'App' 5098416 error] [VpxaVMAP::Invoke] Command output:

CMD: hostname -s

I believe these log entries appear because this host had no network connection.

How does the HA cluster start the VM on the other host? Is it necessary to configure a VM on the other host and link it to the existing vmdk?

For example

Host_01          Host_02
VM01 - on        VM01 - off
      \          /
       \        /
        \      /
       linux.vmdk

esiebert7625
Immortal

HA depends heavily on DNS and reverse DNS to function, so this could very well be your problem if they are not set up correctly. Check your etc/ft_hosts file and make sure the IP address and hostname of each host are in there. Also make sure each server can resolve the other's host name, and that Configuration > DNS and Routing is set up correctly with hostnames, domains, and DNS servers.

Another post mentions this:

One large caveat for VMware HA:

Be sure to add the host into Virtual Center using the Fully Qualified Domain Name. I.e. if the server name is foo.bar.com, normally you could refer to it as foo, but for VMware HA to work, foo.bar.com is the only name you can use when adding into VC.

In addition, be sure there is a properly configured DNS server that resolves foo.bar.com correctly.

These two items will make VMware HA work quite well.
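The name-resolution requirements described above can be spot-checked with a short script. This is only a sketch under stated assumptions: it uses Python 3's standard `socket` module, `check_name_resolution` is a hypothetical helper written for this example, and treating a match on the short name as "OK" is my own choice rather than a rule taken from VMware documentation. The `forward` and `reverse` parameters exist only so the lookups can be replaced with stand-ins.

```python
import socket


def _short(name):
    """Return the lower-cased first label of a host name (foo.bar.com -> foo)."""
    return name.lower().split(".")[0]


def check_name_resolution(hostname, forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Check forward and reverse lookup for one host name.

    Returns a dict with the resolved IP, the reverse name, an `ok` flag,
    and an `error` message when a lookup fails.
    """
    result = {"hostname": hostname, "ip": None, "reverse": None,
              "ok": False, "error": None}
    try:
        ip_address = forward(hostname)
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        result["error"] = f"forward lookup failed: {exc}"
        return result
    result["ip"] = ip_address
    try:
        reverse_name, aliases, _addresses = reverse(ip_address)
    except OSError as exc:
        result["error"] = f"reverse lookup failed: {exc}"
        return result
    result["reverse"] = reverse_name
    # Assumption: the check passes when the reverse record (or an alias)
    # has the same short name as the name we started from.
    candidates = {_short(name) for name in [reverse_name, *aliases]}
    result["ok"] = _short(hostname) in candidates
    if not result["ok"]:
        result["error"] = "reverse record does not match the host name"
    return result
```

Calling `check_name_resolution("foo.bar.com")` and `check_name_resolution("foo")` for every host in the cluster, from a machine that uses the same DNS servers as the ESX hosts, shows whether both the FQDN and the short name resolve and map back consistently.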

Gustavo_Souza_L
Contributor

Hi, my HA still doesn't work.

I reviewed my DNS settings and everything is okay, my DNS is working well.

Could you explain how HA starts a VM on the other host?

If I am not mistaken, an HA cluster doesn't use VMotion, so how is the VM started on the other host?

CXSANGUY
Enthusiast

Are these VMs entirely on shared storage (SAN, etc.)? Can you successfully VMotion them from node to node when everything is up and running?

And to reiterate, since it is a common misunderstanding: VMware HA is in no way a replacement for Microsoft Clustering. When HA is working, the VM shuts down entirely and then restarts from scratch at some point on the other node.

esiebert7625
Immortal

Correct, HA simply starts the VM on another physical server. It does not use vMotion for this, but it does have some of the same requirements as vMotion, such as shared storage. Also make sure your vSwitches and vNICs are set up the same on both servers and that your VM's CD-ROM is not mapped to anything.

Below is a doc I put together on this.

How does the HA (High Availability) feature work?

VMware HA continuously monitors all ESX Server hosts in a cluster and detects failures. An agent placed on each host maintains a "heartbeat" with the other hosts in the cluster, and loss of a heartbeat initiates the process of restarting all affected virtual machines on other hosts. You create and manage clusters using VirtualCenter. The VirtualCenter Management Server places an agent on each host in the cluster so each host can communicate with other hosts to maintain state information and know what to do in case of another host's failure. (The VirtualCenter Management Server is not a single point of failure.) If the VirtualCenter Management Server host goes down, HA functionality changes as follows. HA clusters can still restart virtual machines on other hosts in case of failure; however, the information about what extra resources are available will be based on the state of the cluster before the VirtualCenter Management Server went down. HA monitors whether sufficient resources are available in the cluster at all times in order to be able to restart virtual machines on different physical host machines in the event of host failure. Safe restart of virtual machines is made possible by the locking technology in the ESX Server storage stack, which allows multiple ESX Servers to have access to the same virtual machine files simultaneously.

Host failure detection occurs 15 seconds after the HA service on a host has stopped sending heartbeats to the other hosts in the cluster. A host stops sending heartbeats if it is isolated from the network. At that time, other hosts in the cluster treat this host as failed, while this host declares itself as isolated from the network. By default, the isolated host powers off its virtual machines. These virtual machines can then successfully fail over to other hosts in the cluster. If the virtual machines are instead left running and the isolated host still has SAN access, it retains the disk lock on the virtual machine files, and attempts to fail over the virtual machine to another host fail. The virtual machine continues to run on the isolated host. VMFS disk locking prevents simultaneous write operations to the virtual machine disk files and potential corruption.

If the network connection is restored before 12 seconds have elapsed, other hosts in the cluster will not treat this as a host failure. In addition, the host with the transient network connection problem does not declare itself isolated from the network and continues running. In the window between 12 and 14 seconds, the clustering service on the isolated host declares itself as isolated and starts powering off virtual machines with default isolation response settings. If the network connection is restored during that time, the virtual machine that had been powered off is not restarted on other hosts because the HA services on the other hosts do not consider this host as failed yet. As a result, if the network connection is restored in this window between 12 and 14 seconds after the host has lost connectivity, the virtual machines are powered off but not failed over.
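The timing described above can be summarized in a small Python sketch. It is only a model of the text, not of the HA agent itself: `ha_outcome` and the constant names are made up for this example, it assumes the default "power off" isolation response, it assumes failover succeeds once the host is declared failed (shared storage and enough spare capacity), and it treats the whole interval from 12 seconds up to the 15-second failure detection as the "powered off but not failed over" window, although the text only names 12 to 14 seconds.

```python
# Timings taken from the description above.
ISOLATION_DECLARED_AT_S = 12   # isolated host starts powering off its VMs
FAILURE_DECLARED_AT_S = 15     # other hosts treat the silent host as failed


def ha_outcome(outage_seconds):
    """Describe what happens to a VM for a network outage of the given length.

    Assumes the default isolation response (power off) and that failover
    succeeds once the other hosts declare the host failed.
    """
    if outage_seconds < 0:
        raise ValueError("outage_seconds must be non-negative")
    if outage_seconds < ISOLATION_DECLARED_AT_S:
        # Connection came back in time: nobody declares anything.
        return "keeps running on the original host"
    if outage_seconds < FAILURE_DECLARED_AT_S:
        # Host isolated itself, but the others have not declared it failed.
        return "powered off, not restarted elsewhere"
    return "powered off, restarted on another host"


if __name__ == "__main__":
    for seconds in (5, 13, 20):
        print(f"{seconds:>2} s outage: VM {ha_outcome(seconds)}")
```

The middle case is the one worth remembering when testing by pulling a cable: reconnecting it too quickly can leave the VM powered off with no failover at all.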

For more information on HA see http://download3.vmware.com/vmworld/2006/tac9413.pdf and http://kb.vmware.com/KanisaPlatform/Publishing/894/2956923_f.SAL_Public.html and http://www.vmware.com/pdf/vmware_ha_wp.pdf

Gustavo_Souza_L
Contributor

OK, if I understood correctly, I need to check whether the CD-ROM is connected and disconnect it.

And do I need two VMs, one on Host1 and the other on Host2, both accessing the same vmdk? Is that right?

esiebert7625
Immortal

Nope, just one VM on HostA, which must reside on a shared storage datastore (SAN). This datastore must also be visible to HostB. If you disconnect the network cable for the Service Console on HostA, the VM should start up on HostB.

Did you try doing a vMotion as suggested above? Try this first and see if it comes back with an error.

Gustavo_Souza_L
Contributor

I tried to use VMotion and it didn't work, because it requires a gigabit network and I don't have gigabit NICs in the hosts. I will try to find a gigabit NIC and then test again.

So, for now, thank you for your attention.

After testing, I will come back to post the solution and award your points.

Thank you.

esiebert7625
Immortal

It will work with 100 Mbps; in a production environment you typically want gigabit, though, for best results.

http://www.vmware.com/community/thread.jspa?messageID=586795&#586795

chukarma
Enthusiast

One of the things that I found helpful when I was having trouble with my HA settings was this: HA required the VM hosts to be able to talk to each other via short names. I wasn't able to turn on HA even with the proper DNS records and FQDNs. After creating static WINS records (on the WINS server) for the two HA hosts, everything worked.

admin
Immortal

I noticed the same thing with hostname resolution . . .
