VMware Cloud Community
mesteller
Contributor

Vmware 4.1 HA and DRS

We are currently running HA and DRS, and we have configured the HA Isolation Response to Leave Powered On. This places a lock file on the VMs running on the host; if the host fails hard, HA will be unable to move these machines to another available host in the cluster. We considered changing this setting to Shut Down, which would gracefully shut down the VMs and release the lock file, allowing another host to boot the VMs.

We prefer to leave the setting at Leave Powered On. My question is: if we rebuild the failed machine on another host under the same name, would HA allow us to start the failed VMs on the new host?

6 Replies
FranckRookie
Leadership

Hi Mesteller,

Isolation and host failure are two different problems.

You need to decide what to do in case one of your hosts becomes isolated. If you are absolutely sure that the VM network will still be available, then you can decide to keep your applications up, as they can still communicate with their clients.

If a host fails, other hosts will be able to restart the failed VMs even though lock files are used. The other hosts monitor the file locks: if the locks are not refreshed within a short timeout, they consider the host failed and try to restart the VMs.
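The lock check described above can be pictured with a rough Python sketch. This is only an illustration of the idea, not how ESX implements it: the function name, the timestamps, and the timeout are all hypothetical, and the caller has to supply the timeout because the real value is not given in this thread.

```python
def lock_is_stale(last_touched: float, now: float, stale_after: float) -> bool:
    """Return True if the lock owner has not refreshed its lock recently.

    last_touched: time (in seconds) when the owning host last updated the lock.
    now:          current time on the same clock.
    stale_after:  hypothetical timeout; the real value is not stated here.
    """
    return (now - last_touched) > stale_after


def may_restart_vm_elsewhere(last_touched: float, now: float, stale_after: float) -> bool:
    # Another host only tries to restart the VM once the lock looks abandoned.
    return lock_is_stale(last_touched, now, stale_after)
```

A host that is still alive keeps refreshing its locks, so `lock_is_stale` stays false and the VMs cannot be started elsewhere. Once the host is really down, the lock ages past the timeout and a restart becomes possible.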

You can't have two VMs with the same name in a vCenter.

Good luck.

Regards

Franck

mesteller
Contributor

Hi

So the definition of isolation is being unable to communicate with the cluster, meaning a loss of the network? We recently had a hard drive failure that caused the Linux (VMware) kernel to stop working; all the VMs were still running on that host. To recover, we replaced the drive and rebooted. We attempted to migrate the VMs to another host and were unable to because of the lock file. If I am understanding this correctly, if we had disabled the host's network connections, would HA have migrated the VMs?

As for having two hosts with the same name, we would remove the host from inventory and then rebuild. After thinking about this strategy, it isn't a very good one: it would take longer to rebuild than to have the isolation setting changed to Shut Down.

Neal Mesteller

Sr Analyst, Distributed Systems LAN/PC

neal.mesteller@kennametal.com

T 724-539-5341

M 724-331-5990

F 724-539-5031

Kennametal Inc. | 1600 Technology Way | Latrobe, PA 15650 | www.kennametal.com

From: FranckRookie <communities-emailer@vmware.com>

To: <neal.mesteller@kennametal.com>

Date: 10/27/2010 03:45 PM

Subject: New message: "Vmware 4.1 HA and DRS"

FranckRookie
Leadership

Isolation means that a host is unable to communicate with the other members of the cluster through the admin network. The problem you had is different, and very annoying.
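To make the difference between isolation and failure concrete, here is a small Python sketch. It is a simplified model of the distinction discussed in this thread, not VMware's actual detection logic; the `HostState` enum, the `classify_host` function, and both input flags are made up for illustration.

```python
from enum import Enum


class HostState(Enum):
    HEALTHY = "healthy"
    ISOLATED = "isolated"
    FAILED = "failed"


def classify_host(admin_network_reachable: bool, locks_still_refreshed: bool) -> HostState:
    """Classify a host as seen by the rest of the cluster (simplified model)."""
    if admin_network_reachable:
        return HostState.HEALTHY
    if locks_still_refreshed:
        # The host is alive and its VMs are running; only the admin network is lost.
        return HostState.ISOLATED
    # No admin network and nobody refreshing the locks: treat the host as down.
    return HostState.FAILED
```

Note that a host whose console has crashed while its VMs keep running does not fit cleanly into any of these three cases, which is part of why that situation is so awkward to recover from.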

The best solution would have been to move your running VMs to another host, either manually one by one or by asking the ESX host to enter maintenance mode. But if the service console has crashed, there is a good chance that it will not accept the move request or react properly to an isolation event. So you have two possibilities:

- stop your VMs from the inside with an OS shutdown and then restart them on other hosts. Finally, reboot or repair your host.

- perform a hard shutdown of your host by unplugging the power cable. HA will then treat this as a host failure and restart the VMs on other hosts.

Needless to say, it is always better to close your applications cleanly using the first solution...

Regards

Franck

mesteller
Contributor

Hi

We suspect the service console had crashed. We did attempt to enter maintenance mode but could not; I believe the option was greyed out. We attempted to gracefully shut down the virtual machines running on that host and then boot them on a different host by removing them from inventory and re-adding them. This still did not release the lock file and left the VMs in an unbootable state. Having the isolation response set to Leave Powered On and then forcing the host to fail by pulling the power would cause the running VMs to power crash, and they would still not be restarted by HA. According to VMware support, we had to gracefully shut down our virtual machines, replace the failed drive, and then reboot the host.

Our network is set up for HA: we have management, vMotion, and production networks.

Reviewing these settings, it appears there isn't a good option:

- Power Crash: performs a virtual power-down and crashes the VM's OS. This is hard on the VM, but it allows the VM to be booted on another host.

- Leave Powered On: the cluster can no longer monitor the host or the VMs, and because of the lock file issues the VMs cannot be brought up on another host.

- Shut Down: if the VM fails to power down, there is no way to force the shutdown; if the shutdown succeeds, HA will work.

What would be the best setting to use to minimize outages to our VMs?

It appears that Leave Powered On is the best option.

Neal Mesteller

From: FranckRookie <communities-emailer@vmware.com>

To: <neal.mesteller@kennametal.com>

Date: 10/28/2010 08:46 AM

Subject: New message: "Vmware 4.1 HA and DRS"

ThompsG
Virtuoso

Hi,

Sorry if this causes more confusion than helps.

I note that you tried to gracefully shut down the virtual machines by removing them from the inventory. This does not shut down virtual machines. With an isolated host, the only ways to do this are as follows:

- via some sort of remote tool inside the virtual machines, e.g. RDC for Windows servers (this assumes the virtual machine network is still up and running)

- via the Service Console (assumes you have console access if the Service console network is down)

- physically power the host off

Obviously the preference is to gracefully power the virtual machines off. However, this is not always possible, and if the virtual machines are still running but not contactable, the business will often dictate the nasty power-off. Having said that, most OSes these days can recover from a power-off after rolling some logs back or running a chkdsk or two ;)

As for the best option for Host Isolation, the answer really depends on your environment. Personally, I tend to prefer the Leave VMs Powered On option, as this allows me to determine whether I actually have an isolated host that affects my virtual machines or just a host that has no access to the Service Console network. If the host actually goes belly up, this option will still allow the virtual machines to start on another host. For example, let's say the vSwitch that hosts the Service Console loses its connection to the network (e.g. both NICs die) while the vSwitch the virtual machines run from is fine. Do you really want the virtual machines restarting during the day, or would you rather control this and take the host down after hours?
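The trade-off can be summarised as a small lookup, sketched here in Python. It only encodes the behaviour as described by the posters in this thread; the enum and function names are invented for the example, and it assumes that a guest shutdown actually completes.

```python
from enum import Enum


class Event(Enum):
    ISOLATION = "isolation"
    HOST_FAILURE = "host failure"


class IsolationResponse(Enum):
    LEAVE_POWERED_ON = "leave powered on"
    SHUT_DOWN = "shut down"
    POWER_OFF = "power off"


def vm_outcome(event: Event, response: IsolationResponse) -> str:
    """What happens to a VM, per the behaviour described in this thread."""
    if event is Event.HOST_FAILURE:
        # The isolation response only applies to isolation, not to a dead host.
        return "restarted on another host"
    if response is IsolationResponse.LEAVE_POWERED_ON:
        return "keeps running on the isolated host"
    if response is IsolationResponse.SHUT_DOWN:
        # Assumes the guest OS shutdown succeeds and the lock is released.
        return "guest shut down cleanly, then restarted on another host"
    return "powered off hard, then restarted on another host"


for response in IsolationResponse:
    print(response.value, "->", vm_outcome(Event.ISOLATION, response))
```

With Leave Powered On, an isolation event causes no VM restart at all, which is the behaviour you want when only the Service Console network is down and the VM network is still healthy.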

Kind regards.

mesteller
Contributor

Hi

Yes, we did try to gracefully shut down from Remote Desktop; sorry I did not make that clear. Anyway, from what I am gathering, it is better to leave the isolation response set to Leave Powered On because of the way we have our network set up for HA.

thanks

Neal Mesteller

From: ThompsG <communities-emailer@vmware.com>

To: <neal.mesteller@kennametal.com>

Date: 10/29/2010 06:54 AM

Subject: New message: "Vmware 4.1 HA and DRS"
