VMware Cloud Community
rostler
Contributor

VM behaviour using DRS

I have three ESX 3.1 hosts in a pilot DRS cluster with HA disabled. Each host has three physical NICs. Two NICs are teamed and configured for failover on the VM network, and the third handles the VMkernel and administration network.

We have two fault-tolerant network switches, and for the live cluster we will have two NICs for the VMkernel network, one going to each switch, so that if we lose a switch the hosts remain connected.

In the current configuration, though, when we lose the switch connected to the VMkernel network, the VMs get powered off, and as I said, this is with HA disabled, so the option to power off VMs when the host is isolated doesn't apply.

To summarise: with HA disabled, if the host is isolated it still powers off VMs. This behaviour seems strange to me, as I thought these VMs should still be available.

If anyone could give me an explanation, I would be grateful.

Thanks

7 Replies
peetz
Leadership

Very strange... Did you have HA enabled before?

Maybe the hosts were not unconfigured correctly when you disabled HA.

Try enabling and disabling HA on the cluster. Any error messages?

- Andreas

Twitter: @VFrontDe, @ESXiPatches | https://esxi-patches.v-front.de | https://vibsdepot.v-front.de
rostler
Contributor

Thanks for your reply Peetz.

Yes, we did have HA enabled before, and we had a problem with one of these switches. After the first time we lost the switch, we disabled HA in the belief that if we lost the switch again, we wouldn't lose the VMs. The switch is now fixed, but twice we lost the VMs after disabling HA. I have tried enabling HA and disabling it again.

Do you agree that this shouldn't happen?

Gabrie1
Commander

Hi

I would agree this shouldn't happen. If HA is disabled, then there would be no agent that could detect a host isolation....

Gabrie

http://www.GabesVirtualWorld.com
peetz
Leadership

Yes, I agree that this should not happen.

If you want to make sure that the HA agent is not running on the ESX hosts, run

ps ax | grep /opt/LGTOaam512/bin

in the service consoles. It should output only the grep process itself, and nothing else.

If you see processes like

/opt/LGTOaam512/bin ftAgent -d vmware

this means that the HA agent is still running, and you could try

/etc/init.d/LGTOAAM51_vmware stop

to stop it.
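The check above can also be wrapped in a small service-console helper so the result is unambiguous. A minimal sketch: the function name and the RUNNING/STOPPED strings are my own; only the /opt/LGTOaam512/bin path comes from the commands above.

```shell
#!/bin/sh
# Report whether the Legato AAM (HA) agent appears in a ps listing.
# Reads ps output on stdin and prints RUNNING or STOPPED; the grep
# process itself is filtered out so it cannot be mistaken for the agent.
check_ha_agent() {
  if grep '/opt/LGTOaam512/bin' | grep -v grep | grep -q .; then
    echo "RUNNING"
  else
    echo "STOPPED"
  fi
}

# Usage in the service console:
#   ps ax | check_ha_agent
```

If this prints RUNNING on a host where HA is supposed to be disabled, that would point to the agent not having been unconfigured properly.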

- Andreas

Twitter: @VFrontDe, @ESXiPatches | https://esxi-patches.v-front.de | https://vibsdepot.v-front.de
rostler
Contributor

Thanks Andreas. I have tried this on all hosts, and can confirm that HA is disabled.

I feel now that I need to simulate another failure and confirm that the issue of the VMs shutting down is still occurring. Not sure when I can do this, as although it's a pilot, people still want to use it. You know how it is.

Thanks very much for your help.

Rob

acmcnick
Enthusiast

In the short term, since it is a pilot, add the third NIC to the VM network vSwitch. You can still specify that the VM network utilise VMNIC0 and VMNIC1, but you can also specify that, in the event of a failure, the VMkernel NIC can use 0 or 1 as well.

Select the properties on the vSwitch, click the service you wish to modify, and click Edit. Click NIC Teaming, select "Override vSwitch failover order", and specify your Active and Standby adapters.
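Adding the third NIC as an uplink can also be done from the service console. A sketch, assuming the VM network lives on vSwitch0 and the third adapter is vmnic2 (check the real names with the list commands first):

```shell
# List the physical NICs and their link state to identify the third adapter
esxcfg-nics -l

# Link the third NIC (assumed vmnic2 here) to the VM network vSwitch (assumed vSwitch0)
esxcfg-vswitch -L vmnic2 vSwitch0

# Confirm the vSwitch now shows all three uplinks
esxcfg-vswitch -l
```

The per-portgroup Active/Standby override itself is then set in the VI Client as described above.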

Hope this helps.

rostler
Contributor

Thanks acmcnick. I must admit that I wasn't aware this could be done, and I will test it. I probably won't implement it in this pilot, though. I'm more interested in why the condition occurs when HA is disabled, and whether or not anyone else has experienced this.

Thanks

Rob
