VMware Cloud Community
Dthompson04
Contributor

VCSA 6.7 HA or VMotion for VM's between ESXi HOSTS

Everything was up and running, with VMotion or HA tested, but now it's not working.  When I try to research VMotion troubleshooting, all I find is VMotion between vCenters or VCSAs, not between ESXi hosts under a single VCSA.

We have a single VCSA virtual appliance managing two ESXi hosts.  I've tested shutting a host down and moving the VMs over to the active host with no problem and no loss of operation to the VMs.  Two days ago one of the hosts lost power and we lost all VMs on that host.

Since the installation over a year ago, a new team has come in that will be maintaining the network in the future.  They've done some work to make all servers use Windows credentials, and I don't know what other changes were made.

When I talked to the new guy about this and told him I had tested HA and VMotion in the past, he said I probably tested by powering the server host down and not by pulling the plugs.  I thought I had done both, but again it was over a year ago.  Is there a difference between powering down and pulling the plug?  I'm wondering if they disabled or messed up HA/VMotion, but I can't find the troubleshooting procedures for a single VCSA with two hosts.

When I first configured/installed the cluster, I remember that configuring VMotion would create a partition on each host for the other, or something like that.  I took this to be where the opposite host's data was stored.

I maintained ESXi 5.5 and 6.0 on single hosts without vCenter/VCSA and redundancy so this is the first time I've had a problem.  If they didn't touch it or had at least coordinated the modifications with me I might know what they did.

I think the troubleshooting areas are "cross host vmotion" and "cross host storage vmotion".  I search for these phrases, but the results are all vCenter to vCenter.

The headaches of turning over things to a new team....  Now everyone is looking to me to fix it.

Where do I find the troubleshooting procedures to review to correct the problem?  I've got a 10GB link between hosts for HA/VMotion, a separate simple switch and a /30 for L3.  All that is in place, but configurations may be different.
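One thing that can be ruled out quickly is the addressing on that /30. The following is only a rough sketch, assuming each host has one VMkernel address on the dedicated link; the function name and the example addresses are placeholders, not values from this environment. It checks that two addresses are distinct usable hosts in the same /30.

```python
import ipaddress


def same_point_to_point_subnet(addr_a: str, addr_b: str, prefix: int = 30) -> bool:
    """Return True if both addresses are distinct usable hosts in the same subnet.

    addr_a / addr_b are hypothetical VMkernel addresses for the two ESXi hosts.
    """
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix}").network
    # Different subnets, or the same address on both hosts, cannot work.
    if net_a != net_b or addr_a == addr_b:
        return False
    # hosts() excludes the network and broadcast addresses of the /30.
    usable = set(net_a.hosts())
    return (
        ipaddress.ip_address(addr_a) in usable
        and ipaddress.ip_address(addr_b) in usable
    )


if __name__ == "__main__":
    # Placeholder addresses for illustration only.
    print(same_point_to_point_subnet("192.168.100.1", "192.168.100.2"))
    print(same_point_to_point_subnet("192.168.100.1", "192.168.100.5"))
```

If this returns False for the addresses currently configured on the two hosts, the link addressing was probably changed at some point.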

Would appreciate any help.

2 Replies
Dthompson04
Contributor

One update to the configuration.  We are running with the Platform Services Controller as Embedded.

ALF4
Contributor

If the power was removed/lost from the host then vMotion would not be used in that instance. Instead the VMs would be restarted on the other host(s) by vSphere HA, as per your "host failure response" setting in vSphere Availability. If this setting is set to "disabled", that would explain why you lost all of the VMs on the host when it lost power.

If an ESXi host loses power, any VMs on that host would be restarted on the other host(s) in the cluster (Are the hosts clustered? Have you enabled vSphere Availability and vSphere DRS?) as long as there are enough resources and both hosts are using shared storage. Even if you had "host failure response" configured, it could be that it wasn't possible to start any VMs on the remaining host due to a lack of available resources. However, without seeing any logs it would be difficult to troubleshoot this issue.
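To make those conditions concrete, here is a minimal sketch of the checklist as plain Python. It is a simplified model only, not the vSphere API: the class, the field names and the single memory figure are all assumptions made for illustration, and real HA admission control is more involved than one number.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterState:
    """Hypothetical snapshot of the settings discussed above (not a VMware type)."""
    ha_enabled: bool              # vSphere Availability turned on for the cluster
    host_failure_response: str    # "restart" or "disabled"
    shared_storage: bool          # both hosts can see the VMs' datastore
    free_memory_gb: float         # memory free on the surviving host


def restart_blockers(cluster: ClusterState, required_memory_gb: float) -> List[str]:
    """List the reasons the failed host's VMs would NOT be restarted elsewhere."""
    reasons = []
    if not cluster.ha_enabled:
        reasons.append("vSphere Availability (HA) is not enabled")
    if cluster.host_failure_response == "disabled":
        reasons.append("host failure response is set to disabled")
    if not cluster.shared_storage:
        reasons.append("VMs are not on storage shared by both hosts")
    if cluster.free_memory_gb < required_memory_gb:
        reasons.append("surviving host lacks resources for the VMs")
    return reasons


if __name__ == "__main__":
    # Example values are made up for illustration.
    state = ClusterState(True, "disabled", True, 128.0)
    for reason in restart_blockers(state, required_memory_gb=64.0):
        print(reason)
```

An empty list means none of these particular conditions would stop the restart; anything else is a starting point for what to check in the cluster settings.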

By shutting down the host in vCenter you allow it to move any VMs to the other host(s) before it is shut down. This is not the same as a total power loss of a host, which gives the host no time to vMotion any VMs.

I can't imagine how creating a new identity source for Active Directory in vCenter would cause any issues with vMotion or HA.

VCP-DCV2019