VMware Cloud Community
elihuj
Enthusiast

After HA Event, VMs are shut down

We have been experiencing an ongoing issue where VMs are not migrated to a new host during an HA event. Here are our current cluster settings:

Admission Control is set to Disabled

VM restart priority is set to Medium

Host Isolation response is set to Leave powered on

The hosts are 4.1, managed by a 5.0 vCenter. We do have enough failover capacity for a single host failure. The problem is that if a host with 5 VMs fails (hardware or otherwise), 3 of the VMs will migrate to a new host without issue. The remaining 2 VMs do not migrate and are left in a powered-off state until they are manually powered back on.

Not sure whether it is relevant, but one key difference is that the VMs that do not fail over are configured to use RDMs.

Is there any reason why some VMs would migrate fine while others will not? Is there anything we can check? Thank you in advance.

weinstein5
Immortal

I am assuming it is the same two VMs that are not restarting on the other host?

If this is the case, please check the configuration of the VMs, making sure that nothing is stored locally on the ESXi host and that there is no network connection to a VM port group that does not exist on the other hosts.

Even though vMotion has nothing to do with HA, are you able to vMotion the VMs in question? vMotion, like HA, requires that the ESXi hosts are configured identically, and if there are any discrepancies you will receive a warning.

elihuj
Enthusiast

Thank you for the reply weinstein5. Yes, these are the same two VMs that are not restarting on the other host. I have verified that vMotion does work, as we use it for patching.

As for the configuration, I do not believe that anything is stored locally on a particular host. All hosts are configured identically otherwise. None of the hosts within the cluster are using vSSs. Networking is configured via a Cisco Nexus 1000V vDS. I'm not sure where else I can look to check for anything stored locally on the host.

The only difference (that I can see) between the VMs that fail over and the ones that do not is the use of RDMs. Example configuration for VMs:

VM1:

4CPU/8GB Mem

2 HDD (67GB total)

VM2:

6CPU/38GB Mem

2 HDD (106GB total)

14 RDM (8TB total)

Additionally, do you think that this is something on the host entirely? Is there nothing on the VM itself that could contribute to the VM not failing over?

depping
Leadership

goose117 wrote:

Not sure whether it is relevant, but one key difference is that the VMs that do not fail over are configured to use RDMs.

Is there any reason why some VMs would migrate fine while others will not? Is there anything we can check? Thank you in advance.

This should work just fine, assuming that all storage devices are properly masked/zoned to all hosts. I would suspect that this is a hardware configuration issue.
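
One way to double-check the masking/zoning assumption is to compare the list of LUN identifiers each host can see. The snippet below is only a rough sketch: the host names and device IDs are made up, and it assumes you have already collected the per-host device lists yourself (from a report export or a per-host device listing).

```python
from typing import Dict, Set


def find_unshared_luns(luns_by_host: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each LUN to the set of hosts that cannot see it.

    LUNs visible on every host are left out of the result.
    """
    all_hosts = set(luns_by_host)
    all_luns: Set[str] = set()
    for luns in luns_by_host.values():
        all_luns |= luns

    missing: Dict[str, Set[str]] = {}
    for lun in sorted(all_luns):
        absent = {host for host in all_hosts if lun not in luns_by_host[host]}
        if absent:
            missing[lun] = absent
    return missing


# Hypothetical inventory: host names and device IDs are invented for illustration.
inventory = {
    "esx01": {"naa.aaa", "naa.bbb", "naa.ccc"},
    "esx02": {"naa.aaa", "naa.bbb"},
    "esx03": {"naa.aaa", "naa.bbb", "naa.ccc"},
}

for lun, hosts in find_unshared_luns(inventory).items():
    print(f"{lun} is not visible on: {', '.join(sorted(hosts))}")
```

Any LUN that shows up here and backs an RDM of one of the affected VMs would be a likely reason that VM cannot be restarted on the host in question.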

elihuj
Enthusiast

Yes, all of the hosts are zoned and masked correctly. All of our hosts are running on BL465c G7 blade servers. Where would you start checking for hardware issues?

depping
Leadership

I would start with checking if the configuration is 100% consistent. I typically use RVTools for that (www.robware.net)
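
If the report is exported to CSV, a few lines of script can flag the settings that differ between hosts. This is just a sketch of the idea: the column names and values below are invented and are not the actual RVTools column layout, so adjust them to whatever your export contains.

```python
import csv
import io
from collections import defaultdict
from typing import Dict


def inconsistent_columns(csv_text: str, host_column: str = "Host") -> Dict[str, Dict[str, str]]:
    """Return the columns whose value is not identical on every host.

    The result maps column name -> {host: value}.
    """
    values: Dict[str, Dict[str, str]] = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        host = row[host_column]
        for column, value in row.items():
            if column != host_column:
                values[column][host] = value
    return {
        column: per_host
        for column, per_host in values.items()
        if len(set(per_host.values())) > 1
    }


# Hypothetical export: column names and values are made up for illustration.
sample = """Host,NTP Server,MTU,HBA Count
esx01,10.0.0.1,1500,2
esx02,10.0.0.1,1500,2
esx03,10.0.0.1,9000,2
"""

for column, per_host in inconsistent_columns(sample).items():
    print(f"{column} differs: {per_host}")
```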

depping
Leadership

If that is the case, then I would suggest creating a dummy virtual machine with RDMs, evacuating the host this VM is running on, and pulling the cable from the host to see what happens. That is the easiest way to test it without impact to the other VMs. Watch the fdm.log files for specific error messages about the restart.
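
To make the log review less tedious, you could filter the log for lines that mention the test VM together with a likely keyword. The following is a minimal sketch: the keywords and the sample lines are assumptions made up for illustration and do not reflect the real log format, so tune them to what you actually see.

```python
import re
from typing import Iterable, List, Sequence


def find_restart_errors(
    lines: Iterable[str],
    vm_name: str,
    keywords: Sequence[str] = ("error", "fail", "restart"),
) -> List[str]:
    """Return the lines that mention vm_name and at least one keyword (case-insensitive)."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [
        line.rstrip("\n")
        for line in lines
        if vm_name in line and pattern.search(line)
    ]


# Invented sample lines, NOT real log output.
sample_lines = [
    "12:00:01 heartbeat ok\n",
    "12:00:05 dummy-rdm-vm placement started\n",
    "12:00:07 dummy-rdm-vm restart FAILED: device not accessible\n",
    "12:00:09 other-vm restart succeeded\n",
]

for hit in find_restart_errors(sample_lines, "dummy-rdm-vm"):
    print(hit)
```

For a real log, pass an open file object instead of the sample list, for example `find_restart_errors(open(path), "dummy-rdm-vm")`, where `path` is wherever the log lives on your host.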

elihuj
Enthusiast

Ah good point. We actually use RVTools to run weekly reports. I'll check the one from yesterday to verify consistency within the hosts. I think that's a great place to start. Thank you depping.
