VMware Cloud Community
karlmujicUGL
Contributor

All Paths Down But VMs Not Shutting Down

We are testing a failure of our storage system. We powered off both storage switches which stops all hosts in our cluster from accessing the storage. We have configured the "Response for Datastore with All Paths Down (APD)" to "Power off and restart VMs (aggressive)" as we want the VMs to power off in this situation. But we are getting the message: "vSphere HA did not terminate VM (vmname) affected by an inaccessible datastore on host (hostname) in cluster (clustername): not enough resources"


I thought our configuration would turn off the VM as it is set to aggressive. Or do I have this wrong?

Thank You,


Accepted Solutions
rcporto
Leadership

Since you wrote "We powered off both storage switches which stops all hosts in our cluster from accessing the storage", I believe there is no remaining host with access to the storage, right? In that case, the virtual machines will not be restarted because no host is available to run them, and that is why you are getting the message: "vSphere HA did not terminate VM (vmname) affected by an inaccessible datastore on host (hostname) in cluster (clustername): not enough resources".

See the documentation: Datastore Inaccessibility Is Not Resolved for a VM

When a datastore becomes inaccessible, VMCP might not terminate and restart the affected virtual machines.

When an All Paths Down (APD) or Permanent Device Loss (PDL) failure occurs and a datastore becomes inaccessible, VMCP might not resolve the issue for the affected virtual machines.

In an APD or PDL failure situation, VMCP might not terminate a virtual machine for the following reasons:

VM is not protected by vSphere HA at the time of failure.

VMCP is disabled for this virtual machine.

Furthermore, if the failure is an APD, VMCP might not terminate a VM for several reasons:

APD failure is corrected before the VM was terminated.

Insufficient capacity on hosts with which the virtual machine is compatible.

During a network partition or isolation, the host affected by the APD failure is not able to query the master host for available capacity. In such a case, vSphere HA defers to the user policy and terminates the VM if the VM Component Protection setting is aggressive.
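
The reasons quoted above can be read as a small decision procedure. The sketch below is only an illustrative model of that documentation text, not VMware code or a vSphere API; the names `Policy`, `Capacity`, and `should_terminate` are made up for this example. It also assumes that a conservative policy does not terminate a VM when capacity cannot be confirmed, which the quoted text implies but does not spell out.

```python
from enum import Enum


class Policy(Enum):
    CONSERVATIVE = "conservative"
    AGGRESSIVE = "aggressive"


class Capacity(Enum):
    SUFFICIENT = "sufficient"      # a compatible host can restart the VM
    INSUFFICIENT = "insufficient"  # no compatible host has capacity
    UNKNOWN = "unknown"            # master host unreachable (partition or isolation)


def should_terminate(protected_by_ha, vmcp_enabled, apd_cleared, capacity, policy):
    """Return True if this simplified model says the APD-affected VM is terminated."""
    # A VM that is unprotected or has VMCP disabled is never terminated by VMCP.
    if not protected_by_ha or not vmcp_enabled:
        return False
    # The APD was corrected before termination, so nothing needs to happen.
    if apd_cleared:
        return False
    # Known lack of capacity blocks termination, even with the aggressive setting.
    if capacity is Capacity.INSUFFICIENT:
        return False
    # Capacity cannot be queried: defer to the user policy (assumed behavior
    # for the conservative case).
    if capacity is Capacity.UNKNOWN:
        return policy is Policy.AGGRESSIVE
    return True


# The test in this thread: every host lost storage, so no host has capacity.
print(should_terminate(True, True, False, Capacity.INSUFFICIENT, Policy.AGGRESSIVE))
```

In this model the aggressive setting only changes the outcome when capacity is unknown, which matches the "not enough resources" message seen when every host has lost access to the storage.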

vSphere HA terminates APD-affected VMs only after the following timeouts expire:

APD timeout (default 140 seconds).

APD failover delay (default 180 seconds). For faster recovery, this can be set to 0.
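
As a rough worked example of those defaults, and assuming the failover delay only starts counting once the APD timeout has expired (my reading of the text above, not something it states outright), the earliest termination time would be the sum of the two timers. The names below are illustrative only.

```python
APD_TIMEOUT_S = 140         # default APD timeout quoted above
APD_FAILOVER_DELAY_S = 180  # default APD failover delay; can be set to 0


def earliest_termination_s(apd_timeout_s=APD_TIMEOUT_S,
                           failover_delay_s=APD_FAILOVER_DELAY_S):
    """Seconds from the start of the APD until a VM may be terminated.

    Assumes the two timers run one after the other.
    """
    return apd_timeout_s + failover_delay_s


print(earliest_termination_s())                    # 320
print(earliest_termination_s(failover_delay_s=0))  # 140
```

So with the defaults, expect to wait a little over five minutes during a test before judging whether a VM was terminated.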

---

Richardson Porto
Senior Infrastructure Specialist
LinkedIn: http://linkedin.com/in/richardsonporto

karlmujicUGL
Contributor

Thanks so much Richardson! I've been looking all over the place to try and find a straight answer like that.

It all makes sense now.
