VMware Cloud Community
fsckit
Enthusiast

alarm not triggered: Insufficient vSphere HA failover resources

Does this alarm only get triggered when an HA action fails?  I have a cluster of two ESXi 5.0 hosts, and since I am using way more than 50% of the available memory on each host, to me it looks like I cannot tolerate a single host failure.

RAMdistChart.PNG

HA and DRS are enabled on the cluster, and in "Admission Control Policy" I have "Percent of cluster resources reserved as failover" set at 50%.

So why no alarm?  My goal is to get an alert before we reach the state we're in now, where we apparently cannot tolerate a host failure.

vbrowncoat
Expert

HA admission control ensures that you have sufficient resources to meet the reservations of any VMs that have them, and the minimum resources to start the VMs that don't have a reservation. So if you look at the total used resources in your cluster, you are getting an inaccurate picture from an admission control standpoint. Admission control isn't an alarm; it just won't let you power on VMs after you go past the limit.

HA admission control doesn't ensure that all VMs get all the resources they want. It ensures that all running VMs will be able to start (and, if they have reserved resources, receive those) in case of a host failure, depending on the policy selected.
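To make that concrete, here is a rough sketch of how the percentage-based calculation can be pictured for memory. This is not VMware code: the `VM` class, the function names, the overhead values, and the cluster size are all made up for illustration, and the "at or above the configured percentage" comparison is my assumption about where the cutoff sits. The idea is only that a VM with no reservation counts for its overhead, not for what it is actually using.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VM:
    name: str
    mem_reservation_mb: int  # 0 when the VM has no memory reservation
    mem_overhead_mb: int     # virtualization overhead (illustrative value)


def memory_failover_capacity(total_mb: int, vms: List[VM]) -> float:
    """Fraction of cluster memory left after counting reservation + overhead."""
    required = sum(vm.mem_reservation_mb + vm.mem_overhead_mb for vm in vms)
    return (total_mb - required) / total_mb


def can_power_on(total_mb: int, running: List[VM], new_vm: VM,
                 reserved_fraction: float) -> bool:
    """Admit the VM only if capacity stays at or above the configured share
    (assumed cutoff, for illustration)."""
    capacity = memory_failover_capacity(total_mb, running + [new_vm])
    return capacity >= reserved_fraction


# Hypothetical cluster: 100,000 MB total, 20 running VMs with no reservations.
cluster_total_mb = 100_000
running = [VM(f"vm{i}", 0, 100) for i in range(20)]
capacity = memory_failover_capacity(cluster_total_mb, running)
print(f"memory failover capacity: {capacity:.0%}")
```

With these made-up numbers the capacity stays very high no matter how much memory the guests are actively using, which is why the usage charts and admission control can tell such different stories.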

A better place to look is the HA widget on the cluster's Summary tab. It shows CPU and memory reservations plus overhead. In my case, my cluster has 3 hosts totalling 384 GB of memory with 64+ GB reserved.

2015-02-18_14-33-59.png

Does this answer your question?

fsckit
Enthusiast

Thank you for the response. No, it does not answer my questions, but you have provided some insight into this issue.

I do not have an HA widget. Perhaps vSphere Client 5.1 for Windows does not have this feature. I do see this:

vSphereHACapture.PNG

Based on this, and the lack of the alarm, we can assume that if I brought down one of the hosts in this 2-host cluster, all the VMs would be able to start on the single remaining host, correct?

And the alarm I refer to in this thread's title will only get triggered when that 98% goes to 50% or below, correct?

I am still concerned about that resource distribution chart for Memory, though. It looks like I could push it up to 100% on both hosts, and still not push memory failover capacity lower than 50%. I presume this is due to the fact that my VMs don't have reserved memory, so vSphere only counts the minimal amount of memory required to start the VM, and it will depend on swapping and ballooning if all the VMs actually start using all their memory.  

So the alarm I'm looking for is one that replicates the resource distribution chart, and alerts when the total unutilized memory in the cluster is less than the total memory of a single host.  Make sense?

vbrowncoat
Expert

>Based on this, and the lack of the alarm, we can assume that if I brought down one of the hosts in this 2-host cluster, all the VMs would be able to start on the single remaining host, correct?

Correct. If one of your hosts crashed or was reset, all your VMs would have enough resources to start (no guarantee of performance, just starting/running).


>And the alarm I refer to in this thread's title will only get triggered when that 98% goes to 50% or below, correct?

There is no alarm. When your CPU or memory failover capacity reaches 50%, admission control will prevent you from powering on another VM.


>I am still concerned about that resource distribution chart for Memory, though. It looks like I could push it up to 100% on both hosts, and still not push memory failover capacity lower than 50%. I presume this is due to the fact that my VMs don't have reserved memory, so vSphere only counts the minimal amount of memory required to start the VM, and it will depend on swapping and ballooning if all the VMs actually start using all their memory.

Correct


>So the alarm I'm looking for is one that replicates the resource distribution chart, and alerts when the total unutilized memory in the cluster is less than the total memory of a single host.  Make sense?

I'm not aware of an alarm like this. You may be able to find something through Google or create something yourself. What about creating reservations for your VMs, or at least for the important ones? That way you can make sure they'll get the resources when they restart. Or, if you really wanted, you could change admission control to the dedicated failover host policy. (With what I know of your situation I don't think I'd recommend this, but it would give you what you are asking for.)
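If you do end up building the check yourself, the rule you described (alert when the total unused memory in the cluster is less than the memory of a single host) could be sketched roughly like this. The `Host` class, the function name, and the numbers are hypothetical; in practice the per-host totals and usage would have to come from vCenter, which is not shown here. I compare against the largest host so the check still makes sense if the hosts are not identical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    name: str
    total_mb: int  # installed memory
    used_mb: int   # memory currently consumed on the host


def cannot_absorb_host_failure(hosts: List[Host]) -> bool:
    """True when total unused memory is less than the largest host's memory."""
    total_unused = sum(h.total_mb - h.used_mb for h in hosts)
    largest_host = max(h.total_mb for h in hosts)
    return total_unused < largest_host


# Hypothetical 2-host cluster, each host well above 50% memory usage.
hosts = [
    Host("esx1", total_mb=65_536, used_mb=45_000),
    Host("esx2", total_mb=65_536, used_mb=50_000),
]
if cannot_absorb_host_failure(hosts):
    print("ALERT: unused cluster memory is less than one host's memory")
```

Keep in mind this is stricter than what HA admission control checks, since it looks at consumed memory rather than reservations plus overhead.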


Have you looked at the vRAS Fling (VM Resource and Availability Service, VMware Labs)?

This Fling enables you to perform a what-if analysis for host failures on your infrastructure. You can simulate failure of one or more hosts from a cluster (in vSphere) and identify how many:

  • VMs would be safely restarted on different hosts
  • VMs would fail to be restarted on different hosts
  • VMs would experience performance degradation after being restarted on a different host

With this information, you can better plan the placement and configuration of your infrastructure to reduce downtime of your VMs/Services in case of host failures.


fsckit
Enthusiast

>There is no alarm.

There is a default alarm called "Insufficient vSphere HA failover resources". I was wondering why it did not get triggered, given the state of the Resource Distribution chart. I think you answered this.

I was not aware of these "flings", so thank you for pointing them out to me, though I would never be allowed to install one in this particular environment.

vbrowncoat
Expert

Here is a good post about that alarm: http://www.yellow-bricks.com/2012/12/04/insufficient-resources-to-satisfy-ha-failover-level-on-clust...

The Fling isn't something you install. You upload your DRS dump file to the website. Check out the readme file for more detail.

Does this answer your questions?
