10 Replies Last post: Apr 28, 2008 11:06 AM by steven_catania

VC 2.5 HA bug?

Dec 18, 2007 3:24 PM

ian4563 (Enthusiast, 71 posts since Jan 10, 2007)
I just noticed that after upgrading to VC 2.5, the "Current Failover Capacity" under HA in the summary screen shows zero on both of my clusters (one ESX 3.5, the other 3.0.2). Before the upgrade one cluster showed 4 and the other 2, so I know there's capacity. I have completely disabled and then re-enabled HA on both clusters and still no luck. Anyone else seeing this?
Re: VC 2.5 HA bug? Dec 18, 2007 3:56 PM
eziskind (Expert, VMware, 309 posts since Feb 21, 2006)

A few questions for you:

Are there any HA errors on any of the hosts in the cluster?

Is DRS enabled on this cluster?

What is the maximum cpu and memory reservation among all powered on vms in the cluster?

Do you have both 1-cpu and 2-cpu vms in the cluster?

Re: VC 2.5 HA bug? Dec 18, 2007 4:23 PM
PatruMau (Novice, 13 posts since Dec 18, 2007)
Same problem here.

I have a 4-host cluster (DL380 G5, each with 16GB of RAM and 2 quad-core CPUs). RAM is 40% free on every host.
Prior to the VC upgrade, I was able to power on all my VMs without allowing HA constraint violations. After the upgrade, to power on the same number of VMs (with the same resources) I must allow constraint violations, and "Current Failover Capacity" shows 0.

  • No HA error on any host
  • No CPU or RAM reservations
  • DRS is enabled
  • 1-cpu and 2-cpu VMs
Re: VC 2.5 HA bug? Dec 18, 2007 5:59 PM
in response to: eziskind
ian4563 (Enthusiast, 71 posts since Jan 10, 2007)

Thanks for replying eziskind,

Are there any HA errors on any of the hosts in the cluster?

No, I have checked every node.

Is DRS enabled on this cluster?

Yes, on both clusters.

What is the maximum cpu and memory reservation among all powered on vms in the cluster?

Not exactly sure what you mean, but one cluster shows 105GHz CPU and 132GB RAM in the summary screen, and only one resource pool has a reservation: 10GHz and 20GB memory. The other cluster shows 49GHz and 90GB, and again only one resource pool has a reservation: 10GHz and 12GB memory. We do not have reservations set on a per-VM basis.

Do you have both 1-cpu and 2-cpu vms in the cluster?

Yes, in both clusters.

Like I said before, the capacity was there pre-2.5. Unless the algorithm for determining capacity has changed, it looks like a bug to me.

Re: VC 2.5 HA bug? Dec 18, 2007 6:09 PM
Byron_Zhao (Enthusiast, 124 posts since Nov 7, 2006)

Not sure how the HA algorithm works, but if you set it to "allow constraint violations", HA will work regardless.

The same thing happens in my 3.0.1 environment, but I have seen HA work there.

Re: VC 2.5 HA bug? Dec 18, 2007 6:25 PM
in response to: ian4563
eziskind (Expert, VMware, 309 posts since Feb 21, 2006)

The HA admission control algorithm has become somewhat more conservative in VC 2.5 to cover some corner cases. One case where it can be overly conservative is where you have both 1-cpu and 2-cpu VMs (in general, VMs with a mixed number of virtual CPUs).

I can verify if this is the problem if you can get some extra logging:

Make sure HA admission control is enabled (to not allow constraint violations).

Enable verbose logging on the VirtualCenter server (Administration->"VirtualCenter Management Server Configuration..."->"Logging Options").

Try to power on a VM (this should fail).

Check the vpxd.log file (C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\Logs) for a line like this: "VpxdDas Slot info". Post the 5 lines that follow this one.
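The log check above can also be scripted. The following is only a minimal sketch in Python, not a VMware tool: the function name `lines_after_marker` is made up for illustration, and the sample text is copied from the log excerpt posted later in this thread.

```python
from typing import Iterable, List


def lines_after_marker(lines: Iterable[str],
                       marker: str = "VpxdDas Slot info",
                       count: int = 5) -> List[List[str]]:
    """For every line containing `marker`, collect the `count` lines after it."""
    buffered = [line.rstrip("\r\n") for line in lines]
    blocks = []
    for i, line in enumerate(buffered):
        if marker in line:
            blocks.append(buffered[i + 1:i + 1 + count])
    return blocks


# Sample copied from the vpxd.log excerpt posted in this thread.
sample = """\
2007-12-18 22:34:54.299 'App' 860 info VpxdDas Das admission check failed. Configured failover: 2, Expected new failover: 0
2007-12-18 22:34:54.299 'App' 860 verbose VpxdDas Slot info:
2007-12-18 22:34:54.299 'App' 860 verbose VpxdDas Slot CPU=256, Slot numVcpus=4, Slot memory=457
2007-12-18 22:34:54.299 'App' 860 verbose VpxdDas Total slots=90, Total VMs=101
2007-12-18 22:34:54.299 'App' 860 verbose VpxdDas Total hosts=6, Total good hosts=6
2007-12-18 22:34:54.299 'App' 860 verbose VpxdDas Slots per host: 21 21 21 9 9 9
2007-12-18 22:34:54.299 'StackTracer' 860 info 860 Exit DAS_PROFILE CheckPowerOnVm (203 ms)
""".splitlines()

blocks = lines_after_marker(sample)
for block in blocks:
    print("\n".join(block))
```

To run it against a real log, pass an open file handle for vpxd.log (from the Logs directory given above) instead of `sample`.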

Re: VC 2.5 HA bug? Dec 18, 2007 9:01 PM
in response to: eziskind
ian4563 (Enthusiast, 71 posts since Jan 10, 2007)
The 6 host cluster:

2007-12-18 22:34:54.299 'App' 860 info VpxdDas Das admission check failed. Configured failover: 2, Expected new failover: 0
2007-12-18 22:34:54.299 'App' 860 verbose VpxdDas Slot info:
2007-12-18 22:34:54.299 'App' 860 verbose VpxdDas Slot CPU=256, Slot numVcpus=4, Slot memory=457
2007-12-18 22:34:54.299 'App' 860 verbose VpxdDas Total slots=90, Total VMs=101
2007-12-18 22:34:54.299 'App' 860 verbose VpxdDas Total hosts=6, Total good hosts=6
2007-12-18 22:34:54.299 'App' 860 verbose VpxdDas Slots per host: 21 21 21 9 9 9
2007-12-18 22:34:54.299 'StackTracer' 860 info 860 Exit DAS_PROFILE CheckPowerOnVm (203 ms)

The 3 host cluster:

2007-12-18 22:35:02.205 'App' 828 verbose VpxdDas Slot info:
2007-12-18 22:35:02.205 'App' 828 verbose VpxdDas Slot CPU=256, Slot numVcpus=4, Slot memory=401
2007-12-18 22:35:02.205 'App' 828 verbose VpxdDas Total slots=44, Total VMs=40
2007-12-18 22:35:02.205 'App' 828 verbose VpxdDas Total hosts=3, Total good hosts=3
2007-12-18 22:35:02.205 'App' 828 verbose VpxdDas Slots per host: 17 17 10
2007-12-18 22:35:02.205 'App' 828 verbose VpxDrmRetrieveDomainConfigInfo: current activation is NULL, skipping privilege checking.
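
For readers following along, here is a rough way to read these numbers. This is a sketch under a stated assumption, not the actual VirtualCenter algorithm: it assumes the failover capacity is the number of hosts that can be lost, largest first, while the remaining slots still cover every powered-on VM. The function names are hypothetical; the input lines are the ones posted above.

```python
import re
from typing import Dict, List


def parse_slot_summary(log_text: str) -> Dict[str, object]:
    """Pull 'Total VMs' and 'Slots per host' out of a VpxdDas slot-info block."""
    total_vms = int(re.search(r"Total VMs=(\d+)", log_text).group(1))
    per_host = re.search(r"Slots per host:((?: \d+)+)", log_text).group(1)
    return {"total_vms": total_vms,
            "slots_per_host": [int(n) for n in per_host.split()]}


def tolerated_host_failures(slots_per_host: List[int], total_vms: int) -> int:
    """ASSUMPTION: worst case, the hosts with the most slots fail first."""
    remaining = sorted(slots_per_host)  # ascending, largest host last
    failures = 0
    while len(remaining) > 1:
        remaining.pop()  # lose the largest remaining host
        if sum(remaining) < total_vms:
            break
        failures += 1
    return failures


six_host = """VpxdDas Total slots=90, Total VMs=101
VpxdDas Slots per host: 21 21 21 9 9 9"""
three_host = """VpxdDas Total slots=44, Total VMs=40
VpxdDas Slots per host: 17 17 10"""

for name, text in (("6-host", six_host), ("3-host", three_host)):
    info = parse_slot_summary(text)
    capacity = tolerated_host_failures(info["slots_per_host"], info["total_vms"])
    print(name, "cluster, tolerated host failures:", capacity)
```

Under this model both clusters come out at 0, which is consistent with the summary screen: the 6-host cluster has fewer slots (90) than VMs (101), and the 3-host cluster drops to 27 slots for 40 VMs if one of its 17-slot hosts is lost.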

Re: VC 2.5 HA bug? Dec 18, 2007 10:24 PM
in response to: ian4563
eziskind (Expert, VMware, 309 posts since Feb 21, 2006)
Looks like you have some 4-cpu VMs in the clusters too. That will really skew things. You're being hit by the combination of 2 new things in the HA admission control for VC 2.5:

1) If no reservation is set for a VM (or it is set to 0), use a default of 256MHz, 256MB. (These values can be changed using the HA advanced options das.vmMemoryMinMB and das.vmCpuMinMHz.)

2) For the CPU component of the slot, use (max MHz per virtual CPU) * (max number of vCPUs per VM).


The HA admission control algorithm is overly conservative in non-homogeneous clusters, i.e. ones with VMs which have different reservations and/or numbers of vCPUs. #2 above makes it more conservative. Given these limitations, it's best to try to keep the cluster as homogeneous as possible. Is it possible to put the 4-cpu VMs in a separate cluster? If not, you can try setting the default VM resources to 0 (using the advanced options in #1). This is how things worked in VC 2.0.
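
The two rules above can be sketched in a few lines of Python. This is an illustration only, not VMware code: the `Vm` class and `slot_size` function are hypothetical, the CPU reservation is assumed to be a per-vCPU figure, and memory overhead is not modeled (the 457MB and 401MB slot sizes in the posted log are larger than the 256MB default, so something more than the default is evidently counted there).

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

DEFAULT_CPU_MHZ = 256  # default named in rule 1 (das.vmCpuMinMHz)
DEFAULT_MEM_MB = 256   # default named in rule 1 (das.vmMemoryMinMB)


@dataclass
class Vm:
    num_vcpus: int
    cpu_reservation_mhz: int = 0  # ASSUMPTION: treated as MHz per vCPU
    mem_reservation_mb: int = 0


def slot_size(vms: Iterable[Vm],
              default_cpu_mhz: int = DEFAULT_CPU_MHZ,
              default_mem_mb: int = DEFAULT_MEM_MB) -> Tuple[int, int]:
    """Return (CPU MHz, memory MB) of one slot under rules 1 and 2."""
    vms = list(vms)
    # Rule 1: a reservation of 0 falls back to the default.
    max_mhz_per_vcpu = max(vm.cpu_reservation_mhz or default_cpu_mhz for vm in vms)
    max_mem_mb = max(vm.mem_reservation_mb or default_mem_mb for vm in vms)
    # Rule 2: CPU component = (max MHz per vCPU) * (max vCPUs per VM).
    max_vcpus = max(vm.num_vcpus for vm in vms)
    return max_mhz_per_vcpu * max_vcpus, max_mem_mb


mixed = [Vm(num_vcpus=1), Vm(num_vcpus=2), Vm(num_vcpus=4)]
no_four_way = [Vm(num_vcpus=1), Vm(num_vcpus=2), Vm(num_vcpus=2)]

print(slot_size(mixed))        # one 4-vCPU VM sets the CPU slot for everyone
print(slot_size(no_four_way))  # CPU slot halves once the largest VM has 2 vCPUs
print(slot_size(mixed, default_cpu_mhz=0, default_mem_mb=0))  # defaults set to 0
```

In this model a single 4-vCPU VM makes every slot 1024MHz wide, so each host holds roughly half as many slots as it would with a 2-vCPU maximum.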

Re: VC 2.5 HA bug? Dec 20, 2007 3:02 PM
in response to: eziskind
ian4563 (Enthusiast, 71 posts since Jan 10, 2007)
I changed all of our 4-vCPU VMs to 2-vCPU and now the failover capacity on both clusters shows 1. Thanks for letting us know about the HA changes and variables.
Re: VC 2.5 HA bug? Jan 18, 2008 9:41 AM
in response to: eziskind
nsusa (Novice, 26 posts since Dec 2, 2005)

Is there better documentation somewhere that explains this? The current documentation is a little weak in that regard, and not everyone has the luxury of having 'clean' clusters or of downgrading a machine.

Thanks.

CP

Re: VC 2.5 HA bug? Apr 28, 2008 11:06 AM
in response to: nsusa
steven_catania (Enthusiast, 37 posts since Oct 3, 2006)

Gentlemen,

Thanks for all the info in this thread; we were experiencing the same issues. We had one VM with 4 CPUs in a farm of 1- and 2-CPU VMs. Once we moved the 4-CPU VM back to 2 CPUs, the farm showed many servers available for failover.

Steve
