VMware Cloud Community
WinStime
Enthusiast

vROps 6.7: Unable to do capacity planning anymore

Hi,

I made a big mistake: I didn't read the release notes for vROps 6.7. I upgraded, and after a few days, since everything was working as it should, I removed the snapshot.

But I didn't check my capacity planning and the automated VM placement carefully enough.

Now that I'm on 6.7, the new version offers some good enhancements, but how do I do capacity planning?

I used to have cluster policies setting usable capacity to 80% of total capacity (50% for [DRP cluster]). But now, usable capacity is stuck at 100%.

The HA and buffer settings disappeared. Now, if I listen to vROps, my clusters are well optimized, but I know that's not the case. We are close to the acceptable limit for running VMs, but vROps tells me there's no problem.

I also use Turbonomic, which tells me to add hosts to the clusters.

So how do you handle this?

I need to provide a simple KPI to management so they can monitor whether investment is needed. Right now, I can't.

daphnissov
Immortal

This has been a common complaint, and if you search the vROps subforum you'll see similar threads. All we can say at this point is: revert to 6.6.1 (which would mean a restore, since you already committed your snapshot), tell vROps product management how you feel, or wait for the next update.

sunnydua2011101
VMware Employee

Hi,

Maybe I can help you here.

The HA buffer in vRealize Operations 6.7 is not gone. It is auto-calculated and subtracted from Total Raw Capacity.

Hence, out of the box, Usable Capacity = Total Capacity - HA Buffers, and you do not need to set any configuration or policy to account for this buffer.
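To make the formula concrete, here is a minimal Python sketch, assuming a percentage-based admission control policy. The function name `usable_capacity` and the example cluster size are illustrative, not part of vROps; other policy types (for example, host failures tolerated) derive the buffer differently.

```python
def usable_capacity(total_raw: float, ha_buffer_pct: float) -> float:
    """Usable capacity after subtracting an HA buffer given as a percentage.

    Assumes a percentage-based admission control policy; this is an
    illustration of the formula, not the vROps implementation.
    """
    if not 0 <= ha_buffer_pct <= 100:
        raise ValueError("ha_buffer_pct must be between 0 and 100")
    ha_buffer = total_raw * ha_buffer_pct / 100
    return total_raw - ha_buffer


# Example: 4 hosts x 256 GB of memory, 25% reserved for failover
total_mem_gb = 4 * 256
print(usable_capacity(total_mem_gb, 25))  # 768.0
```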

With 6.7, HA buffers are read automatically from your HA Admission Control Policy. Since HA buffers exist to reserve capacity in case of HA failures, the source of truth for these buffers is the vCenter HA configuration. So if you need to reserve capacity for HA failover, configure it in the vCenter HA Admission Control settings. Having Admission Control disabled or not configured while buffering HA capacity only in vROps is not good practice: you may think you have enough capacity on a cluster from the vROps point of view, but your availability policy will not honor that.

Hence, in 6.7 we made this configuration work out of the box, and we honor your HA policies based on it.

Please refer to the out-of-the-box "Utilization Overview Dashboard" for more details.

Hope this helps. If you have more questions or feedback on vRealize Operations 6.7, feel free to reach out to me at duas@vmware.com.

Regards

Sunny

JBaileyTX
Contributor

Sunny,

Besides serendipity, what resource would you recommend for vROps users to find advice like yours? I am sure we would all welcome knowing where this information exists outside of a few VMware employees.

Please excuse my short tone.  I had a car break down on a busy road and got badly sunburned waiting outside for police and tow truck...

WinStime
Enthusiast

Hi,

Thanks to you both for your replies.

If I understand correctly, if I have admission control enabled on a cluster, vROps should detect that setting and adjust the usable capacity accordingly.

But enabling admission control will also block VMs from powering on if resources are not available. So even if we are proactive with vROps, we can still be blocked by admission control.

That's why we didn't enable this option; instead, we use resource pools to prioritize certain kinds of VMs (production VMs in a shared environment).
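A rough way to see why admission control can refuse a power-on: under a percentage-based policy, vSphere compares the failover capacity left after reservations against the configured percentage. The sketch below models a single resource (CPU or memory) and is a simplification: it assumes reservations are the only thing counted, ignores memory overhead and per-VM default reservations, and `can_power_on` is a hypothetical name.

```python
def can_power_on(total: float, reserved: float, new_reservation: float,
                 failover_pct: float) -> bool:
    """Simplified percentage-based HA admission check for one resource.

    `reserved` is the sum of reservations of VMs already powered on.
    Memory overhead and per-VM default reservations are ignored.
    """
    remaining = total - reserved - new_reservation
    current_failover_pct = remaining / total * 100
    return current_failover_pct >= failover_pct


# 100 GHz cluster, 70 GHz already reserved, 25% kept for failover
print(can_power_on(100_000, 70_000, 4_000, 25))  # True  (26% left)
print(can_power_on(100_000, 70_000, 6_000, 25))  # False (24% left)
```

In this simplified model a VM with no reservation barely counts against the buffer, so the check mostly bites when reservations are set.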

With this upgrade, I'm unable to do capacity planning anymore. Before, I had a few metrics with some homemade configuration that told me how many VMs I could still deploy, or the time remaining.

Now, the only information I get is the time remaining. OK... why not, I can accept that. You are the specialists; I'm only a simple user with limited experience.
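"Time remaining" is essentially a projection of usage against usable capacity. vROps 6.7 uses its own forecasting engine; the sketch below swaps in a plain least-squares linear trend purely to illustrate the idea. The function name `days_remaining` and the sample data are hypothetical.

```python
from typing import Optional, Sequence


def days_remaining(daily_usage: Sequence[float], usable: float) -> Optional[float]:
    """Days from the last sample until a linear trend of usage reaches `usable`.

    Illustrative least-squares fit; not the vROps forecasting algorithm.
    Returns None when usage is flat or shrinking.
    """
    n = len(daily_usage)
    if n < 2:
        raise ValueError("need at least two daily samples")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_usage) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_usage))
    slope = sxy / sxx
    if slope <= 0:
        return None  # no growth, so no projected exhaustion date
    intercept = mean_y - slope * mean_x
    # Day index where the trend line crosses `usable`, relative to the last sample.
    # A negative result means the trend is already past usable capacity.
    return (usable - intercept) / slope - (n - 1)


# Memory used (GB) over four days against 768 GB usable capacity
print(days_remaining([500, 510, 520, 530], usable=768))  # roughly 23.8
```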

But how do you explain that all my clusters look fine (1 year remaining), while some clusters have CPU contention (7%) and vROps doesn't warn me? My cluster density is too high, and so is the CPU overcommit (1:8.5).

I'm thinking about moving to Turbonomic, which gives me reliable data (in my case, adding more hosts to some clusters, which is the right move after some reclamation).

sxnxr
Commander

I will try to answer some of this, but bear in mind I am making some assumptions and could be wrong.

"But enabling admission control will also block VMs from powering on if resources are not available. So even if we are proactive with vROps, we can still be blocked by admission control."

This is a problem with your admission control. I am assuming you have it disabled (don't worry, the same is true for you and about 90% of the world). The point of that message is to ensure HA has enough resources to power on the VMs if there is an HA event. Simply ignoring this and disabling admission control will only lead to problems later. In my mind, setting an appropriate admission control policy is a must. Setting memory reservations will affect HA the most, by reducing the number of HA slots available.
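The remark about memory reservations comes from the slot policy ("host failures cluster tolerates"). The Python sketch below approximates it under stated assumptions: slot size is the largest CPU and memory reservation among powered-on VMs, a 32 MHz CPU default applies when there are no CPU reservations, and memory overhead is ignored. `Host`, the function names, and the host sizes are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Host:
    cpu_mhz: int
    mem_mb: int


def slot_size(cpu_res_mhz: Sequence[int], mem_res_mb: Sequence[int],
              default_cpu_mhz: int = 32) -> Tuple[int, int]:
    """Slot = largest CPU and largest memory reservation of powered-on VMs."""
    cpu = max(max(cpu_res_mhz, default=0), default_cpu_mhz)
    mem = max(mem_res_mb, default=0)  # memory overhead ignored in this sketch
    return cpu, mem


def slots_per_host(host: Host, cpu_slot: int, mem_slot: int) -> int:
    cpu_slots = host.cpu_mhz // cpu_slot
    if mem_slot == 0:
        return cpu_slots  # no memory reservations: memory does not limit slots here
    return min(cpu_slots, host.mem_mb // mem_slot)


def failover_slots(hosts: Sequence[Host], cpu_slot: int, mem_slot: int,
                   host_failures: int = 1) -> int:
    """Slots left after losing the `host_failures` hosts with the most slots."""
    per_host = sorted((slots_per_host(h, cpu_slot, mem_slot) for h in hosts),
                      reverse=True)
    return sum(per_host[host_failures:])


hosts = [Host(cpu_mhz=40_000, mem_mb=262_144) for _ in range(4)]
print(failover_slots(hosts, *slot_size([], [])))        # 3750
print(failover_slots(hosts, *slot_size([], [16_384])))  # 48: one 16 GB reservation
```

A single large memory reservation sets the slot size for every VM, which is why it shrinks the available slots so sharply in this model.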

"But how do you explain that all my clusters look fine (1 year remaining), while some clusters have CPU contention (7%) and vROps doesn't warn me? My cluster density is too high, and so is the CPU overcommit (1:8.5)."

You are mixing capacity models here. Overcommit belongs to the allocation-based capacity model. If you are using that model, I would expect contention to be all over the map, depending on the VM workloads (or lack thereof).

CPU contention is not used in the allocation model but in the demand-based model. If you are using a demand-based model, I would expect the allocation ratios to be all over the place as well, depending on the workload of the VMs.

You could have low contention and still have 10:1 or 15:1 consolidation ratios. For example, I have one cluster at 70:1 with 2% contention and another at 10:1 with 30% contention, due to the hardware we run on and the power policy the servers are set to.
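To illustrate that the two models measure different things, the sketch below computes an allocation ratio (vCPUs per physical core, the kind of figure behind "1:8.5") and a demand ratio for the same hypothetical cluster. Function names and numbers are made up for illustration; a high allocation ratio only hints at contention risk, while demand depends on what the VMs actually do.

```python
from typing import Sequence


def allocation_ratio(vcpus_per_vm: Sequence[int], cores_per_host: Sequence[int]) -> float:
    """Allocation model: configured vCPUs per physical core."""
    return sum(vcpus_per_vm) / sum(cores_per_host)


def demand_ratio(vm_demand_mhz: Sequence[float], host_capacity_mhz: Sequence[float]) -> float:
    """Demand model: share of physical CPU the VMs actually ask for."""
    return sum(vm_demand_mhz) / sum(host_capacity_mhz)


# Same 8.5:1 allocation ratio, very different demand
cores = [20] * 4           # 4 hosts x 20 cores
capacity = [50_000] * 4    # 4 hosts x 50 GHz
vcpus = [4] * 170          # 680 vCPUs in total
print(allocation_ratio(vcpus, cores))         # 8.5
print(demand_ratio([200] * 170, capacity))    # idle VMs: 0.17
print(demand_ratio([1_000] * 170, capacity))  # busy VMs: 0.85
```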

This is why we are moving away from both of these toward a performance-based capacity model.

WinStime
Enthusiast

Thanks for your reply.

As you say, 90% don't use admission control. For us, it makes no sense to use it, because when it's needed (for example, multiple host failures) we want to be able to run in degraded mode using resource pools. VMs in high-priority resource pools (production VMs) will be prioritized, and if necessary we can power off test VMs to free up resources for production VMs.

We have several stretched clusters running active-active across 2 datacenters. If a datacenter fails, HA will restart the VMs on the other site. To make this possible, we want to limit usable capacity to 50%. This was done in vROps with a dedicated policy, which told me how many VMs, or how much time, remained until we hit 50%.
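The 50% policy described here boils down to simple headroom arithmetic. A minimal Python sketch, assuming an "average VM" profile per resource; the function name, resource keys, and numbers are hypothetical, and this is not how vROps computes its own VMs-remaining value.

```python
from typing import Dict


def vms_remaining(capacity: Dict[str, float], used: Dict[str, float],
                  avg_vm: Dict[str, float], usable_pct: float = 50.0) -> int:
    """How many average-sized VMs fit before any resource hits usable_pct."""
    counts = []
    for res, total in capacity.items():
        headroom = total * usable_pct / 100 - used[res]
        counts.append(max(0, int(headroom // avg_vm[res])))
    return min(counts)  # the scarcest resource decides


capacity = {"cpu_ghz": 320, "mem_gb": 2048}  # both sites together
used = {"cpu_ghz": 100, "mem_gb": 800}
avg_vm = {"cpu_ghz": 2, "mem_gb": 16}
print(vms_remaining(capacity, used, avg_vm))  # 14 (memory-bound)
```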

Now the metric is at 100%, and the VMs remaining value is not what I expect.

chdrei
Enthusiast

Hello,

For us this change is quite annoying, because we also have many stretched clusters and don't use HA admission control.

As mentioned before, about 90% aren't using HA admission control, so why was this changed? This change makes it harder for us to argue to our CEO that we need vRealize Operations Manager in the future.

"I am not amused"

Chris

UberGeek1
Enthusiast

So, bypassing the argument about HA Admission Control: I agree that losing one host is fine, but when I lose two hosts, I would rather have swapping and ballooning than VMs hard down because they can't start. When you have a cluster with 100+ hosts, multiple host failures are a reality.

Anyway, if my understanding is correct, you don't need to have HA Admission Control actually enabled for vROps to read the settings, so changing the settings and leaving it disabled will allow vROps to use those numbers in its calculations. If you base capacity on the actual available resources in the cluster, minus some buffer and your largest host, you probably want a PowerCLI script running nightly to automate this calculation and setting. Of course, this is also based on my environment, where the hosts in a 100+ host cluster are not all identical; heck, most aren't even the same generation, but that's how it goes sometimes.
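The calculation described here (largest host plus some buffer) could look roughly like the Python sketch below; a nightly PowerCLI job would then apply the result to the cluster's admission control settings, which is not shown. `failover_percent`, the buffer value, and the host sizes are hypothetical, and rounding up to a whole percent is a design choice of this sketch.

```python
import math
from typing import Sequence


def failover_percent(host_capacity: Sequence[float], extra_buffer_pct: float = 0.0,
                     hosts_to_tolerate: int = 1) -> int:
    """Percentage to reserve so the largest host(s) can fail, plus a buffer.

    Rounded up to a whole percent and capped at 100.
    """
    total = sum(host_capacity)
    largest = sum(sorted(host_capacity, reverse=True)[:hosts_to_tolerate])
    return min(100, math.ceil(largest / total * 100 + extra_buffer_pct))


# Mixed-generation cluster, memory in GB per host
mem_gb = [512, 512, 384, 384, 256]
print(failover_percent(mem_gb, extra_buffer_pct=5))                       # 30
print(failover_percent(mem_gb, extra_buffer_pct=5, hosts_to_tolerate=2))  # 55
```

Sizing against the largest host(s) rather than an average keeps the reservation safe in a cluster where hosts differ in size.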

Either way, I've always found it arrogant that the typical answer from VMware is "You should be running HA Admission Control", and when I ask why, I basically get a "because the cool kids do" type of answer, not an actual answer grounded in reality. I do see some edge cases where it would be nice, but in most cases, good capacity planning and monitoring matter more than deploying a VM a customer requested, only to find out you don't have capacity under HA Admission Control and having to tell the user "Sorry, you're going to have to wait." I would rather do that up front instead of after the fact.

Anyway, I relinquish the soap box...

Sincerely, Jody L. Whitlock