VMware Cloud Community
TheVMinator
Expert

Capacity Planning for vCloud Deployments

I'd like to get some feedback on capacity planning specific to vCloud deployments, such as implementing VCAC and self provisioning.

When you convert your environment from physical servers to virtual servers, there are structured processes and math you can use to properly size your clusters.

For example, in posts like this:

http://vmfocus.com/2013/09/01/vsphere-sizing-formula-cpu-ram/

Using approaches like the one outlined there, you can baseline the applications currently running on physical servers.  You can look at their performance history and know exactly what their demands are.  You can do that for each application currently running on a physical server, then total those numbers and understand how to size your ESXi compute cluster.  This approach works well.  However, it is based on certain assumptions particular to the scenario where a physical environment is about to become virtual.
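
As a rough illustration of that kind of sizing math (the headroom figure, host specs, and application numbers below are made-up assumptions, not values from the linked post), the arithmetic boils down to something like this:

```python
# Hypothetical physical-to-virtual sizing sketch: sum baselined per-app peak
# demand, add headroom, and divide by per-host capacity. All numbers are
# placeholders for illustration.
import math

# Assumed per-application peak demand taken from performance history.
apps = [
    {"name": "erp-db",   "cpu_mhz": 9600, "ram_gb": 64},
    {"name": "web-farm", "cpu_mhz": 4800, "ram_gb": 32},
    {"name": "file-svc", "cpu_mhz": 1200, "ram_gb": 16},
]

HEADROOM = 1.30                  # 30% buffer for growth/failover (assumption)
HOST_CPU_MHZ = 2 * 10 * 2600     # 2 sockets x 10 cores x 2.6 GHz (assumed host)
HOST_RAM_GB = 256

total_cpu = sum(a["cpu_mhz"] for a in apps) * HEADROOM
total_ram = sum(a["ram_gb"] for a in apps) * HEADROOM

hosts_for_cpu = math.ceil(total_cpu / HOST_CPU_MHZ)
hosts_for_ram = math.ceil(total_ram / HOST_RAM_GB)
print(f"Hosts needed: {max(hosts_for_cpu, hosts_for_ram)} "
      f"(CPU-bound: {hosts_for_cpu}, RAM-bound: {hosts_for_ram})")
```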

Some of these assumptions are:

  • The system you are capturing data on already exists on a physical server so that you can establish what its baseline performance needs are.
  • Because you can capture the data for all of your physical servers and combine it, you can use the total average CPU and memory demand to help you decide how many physical servers you need in your ESXi compute clusters.
  • If you want to separate your clusters, putting VMs with high performance demands on a cluster of newer servers and test/dev or lower-demand VMs on a cluster of older, cheaper servers, you can use the baselines you establish to help decide how to break out your clusters.  When you finish your current-state assessment, for example, you may know that 50 of your VMs need to run on high-performance physical servers with faster CPUs, while 30 of your VMs don't demand much CPU and are best combined on a cluster of older servers with slower CPUs to save money.

The goals of planning ahead, knowing how many physical servers to buy and how to break out your clusters, are easily satisfied in this scenario.
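
To illustrate the cluster break-out decision in the same spirit (the threshold and VM data below are invented for the example), the baselines can drive a simple tier assignment:

```python
# Hypothetical tiering sketch: assign baselined VMs to a high-performance or a
# low-cost cluster based on measured peak CPU. Threshold and VM list are assumed.
vms = [
    {"name": "app01", "peak_cpu_mhz": 5200},
    {"name": "app02", "peak_cpu_mhz": 800},
    {"name": "app03", "peak_cpu_mhz": 3100},
]

CPU_TIER_THRESHOLD_MHZ = 2000  # arbitrary cut-off for this illustration

high_perf = [v["name"] for v in vms if v["peak_cpu_mhz"] >= CPU_TIER_THRESHOLD_MHZ]
low_cost  = [v["name"] for v in vms if v["peak_cpu_mhz"] < CPU_TIER_THRESHOLD_MHZ]

print("High-performance cluster:", high_perf)
print("Low-cost cluster:", low_cost)
```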

Now suppose you are going to move to cloud and self provisioning.  Now you have a different scenario:

Your environment is already virtualized and you provision VMs with vCenter Server.  Until now you have been getting requests far enough in advance to know what kind of VMs a business unit is going to ask for, so that you can plan for the performance demands those applications will have.

Now you are going to turn over the reins and allow people to request VMs that will be automatically provisioned with VCAC.  You want to provide Infrastructure as a Service.  When someone requests a VM, it will be created within a few hours by an automated workflow.  Then it will be sitting on a cluster, demanding whatever it may need in terms of CPU and RAM, ready or not.  And perhaps not just one VM but many VMs may be quickly provisioned at one time.

To make matters more complicated, you don't know the performance history of these VMs, because they never existed before.  They might be the same as some current applications in your environment.  But then again, the user may request a VM without telling you much about how they intend to use it.  Or they may request it and go through an installation and testing process that takes three months before they start using it at its full workload.  For the first three months it looks like a low-demand VM, but then it spikes once it is fully utilized.  You have a true "cloud abstraction layer" where you aren't requiring self-provisioning activities at the cloud layer to be managed individually at the virtualization layer.  So you don't know what is going to be requested, how many VMs, what type of applications, and as a result, which clusters they should go on.

However, just as in the first scenario, you still need to do certain kinds of planning:

1.  You need to decide how many physical servers to purchase and when, so that you have them ahead of time, even though you don't know exactly how many new VMs will be requested for the next budget year.

2.  You need to decide the ratio of RAM to CPU power in those physical servers, even though you don't know what the demand of the new applications will be in terms of average peak CPU and average peak RAM.  Those VMs and applications don't exist yet in your environment, so you can't baseline them.  (An illustrative sizing sketch follows this list.)

3.  You need to decide how you are going to break out your ESXi compute clusters - for example

     - a prod cluster with more expensive and higher performance physical servers

     - a dev cluster with slower servers

     - other clusters, such as a cluster that is designated as being in a DMZ (in this scenario vCloud Networking and Security can't be used due to customer requirements)

     You need to group your physical servers into clusters, but your clusters are based on what kind of VMs will be requested, which you aren't going to know until the very last minute, and even after provisioning you won't have as much visibility into what the VMs are doing as before.
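
One way to reason about points 2 and 3 without historical baselines is to assume a catalog of fixed instance sizes plus a guess at the request mix, then derive the RAM-to-CPU ratio your hosts should have.  Everything below (the sizes, the mix) is an assumption for illustration, not a recommendation:

```python
# Hypothetical sketch: derive a target host RAM:CPU ratio from an assumed
# catalog of instance sizes and an assumed request mix.
catalog = {
    #          vCPU, RAM (GB)
    "small":  (1,  2),
    "medium": (2,  8),
    "large":  (4, 16),
}
# Assumed share of requests for each size (must sum to 1.0).
expected_mix = {"small": 0.50, "medium": 0.35, "large": 0.15}

weighted_vcpu = sum(catalog[s][0] * p for s, p in expected_mix.items())
weighted_ram  = sum(catalog[s][1] * p for s, p in expected_mix.items())

ram_per_vcpu = weighted_ram / weighted_vcpu
print(f"Average request: {weighted_vcpu:.2f} vCPU, {weighted_ram:.2f} GB RAM")
print(f"Target roughly {ram_per_vcpu:.1f} GB of RAM per physical core, "
      f"adjusted for whatever vCPU:pCPU overcommit ratio you are comfortable with")
```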

Amidst all of these challenges you need to find a way to do capacity planning to know:

- How many servers to buy

- What the application demands will be in terms of CPU and memory

- How much RAM and CPU your servers will need, and in what ratio, so that you are not over- or under-provisioned, wasting money or running out of resources.

- How to plan for what clusters should be defined and how many servers in those clusters

You have tools like vCOps that can report on applications that are already in the environment.  But you are planning for VMs that don't exist.  You can use vCOps to run a "what if" scenario to see when a cluster runs out of resources.  But you are attempting to plan the purchase of ESXi servers for clusters that don't exist yet, because this is a greenfield cloud deployment, so a "what if" scenario can't be run.

The question is this:

  • Take a traditional capacity planning methodology that targets known, existing applications running on physical servers, with historical performance information readily available.
  • Adapt it into a capacity planning methodology that works in a greenfield vCloud deployment.
  • Make that methodology work even though there are variables, such as which VMs will be requested or what performance a new VM will demand, that you either can't know, can only gauge with difficulty, or get little advance notice of.
  • Do all this while maintaining a true "cloud abstraction layer", where every provisioning request does not have to be individually and manually funneled through a VM admin and analyzed.

Any links, feedback, experiences, or references to a capacity planning approach that is designed for this scenario?

Dr_Virt
Hot Shot

It is a bit different than your proposal above. Part of the reason for the metrics and analysis above is the desire to drive your clusters at a high utilization rate (normally ~70%). When operating in a capacity management model, IT must focus on minimizing the anomalies and maintaining reserve capacity. Also, remember that provisioned VMs only consume what they consume. I may provision a "monster" VM, but its demand may be minimal.

First, you will notice that most IaaS providers limit the options around instance sizes. They provide templates around small, medium, and large in order to minimize the number of one-off configurations. Some smaller IaaS providers allow for greater freedom due to the lower resource management and rate-of-change demands. With these fixed units of consumption, analysis is performed to develop the trends identifying burn rate and consumption distribution. This forecast informs internal IT of the expected resource exhaustion rate and which instance sizes the solution should be modeled after.

Another point is that in cloud environments, the utilization rate is significantly lower. It is not uncommon to see 50% of compute and storage capacity held in reserve. This allows for the perception of unlimited capacity. Going back to our analysis above, we can also develop the sweet spot for reserve capacity.
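
As a rough sketch of how those two ideas (fixed instance sizes and a reserve target) combine into a burn-rate forecast, with every input below an invented placeholder and a simple 1 vCPU : 1 core assumption:

```python
# Hypothetical burn-rate forecast: months until a cluster breaches its
# reserve-capacity threshold, given an assumed monthly request mix.
# Assumes 1 vCPU : 1 physical core (no overcommit) to keep the math simple.
HOST_CORES, HOST_RAM_GB, HOST_COUNT = 20, 256, 8     # assumed cluster hardware
RESERVE = 0.50                                       # keep 50% in reserve

catalog = {"small": (1, 2), "medium": (2, 8), "large": (4, 16)}   # vCPU, GB RAM
monthly_requests = {"small": 30, "medium": 12, "large": 4}        # assumed rate

usable_cores = HOST_CORES * HOST_COUNT * (1 - RESERVE)
usable_ram   = HOST_RAM_GB * HOST_COUNT * (1 - RESERVE)

cores_per_month = sum(catalog[s][0] * n for s, n in monthly_requests.items())
ram_per_month   = sum(catalog[s][1] * n for s, n in monthly_requests.items())

months_left = min(usable_cores / cores_per_month, usable_ram / ram_per_month)
print(f"Non-reserve capacity exhausted in roughly {months_left:.1f} months")
```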

Specifically in the vCloud and vCAC space, there are a number of ways to deliver capacity and minimize risk. The first is approval policies, whereby anomalous requests (super-large sizes or extra quantities) can be suspended while awaiting approval. This works well for small to medium datacenters, as the reserve capacity is much lower. We can configure the provisioning system to suspend and notify when any of a number of metrics are breached, and execute approval workflows which may notify capacity management and planning.
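
Those approval thresholds are defined in the vCAC console rather than in code, but the rule they enforce is simple; as a language-neutral illustration (the limits here are arbitrary):

```python
# Illustrative threshold check, mimicking the kind of rule an approval policy
# enforces. Limits are arbitrary; real policies are configured in the vCAC UI.
AUTO_APPROVE_LIMITS = {"vcpu": 4, "ram_gb": 16, "quantity": 5}

def needs_manual_approval(request: dict) -> bool:
    """Return True when the request exceeds any auto-approval limit."""
    return any(request.get(k, 0) > limit for k, limit in AUTO_APPROVE_LIMITS.items())

print(needs_manual_approval({"vcpu": 2, "ram_gb": 8, "quantity": 1}))   # False
print(needs_manual_approval({"vcpu": 8, "ram_gb": 64, "quantity": 1}))  # True
```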

Second, we may sandbox consumers. AWS's reserved instances and vCHS's default resource allocations work this way. Each tenant/consumer is provisioned a fixed amount of resources (with expansion), within which they may provision at will. This works well for tightly constrained environments (budgetary or resource), as resources can be replaced in cycle with allocations.

Third is VM resource constraints. All cloud providers exercise this control to provide a fixed unit of consumption per allocation. One may request a dual-vCPU VM, but each vCPU is granted a fixed 1 or 2 GHz of processing time. A consumer may load that VM to the fullest degree, but it will still only consume the processing cycles granted. vSphere, vCloud, and vCAC all make use of this functionality.
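
At the vSphere layer that fixed unit of consumption is just a CPU limit on the VM's resource allocation. A minimal pyvmomi sketch, with a placeholder vCenter hostname, credentials, and VM name (error handling and certificate validation omitted):

```python
# Minimal pyvmomi sketch: cap a VM at 2000 MHz of CPU. Hostname, credentials,
# and VM name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

# Locate the VM by name with a container view.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "requested-vm-01")

# Apply a hard CPU limit (in MHz) via a reconfigure task.
spec = vim.vm.ConfigSpec()
spec.cpuAllocation = vim.ResourceAllocationInfo(limit=2000)
vm.ReconfigVM_Task(spec=spec)

Disconnect(si)
```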

I would recommend reading "Cats vs. Cattle" to assist with your research into this space. As you migrate from virtual infrastructure to cloud service models, you will come to see the VMs as just another unit of "X". How they perform and in what ways they consume resources becomes less of a concern. The focus becomes request automation and detailed near- and long-term capacity management. Instead of asking where to place a workload and its associated demands, you come to focus on how many slices of your infrastructure you have and how quickly they are consumed.

While cloud may be "by the drink", we package and deliver the drinks (12 oz, 16 oz, 20, 32, etc.).


TheVMinator
Expert

Thanks for the info.  Do you have the link for "Cats vs Cattle"?

Third is VM resource constraints. All cloud providers exercise this control to provide a fixed unit of consumption per allocation. One may request a dual-vCPU VM, but each vCPU is granted a fixed 1 or 2 GHz of processing time. A consumer may load that VM to the fullest degree, but it will still only consume the processing cycles granted. vSphere, vCloud, and vCAC all make use of this functionality.

This is a helpful idea but I have a few questions about it.

If I'm going to do this in vSphere, I need to assign my VM a limit, such as 2 GHz of CPU.  There are some issues there:

- Can you assign a VM a limit like that in VCAC? I wasn't aware that VCAC had the ability to define that.  You can create a limit on a VM in vCenter Server in the VM properties, or you can use resource pools within compute clusters to assign limits to groups of VMs, but can you provision a VM in VCAC with a 2 GHz limit on it out of the gate?

- If I could, would I really want VMs to be self-provisioned with CPU limits on them, as opposed to putting them in a compute cluster resource pool? Is that what you were thinking?

How would this suggestion be implemented specifically with VCAC on top of a vSphere environment?
