VMware Cloud Community
TheVMinator
Expert
Jump to solution

Storage Capacity Planning with vCOPS (vCenter Operations Manager)

I would like to use vCOPs for two aspects of capacity planning:

-Based on past history, estimate a date when I will run out of storage space

-Based on past history, estimate a date when my storage will no longer be able to deliver the needed performance. For example, when will IOPS or other significant performance factors run out?

What are all the things I need to look at?

What besides a "what if" scenario do I need to look at?

Reply
0 Kudos
6 Replies
jddias
VMware Employee
Jump to solution

Hey VMinator (I like that name, is your vCenter named "Sara Conner"?) Smiley Happy

  Assuming that you are running fairly static (i.e. normal "run rate" VM growth) then you don't really need to use the "What If" tool to understand when you will hit a capacity or performance limit.  That information is easily viewable from the main vC Ops vSphere UI dashboard under the Risk > Time Remaining badge.  For more detail you can look at the various views under Planning related to capacity.

  You would use the "What If" scenario to determine

  - What impact will a change (adding or removing VMs, hosts, or datastores) have on my capacity?

  - If I am able to reclaim resources (storage, CPU or memory) what impact will that have on my capacity?

For example, you may find that a datastore has 90 days of capacity remaining (based on historic usage trending).  That's good to know, but you may also find that under Efficiency > Reclaimable Waste you have potentially 500GB (for example) that is tied up in powered off or idle VMs.  You could then run a What If to see how removing those VMs would impact the time remaining and give you more time before you need to take action (buy more storage).
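The "time remaining" idea is essentially trend extrapolation: fit the historical usage curve and see where it crosses the datastore's capacity. Here is a back-of-napkin sketch of that math; the sample numbers and the simple linear fit are illustrative only, not how vC Ops computes the badge internally:

```python
from datetime import date, timedelta

# Hypothetical weekly usage samples for one datastore, in GB.
# In practice you'd pull this history from the vC Ops planning views.
samples = [(date(2013, 1, 7) + timedelta(weeks=i), 800 + 12.5 * i)
           for i in range(12)]
capacity_gb = 1200.0

# Ordinary least-squares fit of usage (GB) against elapsed days.
xs = [(d - samples[0][0]).days for d, _ in samples]
ys = [gb for _, gb in samples]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))          # growth rate in GB/day
intercept = my - slope * mx

# Date at which the fitted line crosses the capacity ceiling.
days_to_full = (capacity_gb - intercept) / slope
run_out = samples[0][0] + timedelta(days=round(days_to_full))
print(run_out)
```

vC Ops applies its own analytics (with confidence ranges) rather than a bare linear fit, so treat this only as the intuition behind the Time Remaining badge.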

Does that help?

Visit my blog for vCloud Management tips and tricks - http://www.storagegumbo.com
mark_j
Virtuoso
Jump to solution

I recommend running vCOps as long as possible to do any type of forecasting based on historical trends. Running it for a week isn't going to be able to forecast based on 1 week's worth of activities. We typically say that vCOps needs to run at least 4 weeks before you start getting any 'really good' planning information. If you don't have that historical data for trending, you're going to be fairly limited on how far ahead you can look [accurately].

Like @jddias mentioned, the what-if is only going to really help you if you want to see the effects of a specific config change or group of config changes. If you want to see what your capacity looks like based on your "run rate", then use the OOTB planning views.

When you're asking about 'performance', what are we talking about really? Latency? IOPs consumed vs available? The fact is, vCOps by itself cannot see how your physical and logical storage system is configured and deduce what the theoretical IOPs maximum is per LUN for each VM [not today]. However, vCOps OOTB will keep track of Disk I/O (usage). It will forecast this along with all other key planning metrics such as CPU, Memory, Disk Space, etc. If you want to watch this in your planning views and What-If scenarios, this may help you compare what you're using actually/theoretically vs what you know your storage system is capable of delivering.

Personally when it comes to Disk I/O, I take all of my Datastores/VMs for a SAN frame, create a supermetric and apply it to all of them (group, etc), and track my total IOPs. I watch that #, and the 95th percentile of that #. If you know your stuff, you'll know what your SAN was spec'd to handle or how much load it is currently configured to handle. We can alert on this via the supermetric (DT or KPI_HT), but typically this type of load doesn't just appear overnight and you'll see it coming based on the DT. Picture this scene from "Tommy Boy"... driving along the road with your family (your VM environment consuming IOPS)... and a truck tire (your SAN's max capacity):

http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCwQtwIwAA&url=http%3A%2...

As that clip depicts, you want to stop before you hit that truck tire. vCOps can help you prevent that from happening through the planning views, intelligent analytics, alerts and dashboards.

However, let's say you miss the brake pedal, hit the truck tire and burst into flames [these things happen]. Rest assured, vCOps can help you with that too. If you're seeing high latency and contention, odds are you're already starting to saturate your storage or possibly have a misconfiguration - a situation vCOps can help you identify from a consumption perspective. Do you want to identify 'what' is consuming your Disk I/O and who is suffering most? They've got that OOTB with the Views and Analysis tabs. And if that's not good enough, open up the Custom UI and create a snazzy dashboard with drag-n-drop widgets calculating the Top-N consumers and victims of your situation.

Hope this helps on your road.

If you find this or any other answer useful please mark the answer as correct or helpful.
jddias
VMware Employee
Jump to solution

mark.j wrote:

I recommend running vCOps as long as possible to do any type of forecasting based on historical trends. Running it for a week isn't going to be able to forecast based on 1 week's worth of activities. We typically say that vCOps needs to run at least 4 weeks before you start getting any 'really good' planning information. If you don't have that historical data for trending, you're going to be fairly limited on how far ahead you can look [accurately].

Very good point - best practice for any capacity analysis is a minimum of a four-week data bucket (not just for vC Ops, but generally speaking).

Visit my blog for vCloud Management tips and tricks - http://www.storagegumbo.com
Reply
0 Kudos
TheVMinator
Expert
Jump to solution
Hey VMinator (I like that name, is your vCenter named "Sara Conner"?)

  Assuming that you are running fairly static (i.e. normal "run rate" VM growth) then you don't really need to use the "What If" tool to understand when you will hit a capacity or performance limit.  That information is easily viewable from the main vC Ops vSphere UI dashboard under the Risk > Time Remaining badge.  For more detail you can look at the various views under Planning related to capacity.

  You would use the "What If" scenario to determine

  - What impact will a change (adding or removing VMs, hosts, or datastores) have on my capacity?

  - If I am able to reclaim resources (storage, CPU or memory) what impact will that have on my capacity?

For example, you may find that a datastore has 90 days of capacity remaining (based on historic usage trending).  That's good to know, but you may also find that under Efficiency > Reclaimable Waste you have potentially 500GB (for example) that is tied up in powered off or idle VMs.  You could then run a What If to see how removing those VMs would impact the time remaining and give you more time before you need to take action (buy more storage).

Does that help?

Jddias:

Thanks for your reply.   That does help.  (I don't actually have a vCenter named Sara Conner.)

We need to estimate the impact of adding a large number of VMs next year.  Is the "What if" scenario a better option than anything in the custom UI?

Like @jddias mentioned, the what-if is only going to really help you if you want to see the effects of a specific config change or group of config changes. If you want to see what your capacity looks like based on your "run rate", then use the OOTB planning views.

I think in this case we are focusing on forecasting based on future config change of adding a large number of VMs and estimating impact.

When you're asking about 'performance', what are we talking about really? Latency? IOPs consumed vs available? The fact is, vCOps by itself cannot see how your physical and logical storage system is configured and deduce what the theoretical IOPs maximum is per LUN for each VM [not today].

Based on information from the storage team I already know the max IOPs per LUN.  In this case we want to see IOPs consumed vs available, as opposed to latency.  (We aren't concerned with latency, at least in the context of this particular forecasting exercise.)  I would like to get the available IOPs info directly from the NetApp / EMC adapters, though.  Shouldn't I be able to pull that into the Custom UI and compare VM demand on a datastore against the max available IOPs per LUN?  It would be nice if I could pull that from the storage adapters in the Custom UI and create a supermetric for it or something.  Perhaps that is a separate post/discussion.

However, vCOps OOTB will keep track of Disk I/O (usage).

OK – but do you mean “disk i/O” usage per vmdk, per VM, or per datastore?

It will forecast this along with all other key planning metrics such as CPU, Memory, Disk Space, etc.

If you want to watch this in your planning views and What-If scenarios, this may help you compare what you're using actually/theoretically vs what you know your storage system is capable of delivering.

The problem here is that I already know what my storage system is capable of delivering per LUN.

Personally when it comes to Disk I/O, I take all of my Datastores/VMs for a SAN frame,

Can you explain - What is a “SAN frame”?  not familiar with that term-

create a supermetric and apply to all of them (group, etc),

I got lost here – you are creating a supermetric:

1.      What exactly is included in the supermetric, what functions are applied to what metrics,

2.      what are you applying the supermetric to?  All VMs that are contained in a datastore? 

and track my total IOPs.

What are the specific things you are attempting to measure:
Total IOPs per VM?

Total IOPs per datastore?

Which specific metric are you using to capture IOPS?  Commands per second? 

I watch that #, and the 95th percentile of that #.

Which metric are you attempting to get 95th percentile for?  Commands per second?

Are you measuring this for a time period such as the last 3 months?

What is the reason you are attempting to use 95th percentile – is it because you are preparing for the “Peak” IOPS demand as the basis of what you need to supply on the storage array rather than just the normal IOPs demand?

If you know your stuff, you'll know what your SAN was spec'd to handle or how much load it is currently configured to handle.

In this case we do know what we can handle from an IOPs perspective based on storage team info

We do have that number as a starting point – we know how many IOPs each LUN can supply maximum.

We can alert on this via the supermetric (DT or KPI_HT), but typically this type of load doesn't just appear overnight and you'll see it coming based on the DT.

Picture this scene from "Tommy Boy"... driving along the road with your family (your VM environment consuming IOPS)... and a truck tire (your SAN's max capacity):

http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCwQtwIwAA&url=http%3A%2...

As that clip depicts, you want to stop before you hit that truck tire. vCOps can help you prevent that from happening through the planning views, intelligent analytics, alerts and dashboards.

However, let's say you miss the brake pedal, hit the truck tire and burst into flames [these things happen]. Rest assured, vCOps can help you with that too. If you're seeing high latency and contention, odds are you're already starting to saturate your storage or possibly have a misconfiguration - a situation vCOps can help you identify from a consumption perspective. Do you want to identify 'what' is consuming your Disk I/O and who is suffering most? They've got that OOTB with the Views and Analysis tabs. And if that's not good enough, open up the Custom UI and create a snazzy dashboard with drag-n-drop widgets calculating the Top-N consumers and victims of your situation.

Agreed - we don't want to hit the truck tire as the video warns.  Right now the consideration is entirely forecasting-based.  There are monitoring-based things we need to do in vCOps (that is being handled outside the scope of this particular exercise), but for the purposes of this exercise I'm just doing some forward-looking analysis to see what happens to storage resources when I add a large number of VMs at a future time.

Hope this helps on your road.

Yes it does thanks!

Reply
0 Kudos
mark_j
Virtuoso
Jump to solution

Q: OK – but do you mean “disk i/O” usage per vmdk, per VM, or per datastore?


A: vCOps will track the IOPs demand/consumption on the datastore and VM resources.




Q:  Can you explain - What is a “SAN frame”?  not familiar with that term-


A: I was simply referring to your SAN solution / frame of DAEs / etc.





Q:

1.      What exactly is included in the supermetric, what functions are applied to what metrics,

2.      what are you applying the supermetric to?  All VMs that are contained in a datastore? 

and track my total IOPs.

What are the specific things you are attempting to measure:
Total IOPs per VM?

Total IOPs per datastore?

Which specific metric are you using to capture IOPS?  Commands per second? 

A: I usually grab commands per second and usage (kbps) for the Disk I/O. I'll usually sumN a group of VMs in scope for my analysis... if you want "everything" in your virtual environment I would apply this to all of your VMs, but if you want, let's say, everything on a group of datastores, apply it to the datastores.
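As a rough illustration of that sumN-style supermetric, the sketch below sums per-VM commands/sec at each interval and compares the 95th percentile of the group total against a known SAN ceiling. The VM names, sample values, and the 30,000-IOPS ceiling are all made up for the example:

```python
import math

san_max_iops = 30000  # hypothetical ceiling from your storage team

# Hypothetical per-VM commands/sec series (aligned collection intervals).
vm_samples = {
    "vm-app01": [1200, 1500, 1400, 5200, 1300],
    "vm-db01":  [8000, 9500, 9100, 9900, 8700],
    "vm-web01": [600, 700, 650, 720, 680],
}

# sumN: total commands/sec across the group at each interval.
totals = sorted(sum(interval) for interval in zip(*vm_samples.values()))

# Nearest-rank 95th percentile of the group total.
p95 = totals[math.ceil(0.95 * len(totals)) - 1]

headroom = san_max_iops - p95
print(p95, headroom)
```

In vCOps itself you'd express the sum as a supermetric over the group or datastores rather than exporting samples, but the arithmetic is the same.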



Q:

Which metric are you attempting to get 95th percentile for?  Commands per second?

Are you measuring this for a time period such as the last 3 months?

What is the reason you are attempting to use 95th percentile – is it because you are preparing for the “Peak” IOPS demand as the basis of what you need to supply on the storage array rather than just the normal IOPs demand?


A: 95th percentile for cmd/s or IOPS is pretty good for spec'ing your disk requirements. Sure, you can do a 95th percentile for the past 3 months; it'd be better to have more history than less. 95th percentile is the most accurate for spec'ing disk requirements based on what I've seen. You don't want the absolute peak, but you don't want the average. You want to accommodate that 95th percentile, and when those peaks happen your cache can give you a little boost, your growth capacity will be utilized, or you might see a little bump in latency. Either way, don't use your top peak since it's an uncommon burst, and don't use your average since that situation will snowball quickly on you. Lots of documentation out there on that, so you don't have to take my word for it.
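To make the average-vs-95th-vs-peak point concrete, here is a toy comparison on made-up cmd/s samples containing one uncommon burst:

```python
import math

# Hypothetical cmd/s samples: steady load plus one uncommon burst (4000).
samples = sorted([900, 920, 950, 980, 1000, 1010, 1020, 1040, 1050, 1060,
                  1080, 1100, 1120, 1150, 1180, 1200, 1220, 1250, 1300, 4000])

avg = sum(samples) / len(samples)                  # too low -- demand snowballs
peak = samples[-1]                                 # too high -- a one-off burst
p95 = samples[math.ceil(0.95 * len(samples)) - 1]  # nearest-rank 95th percentile

print(avg, p95, peak)
```

Spec'ing to the 95th percentile (1300 here) covers normal demand and lets cache or a brief latency bump absorb the rare 4000 burst, whereas spec'ing to the average (1226.5) leaves you short much of the time.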


If you find this or any other answer useful please mark the answer as correct or helpful.
Reply
0 Kudos
TheVMinator
Expert
Jump to solution

OK thanks

Reply
0 Kudos