Solved: Re: Strategy and Planning for Resource Pools

TheVMinator · ‎09-20-2013

I need to prepare to implement resource pools on my ESXi 5.1 DRS clusters. We need to decide which virtual machines have priority over others. This is a sensitive issue because it ties back to which business units' applications are more important than those of other business units. In one sense, I know that a production SQL Server is more important than a Windows 7 test VM. But when I have a cluster full of only SQL Servers, deciding which VMs get priority under contention means that I need:

technical information:

What applications depend on these SQL Servers?
How latency sensitive are those applications?
How do those applications respond to performance issues when SQL Server is slow or unavailable under contention, and what is the impact?

and I also need

business information:

If some SQL Servers are more critical to the business, which ones are they?
If the managers of all units believe their SQL Servers are equally critical, eventually a manager somewhere up the chain has to make a judgement call between the managers below them.

Once I know BOTH the technical and the business information, I have to take both into account when assigning VMs to resource pools and setting share values, reservations, limits, etc.

Then I need to justify the solution as well as possible and show my technical solution matches what managers have defined as goals.

Can someone advise on the process of doing this?

Is there a guide somewhere to start this process?
What are your experiences?
How do I effectively connect the technical configuration of VM priority and share values with business needs, and tie the business goals back to the technical solution?

Thanks for input

hstagner · ‎09-25-2013

I can offer some guidance for some of your points. So, let's break down what you need:

Technical

What applications depend on these SQL Servers?
- You could ask the application owners and try to map them out manually. This is prone to error and you may not get a complete picture.
- You could map the dependencies programmatically. I would suggest taking a look at vCenter Infrastructure Navigator for this. VMware vCenter Infrastructure Navigator: Discover application services, visualize relationships and ...
- Ultimately, it may take a combination of both methods.
How latency sensitive are those applications?
- Again, you could ask the application owners to get a general "feel" for their latency sensitivity.
- Are these applications already virtualized or are you gathering the information so that you can virtualize them?
  - If they are already virtual, then you could model the VM performance with vCenter Operations Manager (VMware vCenter Operations Management Suite: Hybrid Cloud Computing - United States). The advanced suite also comes with vCenter Infrastructure Navigator.
  - Now, just because you have a performance profile does not mean that you know how "sensitive" the application is to latency. The profile establishes a performance baseline for the VM. You may have to test the application under different load and latency scenarios to see how the application itself responds.
How do those applications respond to performance issues when SQL Server is slow or unavailable in contention and what is the impact?
- I would offer the same advice as I did for discovering how the applications respond to latency. Benchmark baseline performance and then do load testing. Observe the application behavior.
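As a rough illustration of "benchmark baseline, then load test", here is a minimal Python sketch that compares 95th-percentile latency between two sets of samples. The sample values and the function names are made up for the example; in practice the numbers would come from whatever monitoring or load-testing tool you use.

```python
import statistics


def p95(samples_ms):
    """Return the 95th-percentile latency from a list of millisecond samples."""
    # quantiles(n=100) yields 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=100)[94]


def degradation_ratio(baseline_ms, under_load_ms):
    """How many times higher the p95 latency is under load than at baseline."""
    return p95(under_load_ms) / p95(baseline_ms)


# Hypothetical query latencies in milliseconds, not real measurements.
baseline = [12, 14, 13, 15, 12, 16, 14, 13, 15, 14]
contended = [30, 45, 38, 52, 41, 60, 35, 48, 44, 39]

ratio = degradation_ratio(baseline, contended)
print(f"p95 under contention is {ratio:.1f}x the baseline")
```

A ratio like this only describes the database side. Whether the application tolerates it still has to be observed in the application itself.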

Business:

If some SQL Servers are more critical to the business, which ones are they?
- It may not be the SQL Servers you are after. Discover through interviews which applications are the most critical. If those applications have a SQL Server backing them, you have your answer. Think about which applications help the business perform core functions and whether the business could still operate (even in a degraded manner) without them. For example, losing a CRM system for sales may cripple the business, while the business may still be able to function without email, probably in a very degraded manner. That is just an example and depends on the business: some businesses absolutely cannot function without email, an email advertising business for example.

Now I'll offer some general guidance. Just because an application is "Critical" does not necessarily mean it needs more performance. That is where your baseline gathering comes in. If an application is "Critical" there are generally two things that can impact the application.

Availability - The application is simply unavailable to the users. This one will get the business' attention quickly. vSphere has many mechanisms to handle this. The application may also have mechanisms to improve uptime. You'll have to investigate this.
Performance - The application is available, but a performance slowdown is impacting the users' ability to effectively use the application. This impacts productivity and in some cases (like the CRM example), sales. Going back to my previous example, users may complain about slow email, but it may not be a show stopper. A slow CRM may impact the number of sales that are closed in a given day.
- Some "Critical" applications may not need much performance, so it may not make sense to put them in a resource pool meant for "high performance" applications. Providing higher availability may be enough.
- Some "Critical" applications may actually need a guaranteed level of performance. A resource pool may make sense for these applications.
- Providing a level of availability for an application is almost always directly tied to a business decision.
- Providing guaranteed performance for an application is driven by the business needs, but the decision of what performance to provide is based on technical facts and baselining, not the "criticality" of the particular application.
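One technical fact worth checking before placing VMs in a "high" pool: shares are relative among siblings, so a pool's slice under contention is divided among the VMs inside it. The Python sketch below is a deliberately simplified model of that arithmetic. It assumes every VM is demanding resources, that VMs inside a pool have equal shares, and that there are no reservations or limits; the pool names, share values, and VM counts are illustrative only.

```python
def per_vm_fraction(pools):
    """Split contended capacity by pool shares, then evenly among each pool's VMs.

    `pools` maps a pool name to (pool_shares, vm_count). Returns the fraction
    of the contended capacity that each VM in that pool could claim.
    """
    total_shares = sum(shares for shares, _ in pools.values())
    return {
        name: (shares / total_shares) / vm_count
        for name, (shares, vm_count) in pools.items()
    }


# Illustrative numbers: a crowded "high" pool next to a small "low" pool.
pools = {
    "high": (8000, 20),
    "low": (2000, 2),
}

fractions = per_vm_fraction(pools)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of contended capacity per VM")
```

In this toy case each VM in the crowded "high" pool ends up with a smaller slice than each VM in the "low" pool, which is why pool membership counts need to be part of the share calculation.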

I hope this helps. Good luck on your project.

-----------------------------------------

Don't forget to mark this answer "correct" or "helpful" if you found it useful (you'll get points too).

Regards,

Harley Stagner

VCP3/4, VCAP-DCD4/5, VCDX3/4/5

Website: http://www.harleystagner.com

Twitter: hstagner

TheVMinator · ‎09-27-2013

This is great info - thanks. I was wondering if you also had any suggestions about how I should go about tracking the results of my research?

So I really need to know a lot of information about each individual VM:

-Application that runs on it

-How important that application is to the business

-What other VMs are dependent on this application (e.g. a model-view-controller application design with 3 VMs that are interdependent)

-How important that application is relative to other applications in the business if one has to have priority in contention

-It may be a critical application from a business standpoint, but is it latency sensitive enough from a technical standpoint to demand a lot of shares or a guaranteed reservation?

-If I determine that it is critical to the business AND is latency sensitive and needs actual shares in vSphere to enforce its priority, do I need CPU shares, CPU reservations, memory shares, memory reservations, disk/SIOC shares, etc.? Do I need to enforce memory, CPU, and disk, or just one or two of those?

So after I find out all of this information, I've got to record it so that 6 months later I can refer back to it or someone else can reference it. What is the best way to track this kind of information? For example, what software application should I use?
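Whatever tool ends up holding it, the information above boils down to one record per VM. As a minimal sketch, here is what that record could look like in Python, written out as CSV so it can live in a spreadsheet or version control. The field names, the 1 to 10 scales, and the sample VM are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class VmRecord:
    vm_name: str
    application: str
    business_importance: int       # assumed scale: 1 (low) to 10 (high)
    latency_sensitivity: int       # assumed scale: 1 (low) to 10 (high)
    depends_on: str                # semicolon-separated VM names
    needs_cpu_reservation: bool
    needs_memory_reservation: bool
    notes: str = ""


def to_csv(records):
    """Serialize VM records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(VmRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Hypothetical example row.
records = [
    VmRecord("sql-crm-01", "CRM database", 9, 8,
             "web-crm-01;app-crm-01", True, True, "owner: sales IT"),
]
output = to_csv(records)
print(output)
```

Keeping the reasoning in a `notes` column alongside the numbers makes it easier for someone else to understand the decision later.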

Should I create a "point system"? For example, I have sets of VMs that are really important, semi-important, and not-so-important to the business. Using a scale of 1 to 10, with 10 being most important, management ranks each software application's importance to the business. Then, taking that information, I map all VMs that these applications depend on. Every VM in the environment then has an initial "business rank" number.

Then I go and look at the technical side. I see how latency sensitive the applications are that were ranked "critical" from the business perspective. I create a technical importance score for each VM from 1 to 10.

Once I have the business criticality score and the latency sensitivity score, I combine them somehow to come up with a share value for each VM. I record each VM's score. Then I create resource pools that group VMs with similar scores.

Then I go back and account for VMs that need guaranteed performance and a reservation as opposed to just high shares.
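To make the brainstorm concrete, here is a minimal Python sketch of the point system. How the two scores are combined is not settled in this thread, so the weighted average, the pool thresholds, and the VM names below are all assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScoredVm:
    name: str
    business_rank: int          # 1-10, ranked by management
    latency_sensitivity: int    # 1-10, from benchmarking
    needs_reservation: bool = False


def combined_score(vm, business_weight=0.5):
    """Weighted average of the two scores; the weighting is an assumption."""
    technical_weight = 1 - business_weight
    return business_weight * vm.business_rank + technical_weight * vm.latency_sensitivity


def pool_for(score):
    """Map a combined score onto one of three hypothetical pools."""
    if score >= 7:
        return "high"
    if score >= 4:
        return "normal"
    return "low"


# Hypothetical VMs and scores.
vms = [
    ScoredVm("sql-crm-01", 9, 8, needs_reservation=True),
    ScoredVm("sql-mail-01", 6, 4),
    ScoredVm("sql-test-01", 2, 3),
]

for vm in vms:
    score = combined_score(vm)
    flag = " (review for reservation)" if vm.needs_reservation else ""
    print(f"{vm.name}: score {score:.1f} -> pool '{pool_for(score)}'{flag}")
```

With a weighted average, a VM that is business-critical but not latency sensitive lands in a lower pool than its business rank alone would suggest. Reservations are kept as a separate flag rather than folded into the score, since they are handled as a separate step in the plan above.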

I'm brainstorming, but I'm trying to come up with a systematic way to get from raw business requirements and application benchmarks to hard numbers such as resource pool share values. Any other thoughts or enhancements to my plan, or has anyone already systematized this process?
