VMware Cloud Community
VirtuvianMan
Contributor

How are Health, Capacity & Workload calculated?

Dear All,

Downloading vCOPS (Standard)- Check

Importing OVF and configuring- Check

Integrating to the vCenter - Check

Browsing in IE through the VM's IP- Check

Admiring the green icons and the colour legends - Check

Reducing the memory size of a VM (as a test) and seeing it reflected in vCOps - Check

Trying to understand the numbers there- *Confused Face*

Going through Google and PPTs over and over - Check

Hence I thought I'd ask my fellow beings how to understand, in simple terms, the numbers behind Health, Capacity and Workload. I know it's a vague question, but I'd like to get your responses, and I can answer any questions you have. How is the health of a VM defined? For business reasons, one host's VMs may have 2 CPUs while another's may have 4. How and where do I define these? How do I arrive at a DESIRED number for Health, Capacity and Workload?

Your two cents, please.

Thanks,

J

admin
Immortal

Hi, here is some info:

Health - answers the question "how is my system doing right now?" Health identifies the current problems in the system, or issues that need to be resolved immediately to avoid problems. Thus health is the first high-level indicator you should look at to see if your system needs immediate attention. Health is based on three sub-badges: Workload, Anomalies and Faults. If the health of a host drops from, for example, 98% to 45%, you should go down one level and check what the problem is - there may be some anomalies, some faults or high workload (insufficient memory, CPU, disk or network bandwidth). From there you can discover the problems in your environment and fix them.

Workload - Workload measures how hard an object is working. Specifically, it is defined as demand divided by effective capacity. As workload approaches (and exceeds) 100%, there is a high likelihood of performance problems.
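
To make the demand / effective capacity ratio concrete, here is a minimal Python sketch. The function name and the host figures are made up for illustration, and this is not how vCOps itself is implemented; it only shows the arithmetic described above.

```python
def workload_percent(demand: float, effective_capacity: float) -> float:
    """Workload as demand divided by effective capacity, as a percentage."""
    if effective_capacity <= 0:
        raise ValueError("effective_capacity must be positive")
    return 100.0 * demand / effective_capacity


# Hypothetical host: VMs demand 18 GHz of CPU and 24 GHz is usable.
cpu_workload = workload_percent(18.0, 24.0)   # 75.0
# Hypothetical host: VMs demand 70 GB of memory and 64 GB is usable.
mem_workload = workload_percent(70.0, 64.0)   # 109.375, i.e. over 100%
```

In this example the memory figure is above 100%, which is the situation where performance problems become likely.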

Anomalies - Anomalies measures how abnormal the behavior of the object is, based on its stats data. Anomalies is the number of stats that are outside of their "normal", trended ranges, based on historical data. A high number of anomalies is usually an indication of a problem (or at least something the user will want to pay attention to). Anomalies and workload differ in that workload computes an absolute measurement of how hard an object is working, while anomalies computes how different from normal it is working. Both are very useful when searching for and troubleshooting performance problems.
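
As a rough illustration of "number of stats outside their normal range", the Python sketch below counts metrics whose current value falls outside a band learned from history. The band used here (mean plus or minus two standard deviations) is an assumption chosen for simplicity; the product's actual trending method is not described in this thread, and the metric names are hypothetical.

```python
from statistics import mean, pstdev


def normal_range(history, k=2.0):
    """Assumed 'normal' band: mean +/- k standard deviations of past samples."""
    m = mean(history)
    s = pstdev(history)
    return m - k * s, m + k * s


def count_anomalies(histories, current, k=2.0):
    """Count metrics whose current value lies outside their normal band."""
    count = 0
    for name, value in current.items():
        low, high = normal_range(histories[name], k)
        if value < low or value > high:
            count += 1
    return count


# Hypothetical metric history and current readings for one VM.
histories = {
    "cpu_usage": [40, 42, 38, 41, 39],
    "mem_usage": [60, 61, 59, 60, 60],
}
current = {"cpu_usage": 85, "mem_usage": 60}
anomalies = count_anomalies(histories, current)  # only cpu_usage is abnormal
```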

Faults - Faults measures the degree of faults or problems the object is experiencing by using specific knowledge of events and properties. Issues identified here would include loss of redundancy in NICs or HBAs, memory checksum errors, HA failover problems, etc. These are included in health since they require an immediate resolution, while items in risk may not be immediate (but nevertheless should eventually be fixed).

Hope this gives you better understanding of these badges. If you have more questions, feel free to ask here.

admin
Immortal

And some more for Risk:

Risk - Risk answers the question "are there future risks to my system?" Risk identifies potential future problems that could eventually hurt the performance of the system. Risk does not necessarily imply any current problems, only that there are issues the user needs to focus on some time soon (but not necessarily immediately). Risk breaks down into three sub-badges:

Time Remaining - Time Remaining measures how much time is left before each resource type of the object reaches its capacity (e.g. CPU usage or disk I/O). This tells the user how much time they have before they'll need to provision more physical or virtual resources, or possibly do some load-balancing.
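
One simple way to picture a time-remaining estimate is to fit a straight line to past usage and see when it hits capacity. The Python sketch below does that with a least-squares slope. The linear-growth assumption, the function name and the sample numbers are mine, for illustration only; the thread does not say which forecasting method the product uses.

```python
def days_remaining(daily_usage, capacity):
    """Days until usage reaches capacity, assuming linear growth.

    daily_usage holds one sample per day, oldest first.
    Returns None when usage is flat or shrinking (no exhaustion forecast).
    """
    n = len(daily_usage)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(daily_usage) / n
    # Least-squares slope of usage against day index.
    numer = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(daily_usage))
    denom = sum((x - x_mean) ** 2 for x in range(n))
    slope = numer / denom
    if slope <= 0:
        return None
    return max(0.0, (capacity - daily_usage[-1]) / slope)


# Hypothetical datastore growing 10 GB per day towards a 1000 GB limit.
disk_days = days_remaining([500, 510, 520, 530, 540], 1000)  # 46.0
```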

Capacity Remaining - Capacity Remaining measures how many more VMs can be placed on the object before the object reaches its capacity. Capacity remaining and time remaining are two sides of the same coin. However, they give slightly different perspectives.
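
A back-of-the-envelope version of "how many more VMs fit" is to divide the free amount of each resource by what an average VM consumes and take the smallest result. The Python sketch below assumes such an average VM profile exists; the resource names and numbers are hypothetical, and the product's own sizing model is not described here.

```python
import math


def vms_remaining(free, avg_vm):
    """How many more average-sized VMs fit, limited by the scarcest resource."""
    counts = [math.floor(free[res] / avg_vm[res]) for res in avg_vm]
    return max(0, min(counts))


# Hypothetical host with 12 GHz CPU and 40 GB memory still free,
# and an average VM using 1.5 GHz and 8 GB.
free = {"cpu_ghz": 12.0, "mem_gb": 40.0}
avg_vm = {"cpu_ghz": 1.5, "mem_gb": 8.0}
more_vms = vms_remaining(free, avg_vm)  # CPU allows 8, memory allows 5
```

Here memory is the limiting resource, so the answer is 5 even though CPU alone would allow 8.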

Stress - Stress measures long-term or chronic workload. While workload shows an instantaneous value, stress looks over a longer period of time. Stress tells the user whether they have VMs or hosts that are under-sized or have too many VMs (in the case of hosts). Stress does not imply a current performance problem, but it will cause performance problems over time.
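
To show the difference between an instantaneous workload value and a longer-term view, the Python sketch below scores stress as the share of samples in a window where workload was above a threshold. Both the 80% threshold and this particular definition are assumptions for illustration; the exact calculation the product uses is not given in this thread.

```python
def stress_percent(workload_samples, threshold=80.0):
    """Share of samples (in %) where workload exceeded the threshold."""
    if not workload_samples:
        raise ValueError("need at least one workload sample")
    over = sum(1 for w in workload_samples if w > threshold)
    return 100.0 * over / len(workload_samples)


# Hypothetical workload readings (%) for one host over a window.
window = [60, 85, 90, 70]
host_stress = stress_percent(window)  # 2 of 4 samples above 80 -> 50.0
latest_workload = window[-1]          # 70: looks fine in isolation
```

The latest sample on its own looks healthy, while the window shows the host was above the threshold half of the time.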
