Dear All,
Downloading vCOPS (Standard) - Check
Importing the OVF and configuring it - Check
Integrating with the vCenter - Check
Browsing in IE through the VM's IP - Check
Admiring the green icons and the colour legends - Check
Reducing the memory size of a VM (as a test) and seeing it reflected in vCOPS - Check
Trying to understand the numbers there - *Confused Face*
Going through Google and PPTs over and over - Check
So I thought I'd ask my fellow beings how to make simple sense of the Health, Capacity and Workload numbers. I know it's a vague question, but I'd like to hear your responses, and I'll answer any questions you have. How is the health of a VM defined? For business reasons, a VM on one host may have 2 CPUs while a VM on another has 4. How and where do I define these? How do I arrive at a DESIRED number for Health, Capacity & Workload?
Your two cents, please.
Thanks,
J
Hi, here is some info:
Health - Health answers the question "how is my system doing right now?" Health identifies current problems in the system, or issues that need to be resolved immediately to avoid problems. Health is therefore the first high-level indicator you should look at to see whether your system needs immediate attention. Health is based on three sub-badges - Workload, Anomalies, Faults. If the health of a host drops from, for example, 98% to 45%, you should go down one level and check what the problem is - there may be some anomalies, some faults or a high workload (insufficient memory, CPU, disk or network bandwidth). From there you can discover the problems in your environment and fix them.
Workload - Workload measures how hard an object is working. Specifically it is defined as demand divided by effective capacity. As workload approaches (and exceeds) 100%, there is a high likelihood of performance problems.
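To put numbers on that definition, here is a minimal sketch of "demand divided by effective capacity". The function name and the GHz figures are made up for illustration, and this is not how vCOPS computes the badge internally, just the arithmetic behind the sentence above.

```python
def workload_percent(demand: float, effective_capacity: float) -> float:
    """Workload as demand divided by effective capacity, expressed in percent."""
    if effective_capacity <= 0:
        raise ValueError("effective_capacity must be positive")
    return 100.0 * demand / effective_capacity


# Hypothetical example: a VM that demands 3.0 GHz of CPU when 4.0 GHz is usable.
print(workload_percent(3.0, 4.0))  # 75.0

# Demand above effective capacity pushes workload past 100%,
# which is where performance problems become likely.
print(workload_percent(5.0, 4.0))  # 125.0
```

Note that the same demand gives a different workload on a VM with a different effective capacity, which is why a 2-CPU VM and a 4-CPU VM can show different numbers for the same load.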
Anomalies - Anomalies measures how abnormal the behavior of the object is, based on its stats data. Anomalies is the number of stats that are outside of their "normal", trended ranges, based on historical data. A high number of anomalies is usually an indication of a problem (or at least something the user will want to pay attention to). Anomalies and workload differ in that workload is computing an absolute measurement of how hard an object is working while anomalies is computing how different from normal it's working. Both are very useful when searching for and troubleshooting performance problems.
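As a rough illustration of "the number of stats outside their normal ranges", the sketch below counts metrics whose current value falls outside a given (low, high) band. The metric names, the values and the bands are all hypothetical, and the sketch assumes the normal ranges are already known; how vCOPS learns them from historical data is not modelled here.

```python
def find_anomalies(current: dict, normal_ranges: dict) -> list:
    """Return the names of metrics whose current value is outside its normal range."""
    anomalies = []
    for name, value in current.items():
        if name not in normal_ranges:
            continue  # no trended range known for this metric, so skip it
        low, high = normal_ranges[name]
        if not (low <= value <= high):
            anomalies.append(name)
    return anomalies


# Hypothetical current stats for one VM.
current = {"cpu_usage_pct": 92.0, "mem_usage_pct": 55.0, "disk_latency_ms": 40.0}

# Hypothetical "normal" bands for the same metrics.
normal_ranges = {
    "cpu_usage_pct": (20.0, 70.0),
    "mem_usage_pct": (40.0, 65.0),
    "disk_latency_ms": (2.0, 15.0),
}

outliers = find_anomalies(current, normal_ranges)
print(len(outliers), outliers)  # 2 ['cpu_usage_pct', 'disk_latency_ms']
```

Here memory usage is not low in absolute terms, but it is inside its usual band, so it does not count as an anomaly. That is the difference from workload, which looks at the absolute level.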
Faults - Faults measures the degree of faults or problems the object is experiencing by using specific knowledge of events and properties. Issues identified here would include loss of redundancy in NICs or HBAs, memory checksum errors, HA failover problems, etc. These are included in health since they require an immediate resolution, while items in risk may not be immediate (but nevertheless should eventually be fixed).
Hope this gives you better understanding of these badges. If you have more questions, feel free to ask here.
And some more for Risk:
Risk - Risk answers the question "are there future risks to my system?" Risk identifies potential future problems that could eventually hurt the performance of the system. Risk does not necessarily imply any current problems, only that there are issues the user needs to focus on some time soon (but not necessarily immediately). Risk breaks down into three sub-badges:
Time Remaining - Time Remaining measures how much time is left before each resource type of the object (e.g. CPU usage or disk I/O) reaches its capacity. This tells the user how much time they have before they'll need to provision more physical or virtual resources, or possibly do some load-balancing.
Capacity Remaining - Capacity Remaining measures how many more VMs can be placed on the object before the object reaches its capacity. Capacity remaining and time remaining are two sides of the same coin. However, they give slightly different perspectives.
Stress - Stress measures long-term or chronic workload. While workload shows an instantaneous value, stress looks over a longer period of time. Stress tells the user whether they have VMs or hosts that are under-sized or have too many VMs (in the case of hosts). Stress does not imply a current performance problem, but it will cause performance problems over time.
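To show why Time Remaining and Capacity Remaining are two views of the same headroom, here is a small sketch. It assumes a simple linear growth trend and an average per-VM demand, both of which are my own simplifications; the function names and the numbers are hypothetical, and vCOPS's actual forecasting may work differently.

```python
import math


def days_remaining(capacity: float, usage: float, growth_per_day: float):
    """Days until usage reaches capacity, assuming a constant (linear) growth rate."""
    if growth_per_day <= 0:
        return None  # usage is flat or shrinking, so no exhaustion is projected
    return max(0.0, (capacity - usage) / growth_per_day)


def vms_remaining(capacity: float, usage: float, avg_vm_demand: float) -> int:
    """How many more average-sized VMs fit into the unused capacity."""
    if avg_vm_demand <= 0:
        raise ValueError("avg_vm_demand must be positive")
    return max(0, math.floor((capacity - usage) / avg_vm_demand))


# Hypothetical datastore: 1000 GB usable, 700 GB used, growing by 5 GB per day,
# with an average VM taking 40 GB.
print(days_remaining(1000.0, 700.0, 5.0))  # 60.0
print(vms_remaining(1000.0, 700.0, 40.0))  # 7
```

Both results come from the same 300 GB of headroom: one divides it by the growth rate, the other by the size of a typical VM.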