Half a year ago, we began using vRealize, and in this post I would like to point out what really frustrated me while using it.
First, I want to thank VMware for finally bringing the GUI to HTML5, as it at least improves the UI. Unfortunately, the underlying product remains an overcomplicated and unwieldy piece of software. Let me explain what leads me to this statement and where that disappointment comes from.
1. Metrics upon metrics, and no one knows what they mean
I like metrics, and yes, I like to have as many as possible to get to know what is going on in my environment. However, I only like them if I know what they mean, or at least if I can find out what they mean by consulting the documentation. Unfortunately, VMware does not seem able to deliver that information in a quick and understandable fashion. There are tons of metrics that are calculated by vRealize itself. The output is often a percentage of “something”. It is not transparent how that percentage is calculated or what it actually means for my environment.
Take, for example, the metric “CPU Workload %”. vRealize notified me about the following alert, which was triggered by a “high CPU Workload %” symptom on the cluster:
“Fully-automated DRS-enabled cluster has high CPU workload”
One could think it must be the percentage to which a resource’s CPU is used. But then what is the “CPU % Used” metric? Both “CPU % Used” and “CPU % Demand” are way below the “CPU Workload %” value. What does that mean? OK, maybe I just don’t get the concept and should head to the documentation. As this metric is applied to the cluster, it should surely be listed in the “Cluster Compute Resource Metrics” section.
However, no CPU Workload is listed there. Hmm… Additionally, the documentation refers to the metric key, which is not visible in the GUI at all. It could be one of the workload metrics listed there, I don’t know, and even if I found the correct one, the one-line explanation that is usually provided would most certainly not help me anyway. As a last resort I went to the community forum, and the explanation I got was: “Workload is a vROps synthetic value that is derived using multiple raw metrics that, when combined together, gives you a true indicator of how busy/utilized a resource is.”
Nice, now we have it. “CPU Workload %” is a metric calculated by vRealize that should somehow tell me something. I know neither what caused this high value nor what I can do to decrease it, or whether any action is required at all.
I often do not go this far to find out what a metric means, as it is just too complicated, and in the end I mostly end up none the wiser.
I face a similar issue with the “Analysis” tab, which is available for each object. Most of the underlying “dashboards” are pretty hard to understand. As there are videos linked that are supposed to explain them, I think VMware is aware of the difficulty users face when using them. Unfortunately, even after watching those videos, I often have only a guess at what exactly they are telling me, how they are calculated, or what I have to do to improve the situation, if anything is really necessary.
2. Reporting is pretty basic/useless
Sure, you can do reporting, as with every good monitoring tool. However, even though the process of creating reports is simple, the output is unusable most of the time. The views and charts scale poorly inside the generated PDF, and you end up with a document whose charts and tables spill over multiple pages. Additionally, you often get reports with views that are completely empty because the object you selected is somehow not supported for that kind of data. I am sure this makes sense somehow, but letting the user experiment to find out what works and what does not is just a frustrating process. I ended up getting familiar with the API and scripting the report myself. However, even this is overcomplicated. To access a metric’s value we need its “metric key”, so we first have to obtain that. Unfortunately, metric keys are documented for the vSphere part only, not for the Horizon adapter. The only way I was able to obtain them was by exporting a dashboard that included those metrics and looking into the generated XML. Once again, not the usability I expected.
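The dashboard-export workaround described above can be partly scripted. The following is only a rough Python sketch under stated assumptions: it assumes that metric keys appear in the exported XML as pipe-separated strings (something like `cpu|workload`), and it simply scans every attribute value and text node for strings of that shape. The element names, attribute names, and metric keys in the sample document are invented for illustration and are not taken from a real export, so the pattern will likely need adjusting against your own file.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a metric key is two or more segments joined by "|",
# e.g. "cpu|workload". Adjust the pattern to match your own export.
METRIC_KEY = re.compile(r"^[A-Za-z0-9_.:\- ]+(\|[A-Za-z0-9_.:\- ]+)+$")


def find_metric_keys(xml_text):
    """Return a sorted list of all strings in the XML that look like metric keys."""
    root = ET.fromstring(xml_text)
    found = set()
    for element in root.iter():
        # Check every attribute value and the element's own text.
        candidates = list(element.attrib.values())
        if element.text:
            candidates.append(element.text)
        for value in candidates:
            value = value.strip()
            if METRIC_KEY.match(value):
                found.add(value)
    return sorted(found)


# Hypothetical sample; a real exported dashboard will be structured differently.
SAMPLE = """
<dashboard name="Example">
  <widget title="CPU">
    <metric key="cpu|workload" label="CPU Workload %"/>
    <metric key="cpu|demandPct" label="CPU Demand %"/>
  </widget>
  <widget title="Sessions">
    <statKey>session|count</statKey>
  </widget>
</dashboard>
"""

if __name__ == "__main__":
    for key in find_metric_keys(SAMPLE):
        print(key)
```

Because the sketch does not rely on any particular tag names, it should survive differences in export layout, at the cost of possibly picking up other pipe-separated strings that are not metric keys.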
3. Customizing Alerts
There are tons of alerts in vRealize, and you certainly need to customize them to fit your environment. The process for doing so is once again a complete disaster in terms of usability. Because alerts (and maybe symptoms too?) are reset when performing an upgrade, you first have to duplicate all alerts (and maybe symptoms too?). Afterwards you can change what you like, and in the end you have to disable the original alert in the policy.
OK, it’s not that much work, but if you have to do this for every alert, it really gets frustrating, because you can’t simply open an alert and change, for example, the CPU usage threshold from 80% to 90%.
4. All the rest
Here are some other issues I have with vRealize. As I don’t have time to go into detail on all of them, here is a short list:
- Bad integration of stateless automated View Pools
- Complicated process for disabling Alerts on specific Objects
- No possibility to disable certain alerts during maintenance windows (for example backup times)
- Finding the right objects when configuring widgets can get frustrating, as the logic of the navigation tree is hard to follow
Overall, I would most probably not buy the product again. Its inability to convey how to interpret and use it is disappointing. Intuitive and easy handling is missing, and you often end up just disabling alerts or metrics because you don’t know what they are supposed to represent anyway.
The only reason we use it is that, at the time of purchase, lots of Horizon metrics could not be extracted directly through the API, which forced us to buy a product capable of monitoring it. As VMware recently changed its API, one can now do this without buying an over-engineered, overcomplicated product.
This is my personal opinion. Maybe I am just too dumb or lazy to understand this product, so give it a try if you wish. For me, the product would need fundamental refactoring or even a complete rebuild to become something I can recommend to others.