VMware Cloud Community
MartinE11
Enthusiast

Why vROps is probably not the product you are looking for

Half a year ago we began using vRealize, and in this post I would like to point out what really stressed me out while using it.

First I want to thank VMware for finally bringing the GUI to HTML5, as it at least improves the UI. Unfortunately, the underlying product remains an overcomplicated and unwieldy piece of software. Let me explain what leads me to this statement and where the disappointment comes from.

1. Metrics over metrics and no one knows what they mean

I like metrics, and yes, I like to have as many as possible to get to know what goes on in my environment. However, I only like them if I know what they mean, or at least if I can find out what they mean by consulting the documentation. Unfortunately, VMware does not seem able to deliver that information in a quick and understandable fashion. There are tons of metrics that are calculated by vRealize itself. The output is often a percentage of “something”. It is not transparent how that percentage is calculated or what it actually means for my environment.

Let us take the metric “CPU Workload %” as an example. vRealize notified me about the following alert, which was triggered by a “high CPU Workload %” symptom on the cluster:

“Fully-automated DRS-enabled cluster has high CPU workload”

One could think it must be the percentage to which a resource’s CPU is used. However, what is the “CPU % Used” metric then? The “CPU % Used” as well as the “CPU % Demand” values are both way below the “CPU % Workload” value. What does that mean? OK, maybe I just don’t get the concept and I should head to the documentation. As this metric is applied to the cluster, it should surely be listed in the “Cluster Compute Resource Metrics” section:

http://pubs.vmware.com/vrealizeoperationsmanager-66/index.jsp#com.vmware.vcom.core.doc/GUID-F6638548...

However, no CPU Workload is listed there. Hmm… additionally, the documentation refers to the metric key (which is not visible through the GUI at all). It could be one of the workload metrics listed there, I don’t know, and even if I found the correct one, the one-line explanation that is usually provided would most certainly not help me anyway. As a last resort I went to the community forum, and the explanation I got was: “Workload is a vROps synthetic value that is derived using multiple raw metrics that, when combined together, gives you a true indicator of how busy/utilized a resource is.”

Nice, now we have it. “CPU % Workload” is a metric calculated by vRealize that should somehow tell me something. I know neither what caused this high value, nor what I can do to decrease it, nor whether any action is required at all.

I often do not go this far to find out what a metric means, as it is just too complicated, and in the end I mostly end up none the wiser.

I face a similar issue with the “Analysis” tab, which is available for each object. Most of the underlying “dashboards” are pretty hard to understand. As there are videos linked that are supposed to explain them, I think VMware is aware of the difficulty users face when using them. Unfortunately, even after watching those videos, I often only have a guess at what exactly they are telling me, how they are calculated, or what I have to do to improve the situation, if anything is really necessary.

2. Reporting is pretty basic/useless

Sure, you can do reporting, as with every good monitoring tool. However, even though the process of creating reports is simple, the output is unusable most of the time. The views and charts scale poorly inside the generated PDF, and you end up with a document whose charts and tables run over multiple pages. Additionally, one often faces reports with views that are completely empty because the selected object somehow does not support that kind of data. I am sure this makes sense somehow, but letting the user find out by experiment what works and what does not is just a frustrating process. I ended up getting familiar with the API and scripting the report myself. However, even this is overcomplicated. As we need the “metric key” to access a metric’s value, we first have to obtain it. Unfortunately, metric keys are documented for the vSphere part only, not for the Horizon adapter. The only way I was able to obtain them was by exporting a dashboard that included those metrics and looking into the generated XML. Once again, not the usability I expected.
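The workaround described above (export a dashboard, then read the metric keys out of the XML) can be scripted. The following is only a minimal sketch: the layout of the exported file is not described in this thread, so the sample XML, the `metricKey` name and the `collect_metric_keys` helper are all assumptions made for illustration. Inspect your own export and adjust the name hint before relying on it.

```python
import xml.etree.ElementTree as ET


def collect_metric_keys(xml_text, name_hint="metrickey"):
    """Return the sorted, de-duplicated values of every element or attribute
    whose name contains `name_hint` (compared case-insensitively)."""
    root = ET.fromstring(xml_text.strip())
    keys = set()
    for elem in root.iter():
        # Case 1: the key is stored as element text, e.g. <metricKey>...</metricKey>
        if name_hint in elem.tag.lower() and elem.text and elem.text.strip():
            keys.add(elem.text.strip())
        # Case 2: the key is stored as an attribute, e.g. <Metric metricKey="..."/>
        for attr_name, attr_value in elem.attrib.items():
            if name_hint in attr_name.lower() and attr_value.strip():
                keys.add(attr_value.strip())
    return sorted(keys)


# Hypothetical stand-in for an exported dashboard; a real export will look different.
SAMPLE_EXPORT = """
<Dashboard>
  <Widget>
    <Metric metricKey="cpu|workload" />
    <Metric metricKey="cpu|usage_average" />
    <metricKey>mem|workload</metricKey>
  </Widget>
</Dashboard>
"""

found_keys = collect_metric_keys(SAMPLE_EXPORT)
for key in found_keys:
    print(key)
```

The helper deliberately matches on a name fragment instead of a fixed path, so it keeps working if the keys sit at different depths of the export.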

3. Customizing Alerts

There are tons of alerts in vRealize, and you will certainly need to customize them to fit your environment. The process for doing so is once again a complete disaster in terms of usability. Because alerts (and maybe symptoms too?) are reset when performing an upgrade, you first have to duplicate all alerts (and maybe symptoms too?). Afterwards you can change what you like, and in the end you have to disable the original alert in the policy.

OK, it is not that much work, but if you have to do this for every alert it really starts to get frustrating, because you can’t simply go to an alert and change, for example, the CPU usage percentage to 90% instead of 80%.

4. All the rest

Here are some other issues that vRealize has. As I do not have time to go into detail on all of them, here is a short list:

  1. Bad integration of stateless automated View Pools
  2. Complicated process for disabling Alerts on specific Objects
  3. No possibility to disable certain alerts during maintenance windows (for example backup times)
  4. Finding the right objects when configuring widgets can get frustrating, as the navigation tree logic is abstruse.

Overall, I would most probably not buy the product again. Its inability to convey how to interpret and use it is disappointing. Intuitive and easy handling is missing, and you often end up just disabling alerts or metrics because you don’t know what they are supposed to represent anyway.

The only reason for us to use it is that, at the time of purchase, lots of Horizon metrics could not be extracted directly through the API, which forced us to buy a product capable of monitoring it. As VMware recently changed its API, one can now do it without buying an over-engineered, overcomplicated product.

This is my personal opinion. Maybe I am just too dumb or lazy to understand this product, so give it a try if you wish. For me, the product would need fundamental refactoring or even a complete rebuild to become something I can recommend to others.

8 Replies
MichaelRyom
Hot Shot

I'm not sure if you are just trolling or want help.

But as with any product, improvements are wanted. Hell, I still don't understand why the network list isn't sorted when selecting a network for a VM in vSphere; that's just how it is...

But you also need to know and understand the product. So take a course or buy PSO to help you. Sounds like you need it.

Blogging at https://MichaelRyom.dk
MartinE11
Enthusiast

I would not call it trolling. It is just feedback, my personal opinion of the product. I was curious whether others face the same issues when working with the product.

sxnxr
Commander

You may not have meant it to be a troll, but it does look like one. Would you walk in the front door of a restaurant and say, "Don't eat here, it may not be for you because of x, y & z"? But I will try to answer some of your concerns.

Half a year ago we began using vRealize, and in this post I would like to point out what really stressed me out while using it.

First I want to thank VMware for finally bringing the GUI to HTML5, as it at least improves the UI. Unfortunately, the underlying product remains an overcomplicated and unwieldy piece of software. Let me explain what leads me to this statement and where the disappointment comes from.

1. Metrics over metrics and no one knows what they mean

I like metrics, and yes, I like to have as many as possible to get to know what goes on in my environment. However, I only like them if I know what they mean, or at least if I can find out what they mean by consulting the documentation. Unfortunately, VMware does not seem able to deliver that information in a quick and understandable fashion. There are tons of metrics that are calculated by vRealize itself. The output is often a percentage of “something”. It is not transparent how that percentage is calculated or what it actually means for my environment.

Let us take the metric “CPU Workload %” as an example. vRealize notified me about the following alert, which was triggered by a “high CPU Workload %” symptom on the cluster:

“Fully-automated DRS-enabled cluster has high CPU workload”

One could think it must be the percentage to which a resource’s CPU is used. However, what is the “CPU % Used” metric then? The “CPU % Used” as well as the “CPU % Demand” values are both way below the “CPU % Workload” value. What does that mean? OK, maybe I just don’t get the concept and I should head to the documentation. As this metric is applied to the cluster, it should surely be listed in the “Cluster Compute Resource Metrics” section:

http://pubs.vmware.com/vrealizeoperationsmanager-66/index.jsp#com.vmware.vcom.core.doc/GUID-F6638548...

However, no CPU Workload is listed there. Hmm… additionally, the documentation refers to the metric key (which is not visible through the GUI at all). It could be one of the workload metrics listed there, I don’t know, and even if I found the correct one, the one-line explanation that is usually provided would most certainly not help me anyway. As a last resort I went to the community forum, and the explanation I got was: “Workload is a vROps synthetic value that is derived using multiple raw metrics that, when combined together, gives you a true indicator of how busy/utilized a resource is.”

Nice, now we have it. “CPU % Workload” is a metric calculated by vRealize that should somehow tell me something. I know neither what caused this high value, nor what I can do to decrease it, nor whether any action is required at all.

Workload is calculated by taking the demand of a resource and dividing it by the effective capacity of that resource. If workload is just below, at, or above 100 percent, then the object has a high likelihood of having performance problems.

Workload can be greater than 100 percent, as demand can be greater than 100 percent of its currently assigned capacity. Host, virtual machine, and datastore objects will have their workload calculated directly from their own raw metrics, while cluster and data center objects will be derived from an average of all the children that are both vSphere hosts and virtual machines.
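As a rough illustration of the description above, workload can be sketched as demand divided by effective capacity, with a cluster value taken as the average over its children. The host figures and function names below are hypothetical, and the real calculation inside vROps may weigh or filter its inputs differently.

```python
from statistics import mean


def workload_percent(demand, effective_capacity):
    """Workload = demand / effective capacity, expressed as a percentage."""
    if effective_capacity <= 0:
        raise ValueError("effective capacity must be positive")
    return 100.0 * demand / effective_capacity


def cluster_workload_percent(child_workloads):
    """Cluster-level workload as a plain average of its children's workloads."""
    return mean(child_workloads)


# Hypothetical host CPU figures in MHz: (demand, effective capacity)
hosts = [(18_000, 20_000), (22_000, 20_000), (9_000, 20_000)]

per_host = [workload_percent(demand, capacity) for demand, capacity in hosts]
cluster = cluster_workload_percent(per_host)

for (demand, capacity), value in zip(hosts, per_host):
    print(f"demand={demand} capacity={capacity} workload={value:.1f}%")
print(f"cluster workload={cluster:.1f}%")
```

The second host shows why the value is not capped: demand is what the object asks for, not what it receives, so it can exceed the capacity it is divided by. That may be one reason a workload figure can sit above a "% used" figure for the same object.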

I often do not go this far to find out what a metric means, as it is just too complicated, and in the end I mostly end up none the wiser.

I face a similar issue with the “Analysis” tab, which is available for each object. Most of the underlying “dashboards” are pretty hard to understand. As there are videos linked that are supposed to explain them, I think VMware is aware of the difficulty users face when using them. Unfortunately, even after watching those videos, I often only have a guess at what exactly they are telling me, how they are calculated, or what I have to do to improve the situation, if anything is really necessary.

To be honest, if you don't understand the base metrics and what they mean, you will never know what the Analysis dashboard means.

2. Reporting is pretty basic/useless

Sure, you can do reporting, as with every good monitoring tool. However, even though the process of creating reports is simple, the output is unusable most of the time. The views and charts scale poorly inside the generated PDF, and you end up with a document whose charts and tables run over multiple pages. Additionally, one often faces reports with views that are completely empty because the selected object somehow does not support that kind of data. I am sure this makes sense somehow, but letting the user find out by experiment what works and what does not is just a frustrating process. I ended up getting familiar with the API and scripting the report myself. However, even this is overcomplicated. As we need the “metric key” to access a metric’s value, we first have to obtain it. Unfortunately, metric keys are documented for the vSphere part only, not for the Horizon adapter. The only way I was able to obtain them was by exporting a dashboard that included those metrics and looking into the generated XML. Once again, not the usability I expected.

Agreed, reporting could be better, but to respond to some of your queries:

"one often faces reports with views that are completely empty because the selected object somehow does not support that kind of data" Not every object uses the same metrics; for example, you won't get CPU utilisation from a datastore.

"As we need the “metric key” to access a metric’s value, we first have to obtain it. Unfortunately, metric keys are documented for the vSphere part only, not for the Horizon adapter. The only way I was able to obtain them was by exporting a dashboard that included those metrics and looking into the generated XML. Once again, not the usability I expected." Use the super metric builder if you need the metric key (depending on the version of vROps).

3. Customizing Alerts

There are tons of alerts in vRealize, and you will certainly need to customize them to fit your environment. The process for doing so is once again a complete disaster in terms of usability. Because alerts (and maybe symptoms too?) are reset when performing an upgrade, you first have to duplicate all alerts (and maybe symptoms too?). Afterwards you can change what you like, and in the end you have to disable the original alert in the policy.

OK, it is not that much work, but if you have to do this for every alert it really starts to get frustrating, because you can’t simply go to an alert and change, for example, the CPU usage percentage to 90% instead of 80%.

Never use anything out of the box if you plan to change it. vROps is like any other monitoring product out there: if you configure it badly, you will get bad results. For alerting, I would suggest creating your own policies. I have a company default with 15 sub-policies off it, and I use custom groups to assign the correct sub-policy to the correct group of objects. Disable all the OOTB alerts on the base policy and clone the OOTB alerts you want. Start with the minimum you need for your environment and add the rest over time. We have 15k VMs and 1000 hosts and currently 8 alerts, so it is not that big a job to set up. Configure the clones as you need, enable the global ones in the base policy (they will apply to all sub-policies), and use the sub-policies for any environment-specific ones you want disabled or enabled. The first thing I do when deploying vROps is disable all the OOTB alerts.
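The base-policy and sub-policy layout described above can be pictured as a simple inheritance chain. This is a conceptual sketch only, not the vROps policy engine or its API: the `Policy` class, the alert names and the group names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Policy:
    """A policy that inherits alert states from its parent unless overridden."""
    name: str
    parent: Optional["Policy"] = None
    # alert name -> True (enabled) or False (disabled); absent means "inherit"
    overrides: Dict[str, bool] = field(default_factory=dict)

    def alert_enabled(self, alert):
        policy = self
        while policy is not None:
            if alert in policy.overrides:
                return policy.overrides[alert]
            policy = policy.parent
        # Nothing in the chain mentions the alert: treat it as disabled,
        # which models "disable all the OOTB alerts on the base policy".
        return False


# Base policy: only the cloned alerts that should apply everywhere are enabled.
base = Policy("Company default", overrides={
    "Host CPU high (clone)": True,
    "VM memory high (clone)": True,
})
# Sub-policy: inherits from the base and switches one alert off for a test lab.
test_lab = Policy("Test lab", parent=base, overrides={
    "VM memory high (clone)": False,
})

# Custom groups decide which policy a set of objects receives.
group_policy = {"Production VMs": base, "Test lab VMs": test_lab}

for group, policy in group_policy.items():
    for alert in ("Host CPU high (clone)", "VM memory high (clone)"):
        state = "enabled" if policy.alert_enabled(alert) else "disabled"
        print(f"{group}: {alert} is {state}")
```

Keeping global decisions in the base policy and only the exceptions in sub-policies is what keeps the number of places to edit small.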

4. All the rest

Here are some other issues that vRealize has. As I do not have time to go into detail on all of them, here is a short list:

  1. Bad integration of stateless automated View Pools
     I don't use View, so I can't comment.
  2. Complicated process for disabling Alerts on specific Objects
     Agreed (how would you make this easier?).
  3. No possibility to disable certain alerts during maintenance windows (for example backup times)
     Try creating your alerts using dynamic thresholds. After several weeks, vROps will learn the normal operating ranges for the servers. I have CPU alerts using DT because every Saturday or Sunday most VMs go to 100% CPU because of AV scans. Because vROps has learned this, it will only alert if a VM goes outside its known operating profile. It will alert if a VM that normally runs between 10-30% on a Wednesday went to 80% today.
  4. Finding the right objects when configuring widgets can get frustrating, as the navigation tree logic is abstruse.
     Again, it is only abstruse if you don't understand the base metrics and capacity models (the difference between demand and allocation). That is why there is a search.
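The idea behind the dynamic-threshold suggestion in item 3 can be sketched with a per-weekday baseline. This is a deliberately simplified stand-in and not the algorithm vROps uses: the band of mean plus or minus `k` standard deviations, the helper names and the sample CPU values are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def learn_baseline(samples, k=3.0):
    """samples: iterable of (datetime, value). Returns {weekday: (low, high)}."""
    by_day = defaultdict(list)
    for timestamp, value in samples:
        by_day[timestamp.weekday()].append(value)
    baseline = {}
    for day, values in by_day.items():
        centre, spread = mean(values), pstdev(values)
        baseline[day] = (centre - k * spread, centre + k * spread)
    return baseline


def is_anomalous(baseline, timestamp, value):
    """True if `value` falls outside the band learned for that weekday."""
    band = baseline.get(timestamp.weekday())
    if band is None:
        return False  # no history for this weekday yet, so stay quiet
    low, high = band
    return not (low <= value <= high)


# Four weeks of hypothetical CPU % history: quiet Wednesdays, AV scans on Saturdays.
history = []
for week in range(4):
    wednesday = datetime(2018, 1, 3) + timedelta(weeks=week)
    saturday = datetime(2018, 1, 6) + timedelta(weeks=week)
    history.append((wednesday, [10, 20, 30, 20][week]))
    history.append((saturday, [98, 100, 99, 100][week]))

baseline = learn_baseline(history)
print(is_anomalous(baseline, datetime(2018, 1, 31), 80))   # a Wednesday at 80%
print(is_anomalous(baseline, datetime(2018, 2, 3), 100))   # a Saturday at 100%
```

The same 100% reading is unremarkable on a Saturday but would stand out on a Wednesday, which is the behaviour the reply describes.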

Overall, I would most probably not buy the product again. Its inability to convey how to interpret and use it is disappointing. Intuitive and easy handling is missing, and you often end up just disabling alerts or metrics because you don’t know what they are supposed to represent anyway.

There are lots of webinars and books out there that explain everything (they might not be from VMware, but they are there). Take a look at vXpress: they have one-hour videos almost every month going back to 2014, in which one of the top VMware vROps engineers explains everything. Also take a look at the books on Amazon.

The only reason for us to use it is that, at the time of purchase, lots of Horizon metrics could not be extracted directly through the API, which forced us to buy a product capable of monitoring it. As VMware recently changed its API, one can now do it without buying an over-engineered, overcomplicated product.

This is my personal opinion. Maybe I am just too dumb or lazy to understand this product, so give it a try if you wish. For me, the product would need fundamental refactoring or even a complete rebuild to become something I can recommend to others.

I have been using this product since it was CapacityIQ. I have spent the last 2 years 100% dedicated to developing it for our company. I agree there are a lot of frustrations with the product, as it seems to do the complicated stuff well while the simple stuff can be frustrating. But instead of bitching about it on a forum, I engaged VMware through our TAM and now have monthly calls with the engineering team managers and engineers, the product team managers, and the UI team and managers. Because of this, and by giving honest feedback and constructive ways to improve the product, I am now part of the VMware Cloud Operations Design Partner Program. There I bring up not only the challenges I face on a day-to-day basis but also some of the feedback from here, in the hope of fixing some of the shortcomings, which VMware is dedicated to doing. They do listen to the feedback and will add it to the product, but as with any large and extremely complex back end, it takes time.

One thing I will recommend: learn what the metrics are from the videos and books first. It will make knowing what the metrics are, and which ones you are interested in, a lot easier. I probably only use 20 metrics on a daily basis. That way it won't overwhelm you.

MartinE11
Enthusiast

Hello sxnxr

Thank you for your time and recommendations. I can agree with some of your points, although not all of them:

"You may not have meant it to be a troll, but it does look like one. Would you walk in the front door of a restaurant and say, "Don't eat here, it may not be for you because of x, y & z"? But I will try to answer some of your concerns."

I completely agree with you that one should not complain about something just because he was not willing to dive into it. However, if users have to go through X hours of tutorials to understand a product, then, in my opinion, something is going wrong. As I stated, I have been using the product for about half a year now, and I have spent a lot of time getting used to it, but I am still not comfortable with it.

In my opinion, it is too easy to blame customers for not understanding how to use the product. It is primarily the task of the software engineering and UI teams to make everything as self-explanatory and easy to use as possible. For example, inputs should be validated as early as possible to prevent faulty output. Selecting objects when configuring widgets should not involve scrolling through 100 object groups just to eventually find the object (no, not every widget provides a search function). Disabling and enabling alerts should not require cloning anything in advance and modifying a policy afterwards. Metrics and analysis tabs should not leave users with more question marks than before they consulted them. And in the end, no one should be forced to buy a book or sit through several hours of tutorials to understand a product.

To respond to your analogy: for me it is more like not recommending the restaurant because you are sitting on a stake while you have to eat your soup with a fork. Even though the soup may be excellent, it is still uncomfortable, awkward and pretty time-consuming.

"Workload is calculated by taking the demand of a resource and dividing it by the effective capacity of that resource. If workload is just below, at, or above 100 percent, then the object has a high likelihood of having performance problems.

Workload can be greater than 100 percent, as demand can be greater than 100 percent of its currently assigned capacity. Host, virtual machine, and datastore objects will have their workload calculated directly from their own raw metrics, while cluster and data center objects will be derived from an average of all the children that are both vSphere hosts and virtual machines."

Thank you for clearing up this point. Do you have a link to the VMware documentation where this is described as well? I was unable to find it.

"To be honest, if you don't understand the base metrics and what they mean, you will never know what the Analysis dashboard means."

OK

"Use the super metric builder if you need the metric key (depending on the version of vROps)."

Thanks for the hint.

"Never use anything out of the box if you plan to change it. vROps is like any other monitoring product out there: if you configure it badly, you will get bad results. For alerting, I would suggest creating your own policies. I have a company default with 15 sub-policies off it, and I use custom groups to assign the correct sub-policy to the correct group of objects. Disable all the OOTB alerts on the base policy and clone the OOTB alerts you want. Start with the minimum you need for your environment and add the rest over time. We have 15k VMs and 1000 hosts and currently 8 alerts, so it is not that big a job to set up. Configure the clones as you need, enable the global ones in the base policy (they will apply to all sub-policies), and use the sub-policies for any environment-specific ones you want disabled or enabled. The first thing I do when deploying vROps is disable all the OOTB alerts."

First of all, I cannot think of anyone out there who would not want to change any of the alerts that come out of the box. So if I follow you correctly, there should not be any OOTB deployments at all. However, I think VMware’s approach is to provide customers with a base set of alerts to get started, because configuring all of them by themselves would be even more time-consuming. That is actually a good approach, and basically every product, for example “Veeam One”, does the same. What I criticize, however, is the time-consuming and unusual process of customizing these alerts. Additionally, the concept of using policies to apply alerts may make sense on a global scale, but if you have to disable or enable alerts for certain objects only, working with policies soon becomes unmanageable.

"I have been using this product since it was CapacityIQ. I have spent the last 2 years 100% dedicated to developing it for our company. I agree there are a lot of frustrations with the product, as it seems to do the complicated stuff well while the simple stuff can be frustrating. But instead of bitching about it on a forum, I engaged VMware through our TAM and now have monthly calls with the engineering team managers and engineers, the product team managers, and the UI team and managers. Because of this, and by giving honest feedback and constructive ways to improve the product, I am now part of the VMware Cloud Operations Design Partner Program. There I bring up not only the challenges I face on a day-to-day basis but also some of the feedback from here, in the hope of fixing some of the shortcomings, which VMware is dedicated to doing. They do listen to the feedback and will add it to the product, but as with any large and extremely complex back end, it takes time."

I clearly understand that someone working this closely and this long with the product is able to get around the complexity and gain value from it.

But there are lots of companies that do not have the opportunity to spend this much time getting used to a product and do not want to have to tell VMware every month how to improve it. What they can do is write in a forum and give their opinion about the product, so that others may join the discussion. Just to repeat it once more: this is my personal opinion, and I apologize if my initial post was too salty. :)

sunnydua2011101
VMware Employee

Hi MartinE11,

I am a Product Manager for vRealize Operations Manager at VMware. I wanted to thank you for your feedback, and I am glad that sxnxr was able to provide you with some guidance on how you can benefit from your investment in vRealize Operations Manager.

I do want to underline the fact that the last release of the product was a step in the right direction, and we want to continue to simplify the multiple use cases you just mentioned.

I would be happy to connect with you on this topic if you want to share additional feedback or concerns.

Kindly send me an email and we can set up a time to connect. I am at duas@vmware.com

Regards

Sunny

ktototam
Contributor

I just wanted to share a relatively easy method of temporarily disabling alerts for a VM, host or any other object in vROps.

You can start maintenance for the object by using the Inventory Explorer. In v6.6 it is located under the Configuration menu.

Simply find your VM and start maintenance. You can configure a scheduled end for it or choose to end it manually.

You can also set up scheduled maintenance if you anticipate one.

vROPS does not collect metrics and does not trigger alerts for objects in maintenance state.
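To make the effect of maintenance mode concrete, here is a small conceptual sketch. It is not how vROps implements maintenance; the `MaintenanceWindow` class, the `should_alert` helper and the object names are hypothetical. It only models the behaviour described above: while an object is in maintenance, nothing fires for it, whichever alert it is.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class MaintenanceWindow:
    start: datetime
    end: Optional[datetime] = None  # None models "end it manually"

    def covers(self, moment):
        """True while the window is open at `moment`."""
        return self.start <= moment and (self.end is None or moment < self.end)


def should_alert(windows_by_object, object_name, moment):
    """An object in maintenance raises no alerts at all during the window."""
    windows = windows_by_object.get(object_name, [])
    return not any(window.covers(moment) for window in windows)


# Hypothetical nightly backup window for one VM.
windows_by_object = {
    "backup-vm-01": [
        MaintenanceWindow(datetime(2018, 3, 1, 22, 0), datetime(2018, 3, 2, 2, 0)),
    ],
}

print(should_alert(windows_by_object, "backup-vm-01", datetime(2018, 3, 1, 23, 30)))
print(should_alert(windows_by_object, "backup-vm-01", datetime(2018, 3, 2, 9, 0)))
print(should_alert(windows_by_object, "web-vm-07", datetime(2018, 3, 1, 23, 30)))
```

Note that the check is per object, not per alert, so this approach cannot silence one alert while leaving the others active.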

On the original subject that started this discussion: I must say that vROps is certainly not the easiest piece of software or subject matter to grasp. However, neither is virtualization itself :)

It is not for us system nerds to complain about it, but rather to wrap our heads around it.

The vROps learning curve is steep. But once you are past it, it is a VERY flexible and powerful product.

MartinE11
Enthusiast

Hello ktototam

Thanks for joining the discussion and providing your solution. It really depends on the situation: if you do not care about any other alerts on the object during the specified time, then maintenance can be a quick solution. However, in cases where you just want to disable one specific alert on one object, you are probably better off with a new alert based on a dynamic threshold, as mentioned by sxnxr. Simply disabling an alert on a specific object is still something I would like to see in future product releases.

I am very surprised by how readily the sysadmins here accept the product. Compared to other monitoring tools like Nagios, PRTG or SolarWinds, which surely all have their shortcomings too, vRealize in my opinion does a few things exceptionally badly, which leads to a steep learning curve and makes working with the product unnecessarily complicated. I noted a few of them in my previous posts, and yes, you may be able to get around these issues, but overall this is not what I expect from a product that claims to be a leading monitoring solution.

sxnxr
Commander

"vRealize in my opinion does a few things exceptionally badly"

I agree 100% with this. In my experience vROps does some of the simple things badly and the complicated things well, which for me, and I would guess for most people, makes you want to pull your hair out. A user asks whether they can have a report with a list of the VMs and the port groups they are attached to, and you say, "Yeah, sure, no problem", because it sounds simple enough. Three hours later you have to run RVTools to get the data, because vROps has the relationship but does not let you report on it. This is just one of many small but important things that it does not do. Believe it or not, until recently (one of the 6.x versions) you could not even do a report of the VMs and the datastores they were on.

I could go on, as I have a very long list of stuff I would like. That said, I looked back and over 50% of it has now been added. I am on the vROps design partner program, and I can tell you first-hand that the vROps team at VMware is just as passionate about the product and improving it as most of the people in this forum are. If you give constructive feedback they will listen, and TBH this is one of the reasons we have stuck with the product.

A great way to see how far it has come in the last 2 years is to take a look at the what's-new videos from vXpress.

"which leads to a steep learning curve and makes working with the product unnecessarily complicated"

I love the granularity of the product as it is, but then I have been through that learning curve and have been dedicated to vROps for 3 years now in my current role, so I have had the days of banging my head against the wall. One piece of advice I will give you: take it slow, and don't try to do everything at once. Split the deployment into sections. If I were learning from scratch, I would:

  1. Work on the policies and learn them
  2. Learn what group types are and how they affect the granularity to which you can customise what the object widgets display
  3. Learn custom groups and be creative with the membership rules
  4. Learn the benefit of CDCs (I love CDCs for the amount of configuration they save me, as I use them for my custom group membership rules)

Having a good understanding of these will make your dashboards and reporting a lot easier. Don't just jump into the dashboards without understanding what is in play in the background to make them.

Then learn the capacity management function. I could spend weeks talking about this alone, but again there are some good videos on vXpress.

For me the last thing to learn is alerts. These can be heavily influenced by policies and custom groups, so having a good understanding of those first is a must.

One of the hardest things with vROps is setting expectations with management in your company. They seem to think it is a small product that is easy to set up and maintain, but it is not. I would liken vROps to Microsoft SCOM in size and scope. We have over 1000 hosts and 16k VMs, and I alone work on vROps, while there is a full team that works on SCOM, as that is expected. So I set out the scope of vROps to my manager, listing the benefits and savings and the expectation that I would need to be dedicated to it to make it a success. Luckily for me, they listened, and now we have 6 separate instances, all our automated deployments rely on vROps, and we are moving all our alerting for the virtual environment from ITM to vROps. This was all done by setting out the expectations up front.

I hope this wall of text helped. I didn't mean to go on for so long.