Hi everyone,
I have the following in the global settings:
But when I create a trend view I only get 2 months of data, starting from 1 July. Any explanation?
Thank you.
You're looking at the SINGLE NODE maximum of 3,500,000 metrics. For multiple nodes, e.g. 1 master and 1 data node, each node has a maximum of 2,500,000 metrics and 10k objects. Further, your Cluster Management view's # metrics isn't the place to look (it is wrong). Open a System Audit view and look at # metrics collecting and # objects collecting.
Also, what you haven't said is whether you're using HA and have a Master and a Master Replica; doing so would halve your # metrics and # objects supported.
Basically, just generate the system audit and append it here, along with a more complete screenshot of your cluster management view including role names.
In your screenshot, you can see that you had a purge of data on Aug-1. We need more visibility, though, into both nodes' disk space %-utilization. Give us this view, going back 90 days (I'm laying it out so you know how to make it):
Yup, you don't have enough disk space on your vR Ops nodes. When you hit 85% utilization, it'll do an automatic purge and free up space for you. The smaller the disk you have, the larger the chunk of data it'll carve off. I've seen 400 GB vR Ops nodes do a purge at 85% and bring utilization down to 65%, so ~100-150 GB freed.
vR Ops doesn't just randomly do this - it'll generate alerts first at around the 80% point (I don't recall the symptom threshold offhand). As long as you have notifications set up for vR Ops Self Monitoring, you'll know before it happens and have an opportunity to add disk space.
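To make those thresholds concrete, here's a minimal sketch. The 80% alert and 85% purge figures are the ones described in this thread, not pulled from official docs, so treat the exact numbers as assumptions:

```python
# Hypothetical sketch of the disk-utilization behavior described above.
# Thresholds are assumptions based on this thread, not vROps documentation.

ALERT_THRESHOLD = 0.80   # vR Ops raises a self-monitoring alert (approx.)
PURGE_THRESHOLD = 0.85   # automatic data purge kicks in

def disk_status(used_gb: float, total_gb: float) -> str:
    """Classify a node's disk utilization against the thresholds above."""
    util = used_gb / total_gb
    if util >= PURGE_THRESHOLD:
        return "purge"   # old trend data will be carved off
    if util >= ALERT_THRESHOLD:
        return "alert"   # time to add disk before a purge happens
    return "ok"

# Example: a 400 GB node with 345 GB used is at ~86% -> purge territory.
print(disk_status(345, 400))  # -> purge
```

The point being: if your trend data vanished back to 1 July, the node very likely crossed that purge threshold at some point.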
Hi Mark.j,
Thanks a lot for your reply. I don't think that's the main cause. The master node has over 400 GB, with over 150 GB free. But yes, the workload is strangely high at 91%.
Same for the data node: Capacity Remaining is 75 GB and Workload is 91%.
Is this still not enough? You think we should add more space?
Thank you again.
Regards
By the way, which Workload metric should we take into consideration: Disk Space | Workload % or Disk Space - Total Usage | Workload %?
Here is the workload trend for the Master node:
Moreover, according to the vROps sizing table:
As we have two large nodes, we are well under the maximum:
Thank you in advance.
Hi Mark.j,
I need your help, please, to understand. As we understand it, the maximum objects per node (multi-node mode) is 10,000, and the maximum collected metrics per node is 2,500,000. So as we have two nodes, Master and Data, our real limit is 20,000 objects and 5,000,000 metrics.
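For what it's worth, that arithmetic (including the HA halving Mark mentioned) can be sketched like this; the per-node figures are the ones quoted in this thread:

```python
# Per-node figures as quoted in this thread (assumptions, not official docs).
PER_NODE_OBJECTS = 10_000
PER_NODE_METRICS = 2_500_000

def cluster_limits(nodes: int, ha_enabled: bool) -> tuple[int, int]:
    """Effective object/metric limits for the cluster.
    HA halves usable capacity because the master replica
    duplicates the data."""
    objects = nodes * PER_NODE_OBJECTS
    metrics = nodes * PER_NODE_METRICS
    if ha_enabled:
        objects //= 2
        metrics //= 2
    return objects, metrics

print(cluster_limits(2, ha_enabled=False))  # -> (20000, 5000000)
print(cluster_limits(2, ha_enabled=True))   # -> (10000, 2500000)
```

So the 20,000 / 5,000,000 figures only hold if HA is off; with a master replica they drop back to the single-pair numbers.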
In the audit report, Resources Configured is 10,809 and Resources Collecting is 10,780 (that is for both nodes). Regarding the metrics: Metrics Configured is 16,401,370, Metrics Collecting is 2,173,375, Super Metrics is 737,683, and vCenter Operations Generated is 105,690.
My questions are:
Thank you very much.
The best place to look to see if you are sized correctly is the attachment on vRealize Operations Manager 6.1, 6.2, and 6.2.1 Sizing Guidelines (2130551) | VMware KB. Download it, go to the Advanced tab, and plug in the info from your audit report. I spent months arguing with GSS about right-sizing our environment because they were using the admin UI to determine what was being collected. Once they realised they were wrong, used the audit report, and conceded that I was right-sized, they fixed the performance problems I was having (and it didn't take the 6 nodes with 16 vCPUs they originally suggested). Long story short, this guide is very accurate. As Mark stated, make sure you mark HA as enabled if you have a master replica.
On the data retention, as far as I know it all comes down to disk space required. The more history, the more disk space you need. Don't forget to extend all nodes to be the same size if using HA.
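If it helps, a crude way to estimate the extra disk a longer retention window would need, assuming stored history grows roughly linearly with retention days (the input figures below are hypothetical examples, not from your environment):

```python
# Crude linear estimate of extra disk needed when extending retention.
# Assumes history volume scales linearly with the retention window,
# which is an approximation, not an official sizing formula.

def extra_disk_gb(current_history_gb: float,
                  current_days: int, target_days: int) -> float:
    """Additional GB needed to go from current_days to target_days."""
    return current_history_gb * (target_days / current_days - 1)

# e.g. 200 GB of history at 90-day retention, extended to 180 days:
print(extra_disk_gb(200, 90, 180))  # -> 200.0 (roughly double the history)
```

For real numbers, the Advanced tab of the sizing spreadsheet in KB 2130551 is the thing to trust, as above.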