The material also mentions a recommendation that if HA is used, you should have a minimum of 4 nodes. Why?
I am guessing the recommendation for a minimum of 4 is:
- Master-Replica for core function - no data
- Need 2 data nodes so that they are redundant.
If you choose to use only a Master-Replica pair, you get no scale-out, and there is an additional penalty between them caused by replication overhead.
All Analytics nodes need to be the same size, period. The exception to this is remote collectors, which will often be smaller than the nodes in the Analytics cluster.
Master and Replica nodes do have the data role. However, they also hold the cluster configuration data. As you add nodes to the cluster, you can specify "data" nodes. This merely means they can act as collectors and also store data; it also means they will NOT take on the master or replica role unless you activate that role. As you add nodes to a cluster with HA enabled, with the exception of remote collector nodes, they become part of N+1 data redundancy for storing historical metric data. At any given time, you can lose any one node and still have data consistency.
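To make the N+1 idea concrete, here is a small illustrative sketch (not vR Ops internals; the node names and shard placement scheme are hypothetical). It models each metric shard as being kept on two different nodes, so losing any single node still leaves a live copy of every shard:

```python
# Illustrative sketch only -- NOT how vR Ops actually places data.
# Models the N+1 idea: each shard lives on two distinct nodes, so the
# cluster tolerates the loss of any one node without losing data.
from itertools import combinations

def place_shards(num_shards, nodes):
    """Assign each shard to a pair of distinct nodes (round-robin over pairs)."""
    pairs = list(combinations(nodes, 2))
    return {s: pairs[s % len(pairs)] for s in range(num_shards)}

def survives_single_failure(placement, failed_node):
    """True if every shard still has at least one surviving copy."""
    return all(any(n != failed_node for n in holders)
               for holders in placement.values())

nodes = ["master", "replica", "data1", "data2"]  # hypothetical 4-node cluster
placement = place_shards(12, nodes)
assert all(survives_single_failure(placement, n) for n in nodes)
```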
Now, Master/Replica is something different. If you have a Master without a Replica, you're pretty much in a jam, because you don't have a master in the cluster if the master node drops. If you have a Master AND a Replica, the Replica will come online and become your new master (all is well). This is also why it is recommended to put the Replica on different storage and hosts, to ensure a minimum level of metadata/config data availability.
The replication between the master and replica is just metadata and some other back-end DB content, but it's still there and very important. The bulk of the data written N+1 across nodes is metric data, which happens as it is collected, so that is front-loaded and you don't need to worry about the replication too much. However, you can also add data nodes after the cluster has been brought online. That will require a "rebalance" of the data across the nodes in the analytics cluster, which can definitely take some time as the nodes redistribute all of the data in an N+1 fashion, including to that new data-roled node.
There is node sizing and node quantity guidance for designing this, released based on QE results from the most successful deployment combinations. Some combinations just don't work well, hence their being "unsupported". Stick with the recommended combinations to have the most successful deployment.
So if a single "large" node will meet my sizing needs, I could create a large Master node, add a second large node, and set it as the Replica. This would be a full copy of the first. This HA config will handle all functions. What are the downsides to a set of 2 nodes vs. 4?
I am assuming, since no load balancing is being discussed, that all data collections are pulls?
I have been unable to find any sizing specifics even on the partner portal.
On Thu, Jan 8, 2015 at 3:01 PM, mark.j <email@example.com>
Node sizing guidance is here:
Also, see the attached XLS on that KB.
Smaller nodes mean you'd need more horizontal scaling, which lets you spread the IOPS load across different storage and compute resources. It'll also benefit you by supporting more clients, as you'll have more nodes to load-balance client access against. However, sometimes large will suffice if you've got the resources and don't have the client-access quantity requirements.
Note this detail in the KB: "Maximum number of certified concurrent users per node (regardless of node size): 4 ".
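That per-node limit makes the client-access math simple. A trivial sketch (the only figure taken from the KB quote above is the 4-users-per-node limit; the node counts are just the examples from this thread):

```python
# Per the KB quote above: maximum of 4 certified concurrent users per node,
# regardless of node size.
USERS_PER_NODE = 4

def max_concurrent_users(num_nodes):
    return num_nodes * USERS_PER_NODE

# More, smaller nodes support more user sessions behind a load balancer:
assert max_concurrent_users(8) == 32   # 8x medium nodes
assert max_concurrent_users(4) == 16   # 4x large nodes
```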
Thanks, that helps a lot.
If I need to monitor 12,000 to start, could I start with 3 Medium nodes? I am assuming the data role will exist on the Master, the Replica, and the Data node? I could then scale this out by adding Data nodes as needed, up to 8 (the HA limit).
On Thu, Jan 8, 2015 at 3:40 PM, mark.j <firstname.lastname@example.org>
12,000 VMs? With vR Ops HA enabled?
Objects are only part of the mix, so the type of adapter does carry some weight. There is overhead per object and then whatever multiple of metrics per object type atop that. It all comes down to # objects and # metrics when we're sizing this stuff.
Plug the #s into the XLS and see what you get. Hypothetically, if we were talking vR Ops HA, 12,000 VMs, 6-months data retention, with 500 hosts and 1,000 datastores, you'd need... 8x medium nodes OR 4x large nodes.
If I had to pick one of the above options, I'd say go for 8x medium nodes to spread out your load and increase # supported user sessions (hitting a load balancer for web interface access).
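As a hedged back-of-the-envelope sketch of the sizing logic being described: HA keeps two copies of the data, so a cluster's usable object capacity is roughly halved. The per-node capacities below are placeholders I've invented to illustrate the arithmetic, NOT official figures; use the XLS from the KB for real sizing.

```python
import math

# PLACEHOLDER capacities, not official vR Ops figures -- use the KB's XLS.
PER_NODE_OBJECTS = {"medium": 3500, "large": 7000}

def nodes_needed(total_objects, node_size, ha_enabled=True):
    """With HA, data is stored twice, so usable per-node capacity is halved."""
    capacity = PER_NODE_OBJECTS[node_size]
    effective = capacity // 2 if ha_enabled else capacity
    return math.ceil(total_objects / effective)

objects = 12000 + 500 + 1000   # VMs + hosts + datastores from the example
print(nodes_needed(objects, "medium"))  # -> 8 with these placeholder numbers
print(nodes_needed(objects, "large"))   # -> 4 with these placeholder numbers
```

With these (deliberately chosen) placeholder capacities the arithmetic lands on the same 8-medium-or-4-large answer as the XLS example above, which is only meant to show *why* HA roughly doubles the node count you'd otherwise need.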
Do HA-enabled clusters always need nodes in pairs? For example, if I have 4 nodes and need more capacity, can I add a 5th node, or does it always need to be an even number for HA?
Also, if I am going with 4 large nodes for HA, is it recommended to make 2 nodes a Master-Replica pair and the other 2 data-only, or do you typically allow all 3 of the other nodes to be data/replica nodes?
You can't actually 'disable' the Data role on the nodes in the GUI, unless you have a remote collector (where that role isn't running). If you were running command-line scripts, you could drop the data role. However, I don't see why you'd want to do that.
We'll typically deploy the nodes in even quantities as a best practice right now. However, as deployments get more mileage under their belts, you may see odd quantities getting more QE and documented supportability.
What is the preferred order of install for a 4 node HA cluster (Master, Replica, Data, Data)?
Do I install all 4 nodes add them to a single vROPs cluster and then enable them for HA?
Should I do this before connecting and configuring adapters?
If this is a NEW deployment, then add all of the nodes before initializing the cluster. Typically you'll assign the HA/Replica role after you initialize/start the cluster. Once you're online and the cluster is how you want it, add new solutions however you please.
I have two instances of Hyperic in my 4 node vROPs HA environment.
When deploying the Hyperic adapter in a 4-node HA design, should I just enter the Master node as the vROps Remote Collector URL for both Hyperic adapter instances, or can I enter the Master for one and the Replica for the other to distribute the load? I am not clear on whether the cluster handles this intelligently or not.