Data Distribution across cluster nodes

murben · ‎12-06-2016

I am currently testing Log Insight in a single-node configuration. This server holds approximately 1.5 TB of event data. If I add 2 additional worker nodes to create an HA cluster:

1) Are all nodes required to have the same amount of storage assigned?

2) When I join the additional nodes, will Log Insight store events only on the 2 new nodes until they catch up to the original node? Or will all 3 nodes grow at the same rate, keeping the original node 1.5 TB ahead on storage?

Thanks!

admin · ‎12-20-2016

Each event is stored in a single on-disk bucket. When working with buckets, be aware of the following behaviors and characteristics.

Buckets can reach a maximum size of 1 GB. When a bucket reaches 1 GB, it is sealed, can no longer be written to, and is marked to be archived. After a sealed bucket is archived, it is marked as archived. This means an event may be retained locally and in the archives at the same time.
Buckets are not replicated across vRealize Log Insight nodes. If you lose a node, you lose the data on that node.
All buckets are stored on the /storage/core partition.
vRealize Log Insight deletes old buckets (or archives them, if archiving is set up) when available space on the /storage/core partition falls below 3%. Deletion (or archiving) follows a FIFO model, so the oldest buckets go first.
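
To make the retention behavior listed above concrete, here is a rough Python sketch of FIFO deletion against a 3% free-space threshold. It is only an illustration of the rule as described in this post, not Log Insight's actual code; the function name `apply_retention` and the way buckets are modeled (a queue of sizes in GB) are assumptions made for the example.

```python
from collections import deque

# Values taken from the post; everything else here is a hypothetical model.
BUCKET_MAX_GB = 1.0        # a bucket is sealed once it reaches this size
MIN_FREE_FRACTION = 0.03   # retention starts when free space drops below 3%


def apply_retention(buckets, partition_gb):
    """Remove the oldest buckets until at least 3% of the partition is free.

    buckets: deque of bucket sizes in GB, oldest bucket on the left.
    partition_gb: total size of the storage partition in GB.
    Returns the sizes of the removed buckets, in removal order.
    """
    removed = []
    used = sum(buckets)
    while buckets and (partition_gb - used) / partition_gb < MIN_FREE_FRACTION:
        oldest = buckets.popleft()  # FIFO: the oldest bucket goes first
        used -= oldest
        removed.append(oldest)
    return removed


if __name__ == "__main__":
    # A full 10 GB partition holding ten sealed 1 GB buckets.
    node = deque([BUCKET_MAX_GB] * 10)
    dropped = apply_retention(node, partition_gb=10.0)
    print(len(dropped), "bucket(s) removed,", len(node), "remaining")
```

Because each node applies this rule to its own /storage/core partition, a node with less storage simply reaches the threshold sooner and keeps a shorter history than its peers.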

Thanks.

murben · ‎01-10-2017

I'm not sure how this answers either of my questions?

admin · ‎01-10-2017

OK, let me try again.

1) Are all nodes required to have the same amount of storage assigned? - Not necessarily. It is good practice to keep them the same or similar, but the sizes can vary.

2) When I join the additional nodes, will LogInsight only store events on the 2 new nodes until they catch up to the original node? or will all 3 nodes grow at the same rate keeping the original node 1.5TB ahead on storage? - If you add 2 nodes and configure the ILB (recommended), data is ingested by the node holding the ILB VIP. Each event is stored on only one node and is never duplicated. If the ILB is not configured, the master node ingests the events, and events may be dropped if it goes down or is being upgraded. The new nodes will never 'catch up' with the original node, because existing events are never copied or moved across nodes.
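
As a back-of-the-envelope check on the 'catch up' point, the arithmetic can be sketched in Python. The per-node ingest rates below are made-up numbers; how new events are actually split between nodes depends on your ILB setup, and the sketch ignores retention entirely. `project_storage` is a hypothetical helper written for this example.

```python
def project_storage(start_gb, daily_ingest_gb, days):
    """Project per-node storage use, given a starting size and a daily ingest rate per node.

    No rebalancing is modeled, because existing events are never moved between nodes.
    """
    return [start + rate * days for start, rate in zip(start_gb, daily_ingest_gb)]


if __name__ == "__main__":
    start = [1500, 0, 0]   # original node holds about 1.5 TB, new nodes start empty
    rates = [10, 10, 10]   # assumed GB/day per node, purely illustrative
    after = project_storage(start, rates, days=30)
    print(after)                 # per-node GB after 30 days
    print(after[0] - after[1])   # gap between the original node and a new node
```

With equal rates the gap stays at 1500 GB. It only narrows if a new node ingests more than the original, or once the original node fills up and retention starts removing its oldest buckets.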

Hope this helps.
