At a high level, capacity planning for GemFire involves estimating capacity and resources to fulfill it, and verifying and adjusting the estimated values through testing. The first and main task of a capacity planning exercise for a GemFire application is figuring out the memory requirements. Most everything else flows from there. This article is about memory sizing: it offers guidelines and methodology for determining memory requirements that are especially important for large scale systems, as the importance of efficient resource management grows with scale.
A memory sizing exercise is an iterative process that involves the following:
- Sizing data and associated overhead. This is the first step: figuring out how much memory is required to hold all the data.
- JVM Heap Sizing. Determine the JVM heap size, and amount of data per heap; this comprises the “unit of scale” for a GemFire distributed system, as it determines how many GemFire data nodes are required and the amount of data per node. This step also determines how many JVM’s to run on a single host. The results should be verified by running tests and checking SLA parameters.
- Sizing application overhead. If the data nodes are supposed to run any business logic code (for example, via queries, functions, or callbacks), the associated overhead should be determined and taken into account. This is best done by running small scale tests that execute representative business logic using the unit of scale settings determined in the previous step.
- Adjust the unit of scale configuration if needed, and rerun tests. Based on the results from the last step, some adjustments may be necessary. For example, the amount of data per heap may have to be reduced to accommodate application workload.
- Scale-out testing. Using the unit of scale configuration conduct tests of gradually increasing scale. Verify SLA parameters at each scale level.
Each step involves testing, either to determine data sizes, application overhead, or to verify the sizing results and system configuration. GemFire statistics includes just about all the metrics we need throughout this process, so they should be enabled for all the tests. Upon each test, the statistics can be examined using GemFire’s Visual Statistics Display (VSD).
Data Sizing, and Estimating Memory Requirements
To estimate required memory, we need to size the data domain objects, and GemFire's own memory overhead for the chosen data model, GemFire architecture and configuration. Memory Requirements for Cache Data in GemFire User’s Guide provides a detailed breakdown of memory requirements, which can be used for estimating purposes. Determining data sizes and overhead in memory more accurately requires experimentation, as data has to be loaded in memory in order to be sized accurately. The best approach is to run a small test (that loads a relatively small number of objects that are representative of the actual payload) for each data type, and use one of the following methods to determine average object size, and GemFire overhead per object:
This utility class relies on GemFire internal sizing facilities to provide methods for sizing data objects in memory.
§ GemFire Statistics and VSD (Visual Statistics Display)
Run a simple test that stores the data objects to be sized in a partitioned region, making sure that all of the data is in memory (it can persisted as well, but not overflown). Then examine the PartitionedRegionStats dataStoreBytesInUse and dataStoreEntryCount stats in VSD. Dividing dataStoreBytesInUse by dataStoreEntryCount will provide an estimate of how many bytes are used per entry. Note that the entries will typically be in a serialized form.
§ Heap histogram
Heap histogram can be used to determine both data size and overhead in memory. Using the number of data entries stored in memory (which can be determined using VSD and GemFire stats, or gfsh, for example) we can determine which data structures in a histogram are associated with data entries, how much memory they use, and calculate per entry memory use and GemFire overhead.
As the above methods are based on actual memory use they tend to provide fairly accurate results, making the data sizing exercise less of a guessing game. They also report entry size including the region overhead, which simplifies the rest of the calculation.
Partitioned Region Sizing
Larger data volumes require partitioning over a number of data nodes. That number has to grow as the data volume grows. GemFire’s partitioned region is the data store used in large scale scenarios. There is an important configuration parameter for partitioned regions that should not be overlooked during sizing: it is the number of buckets in a partitioned region.
Partitioned regions consist of partitions, which in turn consist of buckets. A partition is a portion of a partitioned region that fits in a single JVM. A bucket is the smallest unit of data in a partitioned region that can be moved from one partition (JVM) to another during rebalancing. That makes the number of buckets in a partitioned region an important configuration setting. It affects how well the data is balanced, and how much data is moved during rebalancing. The number of buckets should be a prime number high enough for the data to be well balanced across the entire cluster. A good way to calculate it is to take the number of estimated JVM's needed to host all the data in the region, multiply it by a factor high enough to accomplish a balanced data distribution, and round up the result to the next prime. The next question is what is a high enough factor? There are a couple of things that should be factored into that decision:
- Given that the number of buckets is a prime number, some JVM's will have one bucket more than others, which means that their heap occupancy will be higher. If that difference is large enough to matter (e.g. triggers the heap eviction, or is too close to the eviction threshold), the number of buckets should be increased, so that all the JVM's have similar heap usage levels. They can't be the same, but the differences should not cause significant differences in behavior.
- Once a region is created, the number of buckets for it cannot be changed, so it's a good practice to select a number high enough to accommodate future growth. For example, if growth projections indicate a ten-fold increase in size, the number of buckets should be set to 10 times the initial estimate, or even higher, just to be safe. On the other hand, buckets have to be created and managed so having an unnecessarily large number of them would be wasteful.
The default value for the number of buckets is 113. For a cluster that has over a dozen cache servers (JVM's) that might not be enough. The factor of 10 is probably a good starting point for calculating the number of buckets. So, if a cluster is sized at 32 nodes, for example, we could start with 32 x 10 = 320, round up to a prime, and get 331 as our number of buckets. If each JVM hosts around 10GB of data, that is a total of 320GB of data, or about 990MB per bucket. Since the total number of buckets is 331, some JVM's will have 10, and some will have 11 buckets, and the difference in data volume between such JVM's will be 990MB, or around 10% of data in a JVM. We may want to have a more even data distribution. Doubling the number of buckets (32 x 20, round up to 641) reduces the bucket size to approx. 500MB, resulting in a more even data distribution across the cluster. Add to this a five fold growth projection, and the number of buckets comes out to be 32 x 20 x 5, and rounded up to a prime: 3203.
JVM Heap Sizing Considerations
Once we have the memory requirements, we need to determine the JVM heap size that will be used for each GemFire data node, and how much data we can store in a single JVM, so that we can estimate the number of JVM's and machines needed to fulfill the memory requirements. At that point we have a configuration that we can run with to empirically validate what we have, and make any adjustments, if needed.
JVM heap sizing is a game of tradeoff between size and performance. Ideally, we would like to use all the available host system memory for a single GemFire data node JVM, keeping the number of data nodes in the cluster down to a minimum: one per host. That should be the starting point in a heap sizing process: one JVM per host. This becomes more important as the volume of data grows. The question of one vs. two JVM's per host turns into the question of 10 vs. 20, or 50 vs. 100 machines, for example. For reference, GemFire has been used successfuly with heaps of well over 64GB in size. However, sometimes a very large heap may not be optimal. This section discusses why, and offers guidelines on how to go about selecting the optimal heap size and amount of data to store in it.
§ GC Pauses and Large Heaps
One of the biggest issues with large heaps is GC pauses. If they are prohibitively long and cannot be reduced enough by GC tuning, one option is to use more than one JVM per host, in order to reduce the heap size, and subsequently GC pauses.
§ The 32GB Line: Benefits of Compressed Oops
Another potential benefit of keeping the JVM heap size not too large (up to 32 GB), is the space savings from compressed oops, which can be had for heaps of up to 32 GB in size. The memory overhead in 64-bit JVM's can be up to 50% higher than in 32-bit JVM's (meaning that the same amount of data can require 50% more heap), so the space savings can be significant. Due to this overhead, there usually isn't much benefit to sizing the heap of a 64-bit JVM to any value between 32 GB and 48 GB. On the other hand, if available memory is larger than 48GB, then using a larger heap should be considered. As mentioned earlier, that becomes more important at large scale, and GemFire has been used in deployments with very large heap sizes--well above 48 GB. Compressed oops feature is used by default in Java SE 6u23 and later for heap sizes below 32GB. For earlier releases it can be enabled using the -XX:CompressedOops flag.
§ JVM Headroom
For latency sensitive applications the JVM heap occupancy is a key factor. The lower the latency, the larger the headroom has to be. It is not uncommon to have to keep 50% of the heap free for headroom. In practice, we see that around 30% tends to be sufficient. However, only adequate load testing can show if that is sufficient. For latency sensitive applications, 50% is a good starting point. This means that during the initial experimentation only up to 50% of the entire heap should be used as data storage.
§ JVM Heap Size with Respect to the Total Available Memory
This can vary from one OS to another, but somewhere between 500MB and 1GB of RAM for the OS is usually a good starting point. So for example, if a host or a VM has 48 GB of RAM, 47 GB could be divided equally between a couple of JVM's, and 1 GB could be left to the OS. The metric that is a key indicator if that is enough is OS swapping. If swapping starts to occur (which can be determined using either OS tools such as vmstat, or GemFire’s statistics and VSD utility), the OS should be given more memory. Getting this right might require some experimentation.
§ Data Serialization
In addition to better performance, GemFire PDX serialization can provide significant space savings over Java Serializable: we have seen savings of up to 65%, but they they will vary depending on the domain objects. In fact, PDX serialization is most likely to provide the most space savings of all available options. DataSerializable is more compact, but it requires that objects be deserialized on access, so that should be taken into account. On the other hand, PDX serializable does not require deserialization for most operations, and because of that it may provide greater space savings. In any case, the kinds and volumes of operations that would be done on the server side should be considered in the context of data serialization, as GemFire has to deserialize data for some types of operations (access). For example, if a function invokes a get operation on the server side, the value returned from the get operation will be deserialized in most cases (the only time it will not be deserialized is when PDX serialization is used and the read-serialized attribute is set). The only way to find out the actual overhead is by running tests, and examining the memory usage.
§ Application Overhead
In addition to data, and data associated overhead, application workload has to be accommodated as well. Functions, queries and other business logic executing in data nodes will incur memory overhead. That overhead is best determined through testing, as it can be very difficult to estimate memory overhead for an arbitrarily complex application. So, once the data related overhead is known, and the unit of scale configuration set (JVM heap size, and amount of data per JVM) it’s time to run tests that will execute the business logic code that is supposed to run in data nodes. GemFire statistics and VSD can then be used to examine the memory usage and determine the overhead.
Verification Through Experimentation and Scale-out Testing
It is only through testing that we can get a full picture of resource use and performance for an application. Testing is the only way to assess if all the SLA parameters are met. It is an iterative process of running tests, analyzing resources and performance, making adjustments, and running tests again.
The larger the scale of deployment the more important this process is. The basic steps are as follows:
- Based on the capacity estimates, configure a unit of scale (single JVM) and perform the lowest scale test: typically two nodes (for distribution and redundancy),
- Scale out incrementally and confirm the expected (near-linear) scaleout at each step,
- Project the final scale-out based on the findings.
At each step, analyze the resources, and performance. gfsh is very effective for quick checks of the system. It can be used to verify that all the data is loaded, and well balanced across the entire cluster with a single command: issuing “ls –m” for a partitioned region will list all the members that host the region, and show the number of entries per member.
GemFire's VSD tool is a "one stop shop" for the verification of GemFire runtime performance. The article Using VSD to Analyze GemFire Runtime Configuration, Resources, and Performance describes the statistics that are useful for the verification of resources during the sizing process as well. They include the heap usage, GC, network, CPU, and other resources. It also describes the partitioned regions statistics that are useful in verifying how well balanced the data distribution is.