Welcome to the second blog in our Tanzu GemFire sizing series. In the first blog, we introduced the fundamental questions to guide your GemFire capacity planning. Now, let’s get more specific: this post covers the recommended levels of redundancy, the importance of maintaining free capacity, and a straightforward sizing example you can adapt to your own environment.
Redundancy: How many copies of data do I need?
GemFire lets you control the number of redundant copies used to ensure high availability. Whether you maintain zero, one, two, or three redundant copies, or give every member its own copy, it’s essential to choose the number deliberately for your deployment. Let’s explore the key factors involved in determining the appropriate number of redundant copies.
- One Redundant Copy
With one copy, you can survive a single high-availability event and keep your data safe. For instance, if you have the original and one redundant copy, you can perform rolling upgrades while the application remains live and data is still accessible.
- Two Redundant Copies
I typically recommend maintaining two redundant copies for peace of mind. This setup allows you to manage unexpected situations during routine operations, such as upgrades, without compromising data safety. With two copies, your data remains secure even if another team inadvertently takes a server offline during that upgrade, ensuring you stay within your data safety policy limits.
Key Takeaway: Redundancy helps you manage unforeseen downtime without your application even noticing something is amiss.
Why Keep 50% Free Capacity?
A common GemFire sizing mantra is “Always leave 50% headroom.” Why?
- Handling High-Availability Events
When a server fails, the remaining servers must have sufficient capacity to restore redundancy without exceeding their limits. Keeping 50% of capacity free ensures you can redistribute data without hitting capacity constraints. For example, if you have four servers each utilizing 50% of their 40 GB storage and one server goes down, the 80 GB of data must fit on the remaining three servers, so each would run at approximately 66.7% of its capacity. This buffer allows your system to maintain data safety and performance even during unexpected outages.
- Inaccuracies in Initial Estimates
Remember, your original projections might not be perfect. Having buffer space allows you to adapt when real-world data patterns differ from your assumptions.
- Operational Flexibility
Need a new index unexpectedly? No problem. The extra capacity ensures you won’t have to scramble for resources or endure major performance hits.
In short, 50% free capacity is your safety net. You can always scale down later if you find yourself with too much headroom, but it’s much more painful to scramble for resources once you’re in production.
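To make the headroom arithmetic concrete, here is a minimal Python sketch of the four-server example above. The function name and parameters are made up for illustration, and it assumes data is spread evenly across servers and that the survivors take on everything the failed servers held.

```python
def utilization_after_failures(servers, capacity_gb, utilization, failed):
    """Per-server utilization once the surviving servers hold all the data.

    Assumes an even spread of data and that every copy is restored
    on the survivors (an illustrative simplification).
    """
    if failed >= servers:
        raise ValueError("at least one server must survive")
    total_data_gb = servers * capacity_gb * utilization
    surviving_capacity_gb = (servers - failed) * capacity_gb
    return total_data_gb / surviving_capacity_gb


# Four 40 GB servers at 50% utilization, one server lost.
with_headroom = utilization_after_failures(4, 40, 0.50, failed=1)

# The same cluster running at 80% utilization, one server lost.
without_headroom = utilization_after_failures(4, 40, 0.80, failed=1)

print(f"50% full before failure -> {with_headroom:.1%} after")
print(f"80% full before failure -> {without_headroom:.1%} after")
```

Any result above 100% means the survivors cannot hold all the data, which is the situation the 50% rule is meant to prevent.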
A Simple Storage Sizing Example
Let’s run through a hypothetical:
- Estimate Base Storage Needs:
Suppose you need 200 GB of data storage in memory.
- Decide on Redundancy Level:
You want two redundant copies (the original plus two copies) for robust data protection. That multiplies the base storage to 600 GB.
- Add 50% Free Capacity:
Now you double the 600 GB to allow for both planned and unplanned events. That takes you to 1,200 GB total required capacity.
- Translate into Servers:
- Let’s pick a 256 GB virtual machine as our standard.
- Reserve 20% of that for the operating system, monitoring tools, etc., leaving 80% (about 204 GB, often rounded down to 200 GB) for GemFire.
- If you need 1,200 GB, you’ll end up with six data servers (6 x 200 GB = 1,200 GB) to comfortably meet your redundancy and capacity goals.
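The steps above can be sketched as a small Python calculation. This is only an illustration of the arithmetic in this example: the function and parameter names are hypothetical, and it covers in-memory data only, not indexes or per-entry overhead.

```python
import math


def size_cluster(base_gb, redundant_copies, free_fraction, vm_gb, os_reserve_fraction):
    """Follow the four sizing steps and return the intermediate numbers."""
    # Steps 1 and 2: the original plus each redundant copy.
    stored_gb = base_gb * (1 + redundant_copies)
    # Step 3: stored data may only fill (1 - free_fraction) of total capacity.
    required_gb = stored_gb / (1 - free_fraction)
    # Step 4: memory left for GemFire after the OS and tooling reserve.
    usable_per_vm_gb = vm_gb * (1 - os_reserve_fraction)
    servers = math.ceil(required_gb / usable_per_vm_gb)
    return {
        "stored_gb": stored_gb,
        "required_gb": required_gb,
        "usable_per_vm_gb": usable_per_vm_gb,
        "servers": servers,
    }


plan = size_cluster(
    base_gb=200,
    redundant_copies=2,
    free_fraction=0.5,
    vm_gb=256,
    os_reserve_fraction=0.2,
)
for name, value in plan.items():
    print(f"{name}: {value}")
```

With these inputs the sketch arrives at six servers, whether you use the unrounded 204.8 GB per VM or round down to 200 GB as in the text.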
The Benefits of Over-Provisioning
Some might view deploying six servers for 1,200 GB as excessive. However, this configuration allows you to lose two servers without risking data loss or impacting your application: with two redundant copies, a third copy of every entry still exists, and the remaining four servers offer 800 GB of capacity for the 600 GB of stored data. GemFire automatically redistributes data across the remaining four servers, ensuring seamless operation for your users. In contrast, if all your data were housed on just two servers, losing one would overwhelm the remaining server, potentially degrading performance and reliability.
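As a rough check on that claim, the Python sketch below tabulates how full the surviving servers would be as servers drop out of the six-server cluster. It assumes 200 GB usable per server, 600 GB of stored data (original plus two redundant copies), and an even spread of data. The helper name is hypothetical.

```python
def utilization_by_failures(servers, usable_gb_per_server, stored_gb, max_failed):
    """Return (failed, utilization) pairs for 0..max_failed lost servers."""
    rows = []
    for failed in range(max_failed + 1):
        remaining_capacity_gb = (servers - failed) * usable_gb_per_server
        rows.append((failed, stored_gb / remaining_capacity_gb))
    return rows


rows = utilization_by_failures(
    servers=6, usable_gb_per_server=200, stored_gb=600, max_failed=2
)
for failed, used in rows:
    print(f"{failed} server(s) down: {used:.0%} of remaining capacity in use")
```

Even with two servers down, the remaining four stay below full capacity under these assumptions.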
Looking Ahead
Stay tuned for the last blog in our three-part GemFire sizing series, where we’ll discuss the practical steps to ensure your deployment is both resilient and high-performing. In that final installment, we’ll examine the other crucial aspects of GemFire sizing, such as disk requirements, CPU considerations, and network performance. We’ll also look at how features like WAN replication and overflow can significantly impact your storage and capacity planning.