Tanzu GemFire Sizing: Disk, CPU, and Network Considerations

Welcome to the last blog in our series about VMware Tanzu GemFire sizing. In our previous two blogs, we covered how to get started with GemFire sizing, redundancy strategies, and the importance of maintaining free capacity.

Today, we’ll explore disk sizing, CPU provisioning, and network performance—three critical aspects of determining if your deployment can handle real-world workloads.

Disk Sizing

Persistence Storage Requirements
If you’ve enabled persistent stores, plan for about 3x your in-memory data, or more. GemFire uses an append-only disk strategy for performance, so compaction might briefly need nearly double the disk space. Given that “disk is cheap,” err on the side of caution. Don’t forget that logs and OS patches can also chew up disk space you had planned for persistence.
Overflow and Queues
- Overflow: When in-memory data exceeds your configured threshold, GemFire can overflow to disk. This saves heap space but consumes disk.
- Durable Queues: WAN replication or asynchronous event queues can rapidly fill up if you’re storing large amounts of data during downtime on the receiving side. In high-throughput environments, queue storage might grow to terabytes if you need to retain hours or days of transactions in the queue.
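
As a rough illustration of how quickly a queue can grow, the backlog is approximately the event rate times the average event size times the length of the outage. The sketch below uses hypothetical workload numbers (50,000 events per second, 2 KiB per event, an 8-hour outage) and ignores per-event overhead, batching, and conflation, so treat the result as an order-of-magnitude estimate only.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def queue_backlog_bytes(events_per_second: float,
                        avg_event_bytes: float,
                        outage_seconds: float) -> float:
    """Bytes that pile up in a queue while the receiving side is unreachable."""
    return events_per_second * avg_event_bytes * outage_seconds


# Hypothetical workload, not a measurement:
# 50,000 events/s, 2 KiB per event, receiver down for 8 hours.
backlog = queue_backlog_bytes(50_000, 2 * 1024, 8 * 3600)
print(f"Estimated queue backlog: {backlog / TIB:.2f} TiB")
```

Even at this moderate event size, a single working day of downtime lands in the terabyte range, which is why queue storage deserves its own line in the disk budget.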

Rule of Thumb: For a safe starting point, 4–5 times more disk than your total in-memory size is not unusual when persistence and overflow are both in play.
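
To make these multipliers concrete, here is a minimal sketch that turns an in-memory size into a disk range. The function name, the `extra_gib` parameter (a placeholder for queues, logs, and similar), and the 256 GiB example are illustrative assumptions; only the 3x and 4–5x multipliers come from the guidance above.

```python
PERSISTENCE_MULTIPLIER = 3.0       # "about 3x your in-memory data" (persistence only)
COMBINED_MULTIPLIERS = (4.0, 5.0)  # persistence and overflow both in play


def disk_range_gib(in_memory_gib: float,
                   overflow: bool = False,
                   extra_gib: float = 0.0) -> tuple[float, float]:
    """Return a (low, high) starting estimate for disk in GiB.

    Assumes persistence is enabled. `extra_gib` is a hypothetical
    allowance for queues, logs, and other data on the same volumes.
    """
    if overflow:
        low, high = COMBINED_MULTIPLIERS
    else:
        low = high = PERSISTENCE_MULTIPLIER
    return (in_memory_gib * low + extra_gib,
            in_memory_gib * high + extra_gib)


# Example: 256 GiB of in-memory data across the cluster.
print(disk_range_gib(256))                 # persistence only
print(disk_range_gib(256, overflow=True))  # persistence plus overflow
```

These are starting points rather than limits; measured disk usage after a representative load test should replace them.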

Another critical factor to consider is the performance of your storage layer. Since GemFire functions as a database, meeting tight service-level agreements (SLAs) and ensuring high performance requires careful attention to your disk architecture. Design your storage solution around your specific SLA requirements, whether that means using dedicated direct-attached storage or opting for shared network storage. Properly architected storage ensures that GemFire can deliver the performance your applications demand.

CPU Provisioning

CPU requirements are often hard to estimate accurately. System engineers frequently try to minimize costs by limiting CPU resources, which can lead to under-provisioning. Instead, monitor your application’s requirements closely and allocate at least 50% more CPU capacity than you estimate. This extra capacity keeps your applications responsive during peak times and helps avoid critical issues when additional processing power is needed.

Basic Rules of Thumb:
- 2–4 cores for development and small heaps.
- 6–8 cores (or more) for production, especially if you have larger heaps or performance testing in mind.

While this may seem like too many cores, keep in mind that Java-based platforms rely on a garbage collector, which competes with the actual data operations in GemFire. Extra CPU headroom ensures your GC runs smoothly while still meeting application SLAs. Of course, we also recommend monitoring CPU utilization and garbage collection logs, and adding cores if they show that more are required. It’s better to overshoot slightly than to find yourself short on processing power in production during critical traffic spikes.
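
The CPU guidance above can be combined into a small calculation: add 50% headroom to the estimate, then apply the lower end of the rule-of-thumb range as a floor. This is only a sketch; the function name and the choice to treat 2 and 6 cores as minimums are assumptions made for illustration.

```python
import math

# Lower ends of the rule-of-thumb ranges, used here as floors (an assumption).
MIN_CORES = {"development": 2, "production": 6}


def provisioned_cores(estimated_cores: float,
                      environment: str = "production",
                      headroom: float = 0.5) -> int:
    """Cores to provision: estimate plus headroom, never below the floor."""
    needed = math.ceil(estimated_cores * (1 + headroom))
    return max(needed, MIN_CORES[environment])


print(provisioned_cores(3))                 # small estimate, production floor applies
print(provisioned_cores(8))                 # 8 estimated cores plus 50% headroom
print(provisioned_cores(1, "development"))  # development floor applies
```

The result is a starting allocation. CPU utilization and GC logs from a realistic load test remain the real arbiter.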

Network Performance

When sizing GemFire, network performance plays a pivotal role in ensuring your data grid operates smoothly and efficiently. As a distributed system, GemFire relies heavily on a robust, high-throughput network to manage data movement, whether it’s for maintaining redundancy through rebalancing or handling WAN replication. Reliable network infrastructure not only supports fast downtime recovery by swiftly redistributing data when nodes fail or links drop, but also minimizes the overall business impact during both planned and unplanned events.

Investing in quality network solutions is essential to maximize GemFire’s performance and resilience, so your applications remain responsive and your data remains secure under all conditions.

High Availability and Data Movement
GemFire is a distributed system, and data movement (whether for redundancy rebalancing or WAN replication) depends on a reliable, high-throughput network. The slower the network, the longer it takes to get your data safe and back in policy.
Downtime Recovery
When a node goes down or a WAN link fails, GemFire must move data around to restore redundancy or catch up on replication. A slow network can extend the time you’re running at risk with fewer redundant copies.
Investing in Quality
A robust network device or infrastructure is crucial. The better your network, the faster GemFire can respond to both planned and unplanned events, minimizing business impact.

For a good example of typical transfer times for various dataset sizes and bandwidths, here is a chart by Google that sheds light on moving data around on a network.

Source: Google
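
The arithmetic behind this kind of chart is simple enough to reproduce for your own numbers: transfer time is the data size in bits divided by the usable bandwidth. The sketch below is a best-case estimate. The `efficiency` parameter is a hypothetical knob for protocol overhead and contention, and the dataset sizes and link speeds are illustrative, not taken from the chart.

```python
GB = 10 ** 9    # bytes in a decimal gigabyte
GBPS = 10 ** 9  # bits per second in one gigabit per second


def transfer_seconds(data_bytes: float,
                     link_bits_per_second: float,
                     efficiency: float = 1.0) -> float:
    """Seconds to move data_bytes over a link.

    efficiency=1.0 is the theoretical best case; real links deliver less.
    """
    return data_bytes * 8 / (link_bits_per_second * efficiency)


# Illustrative dataset sizes and link speeds.
for size_gb in (100, 1_000, 10_000):
    for gbps in (1, 10, 25):
        minutes = transfer_seconds(size_gb * GB, gbps * GBPS) / 60
        print(f"{size_gb:>6} GB over {gbps:>2} Gbps: {minutes:8.1f} min")
```

For example, restoring redundancy for 1 TB of data over a 10 Gbps link takes at least 800 seconds, a little over 13 minutes, during which the cluster is running with fewer redundant copies.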

Bringing It All Together

- Start with Rough Estimates: Don’t get bogged down in perfection. Use broad calculations to define your initial cluster size.
- Scale as You Learn: GemFire’s flexibility allows you to add nodes or disk capacity as your use cases evolve.
- Monitor Continuously: CPU, memory, disk usage, and network throughput should all be tracked to catch early signs of stress.

Remember, the goal is to prioritize resilience over merely minimizing resource usage. By allocating ample margins in memory, disk, CPU, and network capacity for your GemFire deployment, you can gracefully handle unexpected spikes as well as planned and unplanned downtime.

GemFire sizing is as much art as science. Start with generous estimates, keep a close eye on performance metrics, and scale as needed. Redundancy, disk space, CPU cores, and network throughput all play a part in maintaining a responsive, reliable data grid that can power your mission-critical applications.

Thank you for joining this three-part journey into GemFire sizing. With these guidelines in hand, you’re well-equipped to build a robust environment that grows alongside your business needs. Happy deploying!

Learn more about GemFire’s unique value with these videos:
Enhancing Logistics with Tanzu GemFire: Geospatial Queries for Real-Time Insights
How VMware GemFire Can Manage Traffic Surges and Reduce Expenses