Keeping your systems running smoothly when your data center has a hiccup, or when a real disaster strikes, is critical to a company's survival. As we enter the age of the zettabyte, seamless disaster recovery has become both more critical and more difficult. There is more data than we have ever handled before, and most of it is very, very big.
Most disaster recovery (DR) sites are in standby mode, with assets sitting idle, waiting for their turn. The sites either hold data copied through a storage area network (SAN) or use other data replication mechanisms to propagate information from a live site to a standby site. When disaster strikes, clients are redirected to the standby site, where they’re greeted with a polite “please wait” while the site spins up.
At best, the DR site is a hot standby that is ready to go on short notice. DNS redirects clients to the DR site and they’re good to go.
What about all the machines at the DR site? With active/passive replication you can probably run queries on the slave site, but what if you want to make full use of all of that expensive gear and go active/active? The challenge is in the data replication technology. Most current data replication architectures are one-way. Those that are not one-way can come with restrictions: for example, you need to avoid opening files with exclusive access.
Apache Derby is used for its RDBMS components, JDBC driver, query engine, and network server.
The partitioning technology of GemFire is used to implement horizontal partitioning features of vFabric SQLFire.
vFabric SQLFire enhances Apache Derby components such as the query engine, the SQL interface, data persistence, and data eviction. It also adds components such as SQL commands, stored procedures, system tables, functions, persistence disk stores, listeners, and locators in order to operate a highly distributed, fault-tolerant data management cluster.
Application developers and data management teams continue to look for ways to modernize legacy apps, manage costs more effectively, build new apps on robust application platforms, and solve big data problems. These are some of the key reasons why vFabric is on the CIO (or CTO) agenda. With several new product releases in the vFabric Suite, VMware continues to provide a best-in-class application platform and help customers solve their top application development and data management problems.
Big, fast data is powering some of the most interesting computing opportunities in today’s market. But in order to get there, we need to change our approach to the data tier. Enterprises are trying to move from costly mainframe architectures to virtualized datacenters and utilize commodity hardware more efficiently. With the data tier, this means an architecture that scales horizontally by adding more commodity-based computing and storage at runtime.
To scale the data tier horizontally, companies use systems like vFabric GemFire, a distributed data system designed specifically to accommodate large data sets across commodity hardware nodes. In GemFire, data is spread across the members of a cluster, which are referred to as “nodes,” and the distribution of data across those nodes is called “partitioning.” vFabric GemFire then allows developers to query the data that resides across many nodes while retaining very high performance at scale. How? In short, the answer is “Data Aware Querying”: a query API that allows a query to execute on selected nodes instead of all nodes (i.e., in a map-reduce style).
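To make the idea of routing a query only to the nodes that own the relevant data more concrete, here is a minimal Python sketch. It is not GemFire's API: the `Node` and `PartitionedRegion` classes, the CRC32-based partitioning, and all method names are assumptions made up for illustration. The sketch only shows the general pattern of grouping keys by owning node, running the filter on those nodes, and merging the partial results.

```python
import zlib
from collections import defaultdict


class Node:
    """Hypothetical cluster member holding one partition of the data."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.queries_served = 0  # counts how often this node was asked to work

    def query(self, keys, predicate):
        # "Map" step: filter only the locally stored entries for the given keys.
        self.queries_served += 1
        return [self.store[k] for k in keys
                if k in self.store and predicate(self.store[k])]


class PartitionedRegion:
    """Hypothetical partitioned data set spread across several nodes."""

    def __init__(self, node_names):
        self.nodes = [Node(name) for name in node_names]

    def _owner_index(self, key):
        # Assumption: simple hash partitioning with a stable hash (CRC32).
        return zlib.crc32(str(key).encode("utf-8")) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._owner_index(key)].store[key] = value

    def query_all_nodes(self, predicate):
        # Naive approach: every node does work, whether or not it holds matches.
        results = []
        for node in self.nodes:
            results.extend(node.query(list(node.store), predicate))
        return results

    def query_data_aware(self, keys, predicate):
        # Group the requested keys by owning node, then visit only those nodes.
        keys_by_node = defaultdict(list)
        for key in keys:
            keys_by_node[self._owner_index(key)].append(key)
        results = []
        for index, node_keys in keys_by_node.items():
            # "Reduce" step: merge the partial results from each node.
            results.extend(self.nodes[index].query(node_keys, predicate))
        return results


if __name__ == "__main__":
    region = PartitionedRegion(["node-1", "node-2", "node-3", "node-4"])
    for i in range(100):
        region.put(f"cust-{i}", {"id": i, "total": i * 10})
    print(region.query_data_aware(["cust-7"], lambda row: row["total"] > 0))
    print({node.name: node.queries_served for node in region.nodes})
```

In this toy model a lookup for a single key touches exactly one node, while `query_all_nodes` touches all four. That difference is what keeps per-query cost roughly flat as nodes are added.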
Recently, we had the opportunity to speak with architect Brett Cameron about vFabric RabbitMQ. A popular speaker, Brett is well known for his effort to port Erlang and RabbitMQ over to the “legacy” OpenVMS operating system platform (now owned by HP). With over 19 years in the software industry, Brett specializes in systems integration and large, distributed systems. Of course, he has spent a lot of time with OpenVMS – an OS with one of the more interesting histories in the software industry.
When we started chatting with Brett, he had recently discussed the concept of the Polyglot Rabbit with Alexis Richardson and written a great article titled “The Polyglot Rabbit: Examples of Multi-Protocol Queues in RabbitMQ.” According to Brett, the main point of the article is that you can publish messages into this environment via one protocol and consume them via one or more other protocols (simultaneously if you want). “It’s a brilliant and a very powerful capability.” Brett felt that this capability was possibly not being promoted enough, and hoped the article would go some way towards fixing this.
How do you plan a roadmap for moving from a legacy data architecture to a cloud-enabled data grid? In this article, we will offer a pragmatic, three-stage approach. At SpringOne-2012, the “Effective design patterns with NewSQL” session (see presentation embedded below) generated a lot of interest. (Thank you to everyone who joined us!) Jags Ramnarayan and I discussed problems with legacy RDBMS systems, NewSQL driving principles, SQLFire architecture, application design patterns as well as data consistency and reliability.
We went deep into vFabric SQLFire, which is a pragmatic solution that addresses these data challenges:
How do I architect my data tier for very high concurrent workloads?
How do I achieve predictability both for data access response time and availability?
How do I distribute data efficiently and real time to multiple data centers (and to external clouds)?
How do I process these large quantities of data in an efficient manner to allow for better real-time decision-making?
Capacity planning in the enterprise is no easy task. In this post, we provide an overview of sizing VMware’s elastic, in-memory data management product, vFabric GemFire, and a link to an in-depth technical article.
Setting the Stage for Memory Sizing
Enterprise applications today are distributed systems that have to satisfy increasingly complex business requirements. Add the ever-growing demand for managing more data, and the task keeps getting harder.
One of the key factors in capacity planning for memory-intensive systems, such as in-memory data stores, is memory capacity. Even though the price of memory keeps going down, data capacity requirements keep going up, and this makes memory as precious a resource as ever. As large systems become even larger, it becomes more important to manage this resource efficiently. In addition to obvious reasons, such as Total Cost of Ownership (TCO), there are technical challenges that come with large memory pools. For one, garbage collection (GC) takes more time, which can affect both latency and throughput. Determining memory requirements correctly is both crucial and difficult.
That is why this post and the related technical article focus on memory sizing and provide concrete guidelines for determining the memory required for optimal performance, especially in large-scale vFabric GemFire deployments. GemFire has facilities that can be very useful for memory sizing. The article not only explains these facilities, but also describes a method and guidelines that take the guesswork out of the memory sizing process.
vFabric GemFire is a sophisticated product in a complex problem space: data management in distributed systems. In order to help our users get the most out of GemFire, we are starting a “cookbook” series, which will provide tried and tested recipes that we hope every GemFire user will find useful.
Our first topic is the Visual Statistics Display (VSD). VSD is a visual tool for analyzing GemFire statistics. It reads GemFire statistics from special statistics archive files created by GemFire and renders them as graphs for analysis. It is not a real-time online monitoring tool like vFabric Hyperic, so it does not have the real-time monitoring and alerting capabilities that such tools have. On the other hand, it is the most powerful tool for examining the state of a vFabric GemFire system, as it provides access to all the statistics collected by GemFire. No real-time monitoring tool can do that, as the volume of statistics that GemFire collects is prohibitive for real-time collection in a distributed system.
At some point, any data modernization project is going to require a load of legacy data. With an in-memory, distributed data store like SQLFire, customers often ask (as in this case) about load times because they can be sitting on 50-100 GB of data and don’t want to wait days. For those unfamiliar with NewSQL databases, this post should give you a good sense of how we loaded 8 million rows in 88 seconds. The test shows how we should be able to load roughly 40 GB of data in about 1 hour.
Java developers who want ideas about speeding up large volumes of calculations, transforms, or validations may want to consider a previous post, where SQLFire is used with Spring Batch.
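As a quick sanity check on those figures, the arithmetic can be written out in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the measured rate is sustained for the whole load, that "GB" means 10^9 bytes, and that both figures describe the same kind of rows. The average row size it derives is implied by the two numbers and is not a value stated in the post.

```python
def rows_per_second(rows, seconds):
    """Average load rate observed in the test."""
    return rows / seconds


def implied_bytes_per_row(total_bytes, total_seconds, row_rate):
    """Average row size at which row_rate would move total_bytes in total_seconds."""
    bytes_per_second = total_bytes / total_seconds
    return bytes_per_second / row_rate


rate = rows_per_second(8_000_000, 88)
# About 90,909 rows per second.
print(f"{rate:,.0f} rows/s")

row_size = implied_bytes_per_row(40 * 10**9, 3600, rate)
# About 122 bytes per row (an inference from the two figures, not a measurement).
print(f"{row_size:.0f} bytes/row")
```

In other words, 40 GB in an hour is about 11 MB per second, which matches the measured row rate if rows average a little over a hundred bytes each.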
With SQLFire, we take a multi-threaded load approach from a CSV file. Below, I’ve outlined 8 steps to the load strategy with an explanation of why things were done.
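For readers who have not seen the general pattern before, here is a minimal, generic sketch of a multi-threaded CSV load in Python. It does not reproduce the eight steps of the post and does not use any SQLFire API. The `InMemoryTable` class is a hypothetical stand-in for a database table, and the worker count and batch size are arbitrary assumptions.

```python
import csv
import io
import threading
from concurrent.futures import ThreadPoolExecutor


def read_batches(csv_file, batch_size):
    """Yield lists of parsed CSV rows, each at most batch_size long."""
    batch = []
    for row in csv.reader(csv_file):
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter batch


class InMemoryTable:
    """Hypothetical stand-in for a table; a real loader would typically
    issue batched INSERTs, with each worker using its own connection."""

    def __init__(self):
        self._rows = []
        self._lock = threading.Lock()

    def insert_batch(self, batch):
        with self._lock:
            self._rows.extend(batch)
        return len(batch)

    def count(self):
        with self._lock:
            return len(self._rows)


def load_csv(csv_file, table, workers=4, batch_size=1000):
    """Parse the file in one thread and insert batches from a worker pool.

    Returns the total number of rows inserted.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(table.insert_batch,
                            read_batches(csv_file, batch_size)))


if __name__ == "__main__":
    data = io.StringIO("".join(f"{i},name-{i}\n" for i in range(2500)))
    table = InMemoryTable()
    print(load_csv(data, table), "rows loaded")
```

Batching keeps the per-row overhead low, and the worker pool lets several batches be in flight at once. In a real load, the gain depends on how well the target database handles concurrent inserts.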