For growth initiatives, many companies are looking to innovate by ramping up analytical, mobile, social, big data, and cloud initiatives. For example, GE, one growth-oriented company, just announced heavy investment in the Industrial Internet with GoPivotal. One area of concern for many well-established businesses is what to do with their mainframe-powered applications. Mainframes are expensive to run, but the applications that run on them are typically very important, and the business cannot afford to risk downtime or any degradation in service. As a result, the idea of modernizing a mainframe application has often faced major roadblocks.
There are ways to preserve the mainframe while improving application performance, reliability, and even usability. As one of the world’s largest banks has seen, big, fast data grids offer an incremental approach to mainframe modernization that reduces risk, lowers operational costs, increases data processing performance, and provides innovative analytics capabilities for the business, all based on the same types of cloud computing technologies that power internet powerhouses and financial trading markets.
The Fast Data component
One customer used vFabric GemFire to save $75 million on a mainframe modernization project, and for the past ten years GemFire has served as a highly performant, horizontally scalable data transaction layer, or big data grid, for mission-critical applications. Both GemFire and its sister product, SQLFire, are known to achieve linear scale. Key use cases include credit-card transaction systems, stock trading platforms, foreign exchange systems, web-based travel reservation systems, and mainframe batch data offloading. As an in-memory data grid, its main advantage is the ability to perform sub-millisecond, in-memory transactions while still maintaining the highest standards for fault tolerance, high availability, and linear scalability on a distributed platform.
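To make the data grid idea concrete, here is a minimal, illustrative sketch of a partitioned in-memory key-value store that keeps one redundant copy of each entry. This is not the GemFire API; the names (`GridNode`, `DataGrid`) are invented for illustration, and a real grid adds networking, consistency protocols, and automatic failover on top of this basic idea.

```python
import hashlib

class GridNode:
    """One member of the grid; holds an in-memory partition of the keyspace."""
    def __init__(self, name):
        self.name = name
        self.store = {}

class DataGrid:
    """Toy partitioned grid: each key hashes to an owning node, and every
    entry is also copied to the next node as a redundant backup."""
    def __init__(self, nodes):
        self.nodes = nodes

    def _owner_index(self, key):
        digest = hashlib.md5(key.encode()).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key, value):
        i = self._owner_index(key)
        self.nodes[i].store[key] = value                       # primary copy
        self.nodes[(i + 1) % len(self.nodes)].store[key] = value  # backup copy

    def get(self, key):
        return self.nodes[self._owner_index(key)].store.get(key)

grid = DataGrid([GridNode(f"node{i}") for i in range(3)])
grid.put("txn:1001", {"amount": 42.50, "status": "APPROVED"})
print(grid.get("txn:1001"))
```

Because each entry lives on a primary node and a backup, the grid can survive the loss of a single member, and capacity grows by adding nodes, which is the essence of the fault tolerance and linear scalability described above.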
The Big Data component
When it comes to big data analysis, Greenplum has a multitude of customer case studies with companies like O’Reilly Media, Skype, and NYSE Euronext. These solutions are well known for analytics on multi-terabyte or petabyte data sets, where traditional relational databases begin to break down, stop scaling, or fail to deal well with unstructured data. Greenplum technology provides a complete big data solution for both structured and unstructured data, based on the Greenplum Database and Pivotal HD, a commercially supported distribution of Hadoop that includes HDFS, MapReduce, Hive, Pig, HBase, ZooKeeper, Sqoop, and Flume. The recently announced Pivotal Advanced Database Services, powered by HAWQ, allows SQL queries to run on the fastest Hadoop-based query interface on the market today, a 100x+ faster solution.
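For readers unfamiliar with the MapReduce model that Hadoop distributions like Pivotal HD build on, here is a toy map/shuffle/reduce pass in Python. The data and function names are purely illustrative; a real job distributes this work across many machines and terabytes of input.

```python
from collections import defaultdict
from itertools import chain

# Toy MapReduce: count events per customer across "partitions" of a log.
partitions = [
    [("alice", 1), ("bob", 1)],
    [("alice", 1), ("carol", 1), ("alice", 1)],
]

def map_phase(records):
    # Identity mapper here; real mappers parse raw lines into (key, value) pairs.
    return records

def shuffle(mapped):
    # Group all values emitted for the same key, as the framework would.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Each reducer aggregates the values for one key.
    return {key: sum(values) for key, values in groups.items()}

mapped = chain.from_iterable(map_phase(p) for p in partitions)
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'alice': 3, 'bob': 1, 'carol': 1}
```

SQL layers such as HAWQ compile declarative queries down to parallel plans over the same distributed data, sparing analysts from writing map and reduce functions by hand.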
Fast Data + Big Data: Better together
Big data and fast data solutions make a lot of sense together, as we’ve seen in many customer solution blueprints delivered over the past several months. This is because most business owners and administrators aren’t able to fully utilize the data captured in their transactional systems on a daily basis. From a business value perspective, the fast data layer brings scalability and reliability to the business while reducing the cost per transaction. Most transactional systems also benefit from predictive analytics on transacted data, and the fast data layer enables this type of real-time transaction analysis, which can also incorporate big data result sets. The big data layer provides insight into mountains of data to support decision making, feed traditional performance metrics, or enable more advanced types of visualization and data science.
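One hedged sketch of how the two layers might meet: a fast data layer scores a live transaction against a customer profile that a batch analytics job computed offline. The profile, transaction shape, and the simple z-score rule below are all invented for illustration, not part of any product.

```python
# Hypothetical batch-layer output: per-customer spending profile computed offline.
batch_profile = {"cust42": {"avg_amount": 55.0, "stddev": 10.0}}

def score_transaction(txn, profiles):
    """Flag a live transaction whose amount is far above the customer's
    historical average (a simple z-score style rule)."""
    p = profiles.get(txn["customer"])
    if p is None:
        return "UNKNOWN_CUSTOMER"
    z = (txn["amount"] - p["avg_amount"]) / p["stddev"]
    return "FLAG" if z > 3 else "OK"

print(score_transaction({"customer": "cust42", "amount": 120.0}, batch_profile))  # FLAG
print(score_transaction({"customer": "cust42", "amount": 60.0}, batch_profile))   # OK
```

The point is the division of labor: the big data layer does the heavy historical computation overnight, while the fast data layer applies the result to each transaction in sub-millisecond time.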
From Mainframe to Big Fast Data Architecture
Moving from mainframe to big, fast data is an evolution. A phased, step-by-step approach is the recommended way to modernize applications: it minimizes risk and makes each investment easier to justify. After working with many customers who face this problem, here is one approach we recommend.
1. Selecting the Pilot: Pick a Starting Point
As with most major initiatives, an initial use case of small scope should serve as a pilot to validate the architecture choices and prove a return for the overall project. The ideal candidate should: a) have few or no integration points with other systems on the legacy platform, b) be small but critical to existing business processes, c) consume a considerable amount of operational expense, and/or d) represent a business risk in its current state. Screening this way makes it possible to deliver something of value to the business, reduce OpEx, and show improvement quickly while avoiding bad decisions.
2. Designing the Modern Data Architecture for Co-Existence
The goal of this step is to determine which legacy data stays, migrates, or integrates. First, we analyze the pilot’s data model. Then, we design a data architecture that makes sense for a highly scalable, distributed data grid while still supporting the existing business model and processes. The analysis should identify which entities are transactional, a mix of transactional and analytical (e.g., part of a real-time analytics model), or purely analytical. During this process, we make decisions about data model partitioning, replication, colocation, disaster recovery, transaction consistency, and more. We also decide which data to leave on the legacy platform, accessing it on the fly as needed through the GemFire integration layer.
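As one illustration of a colocation decision from this step, the sketch below routes an account and all of its transactions to the same partition by hashing the same routing key (the account id). The partition function, partition count, and entity shapes are assumptions made for the example, not GemFire configuration.

```python
import hashlib

NUM_PARTITIONS = 8  # illustrative partition count

def partition_for(routing_key):
    """Deterministically map a routing key to a partition number."""
    digest = hashlib.md5(routing_key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# Colocation: route both the account entity and its transactions by the
# same key (the account id), so they all land in the same partition and
# can be queried or processed together without a network hop.
account_key = "acct:314159"
transactions = [{"id": f"txn:{i}", "account": account_key} for i in range(3)]

acct_partition = partition_for(account_key)
txn_partitions = {partition_for(t["account"]) for t in transactions}
print(acct_partition, txn_partitions)
```

Choosing the routing key per entity is exactly the kind of partitioning and colocation decision this design step settles, because it determines which joins stay node-local in the grid.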
3. Integrating Mainframe and Big Data Grid
While the data architecture is being defined, we start building the initial big, fast data infrastructure. Then, the pilot migrates the first use case to the modernized architecture. Using the GemFire/SQLFire asynchronous integration layer, we maintain data consistency between the new application and the legacy mainframe application. Transactions performed on the modernized system are delivered to both the legacy system and the analytics platform. Integration with the legacy system can be achieved using a mainframe connector, CICS Web Services, a messaging platform, or any other integration protocol.
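The asynchronous, write-behind pattern described above can be sketched as follows, using an in-process queue and plain dictionaries as stand-ins for the grid, the mainframe, and the analytics feed. A real integration layer would deliver events over a mainframe connector or messaging platform instead; everything here is a simplified model of the flow, not the product API.

```python
from queue import Queue
import threading

grid = {}        # stand-in for the in-memory grid (primary store)
legacy = {}      # stand-in for the mainframe system of record
analytics = []   # stand-in for the analytics platform's event feed
events = Queue()

def put(key, value):
    """Write to the grid synchronously; queue the event for async delivery."""
    grid[key] = value
    events.put((key, value))

def deliver():
    """Background worker: drain the queue and fan each event out to both
    the legacy system and the analytics platform."""
    while True:
        item = events.get()
        if item is None:  # sentinel to stop the worker in this demo
            break
        key, value = item
        legacy[key] = value        # e.g. via a mainframe connector or CICS web service
        analytics.append((key, value))

worker = threading.Thread(target=deliver)
worker.start()
put("txn:1", {"amount": 10})
put("txn:2", {"amount": 20})
events.put(None)  # shut down the worker once the demo writes are queued
worker.join()
print(legacy, len(analytics))
```

The key property is that the application's write path only touches the fast in-memory grid; delivery to the slower legacy and analytics systems happens off the critical path.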
4. First Deployment Risk Mitigation Plans
When the pilot is complete and has proven to perform better than the legacy system at much lower maintenance cost, we are ready to partially turn off the first piece of the legacy system. The legacy system, especially if it lives on a mainframe, should stay in place for a period of time to support ongoing business. During this time, new transactions start happening on the new system, and data can be validated against the original system to make sure everything behaves exactly as expected. This minimizes risk, assures a seamless architecture evolution, and avoids headaches from unexpected problems. In this mode, the new deployment acts as an advanced operational cache for the mainframe: the mainframe still receives the data it needs, while both the analytical and real-time or predictive data stores are kept up to date.
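The parallel-run validation in this step amounts to comparing per-transaction outcomes from both systems and reporting any divergence. A minimal sketch, with invented transaction ids and values:

```python
def validate_parallel_run(new_results, legacy_results):
    """Compare per-transaction outcomes from the new platform against the
    legacy system during the parallel-run window; report any divergence."""
    mismatches = []
    for txn_id, legacy_value in legacy_results.items():
        new_value = new_results.get(txn_id)
        if new_value != legacy_value:
            mismatches.append((txn_id, legacy_value, new_value))
    return mismatches

# Illustrative parallel-run data: t3 diverges between the two systems.
legacy_results = {"t1": 100.00, "t2": 250.50, "t3": 75.25}
new_results    = {"t1": 100.00, "t2": 250.50, "t3": 75.30}

print(validate_parallel_run(new_results, legacy_results))
# → [('t3', 75.25, 75.3)]
```

In practice this comparison runs continuously over the parallel-run window, and the legacy piece is only retired once the mismatch report stays empty.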
Step-by-step, other applications or portions of the mainframe can be carefully migrated to the new platform in a similar manner—without risk. As this happens, we gradually reduce mainframe usage, costs, and time to market for new deployments. We gain a level of scalability proven by data grids that run the most rigorous and high-performance data environments on the planet—those that power financial transactions. We also enable new methods of analysis to unleash business insight and value.
Of course, there is an initial capital expense; however, the investment is justified by reduced operational expenses. Companies can also save on CapEx by leveraging existing, partially utilized infrastructure, since the software runs on commodity hardware.