The era of data silos is nearing its end. The ongoing cycle of data science, and the rapid development of applications built on the models and insights it produces, will not wait for an IT infrastructure that stores critical information in numerous disconnected locations. The speed and scalability of Hadoop have given rise to the concept of the data lake, which is key to Pivotal’s vision of a unified PaaS. In an article for Forbes, Edd Dumbill characterizes the data lake as “a dream” given the current enterprise climate, but one that remains “an accessible dream.”
In his article, Dumbill offers a succinct and useful definition of the data lake concept:
“The data lake dream is of a place with data-centered architecture, where silos are minimized, and processing happens with little friction in a scalable, distributed environment. Applications are no longer islands, and exist within the data cloud, taking advantage of high bandwidth access to data and scalable computing resource. Data itself is no longer restrained by initial schema decisions, and can be exploited more freely by the enterprise.”
The move from data silos to data lakes will accelerate data-driven insights, app development, iteration, and time to value. But this transition doesn’t happen overnight; Dumbill describes it as a four-stage process for an enterprise.
When Hadoop first enters the picture, it serves primarily as an input: disparate applications and sources contribute data for analysis. Over time, as more data sources are integrated into a growing Hadoop system, this one-way flow becomes an ongoing cycle of input and output. Data drives insights, insights produce data-aware apps, and those apps in turn contribute back to the growing wealth of information.
The data lake’s opportunities and impacts are well documented on this blog. The data lake is set to transform corporate IT and security operations, require closer collaboration between data scientists and app developers, spur competition and innovation, and drive new value opportunities.
As Dumbill notes in his article, many enterprises remain in the early stages of this transition, but that is quickly changing. Because consumer giants such as Google and Facebook already have these capabilities, enterprises face an imperative to catch up.
“As business is increasingly digital, access to data will become a critical priority,” Dumbill writes, “as will speed of development and deployment. The data lake is a dream that can match those demands.” Providing the knowledge and infrastructure needed to meet this challenge and enable the “consumer-grade enterprise” is fundamental to the Pivotal One vision.
Learn more about Pivotal and the Data Lake
- Capgemini Is Co-Innovating with Pivotal to Provide the Business Data Lake. Find out how.
- Pivotal One is the World’s First Comprehensive Multi-Cloud Enterprise PaaS.
- Want to start fishing in your own data lake? Contact the new Pivotal and Capgemini CoE.