Data science is a young, multidisciplinary field requiring knowledge of sophisticated statistical modeling and software engineering. A strong grasp of information design doesn’t hurt, either. As a result, skilled practitioners are in high demand as increasingly data-driven enterprises and organizations seek a unique skillset capable of reaping insights from big data. Meanwhile, there remains some confusion and debate as to what makes a data scientist.
The future of the discipline is bright, but it’s useful to look to its past to understand what it is and where it may be going. Data science arose from the convergence of two more mature disciplines, mathematics and computer science. In a new post at Forbes, Gil Press presents a short history of how the discipline came to be, tracing its evolution back to a 1962 paper by mathematician John W. Tukey, “The Future of Data Analysis.” In his 1974 book Concise Survey of Computer Methods, computer scientist Peter Naur offered an early definition of data science: “The science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences.”
Beginning in the mid-’90s, the discussion leapt out of academic circles and turned towards business, with the advent of data mining technologies and their potential application in marketing and business intelligence. These developments also prompted the now-familiar challenge of storing and working with millions of rows of data. In 1999, Jacob Zahavi articulated this emerging issue, stating, “Scalability is a huge issue in data mining. Another technical challenge is developing models that can do a better job analyzing data, detecting non-linear relationships and interaction between elements… Special data mining tools may have to be developed to address web-site decisions.”
Data science came into its own during the last decade. As the strands of mathematics and computer science continued to intertwine in academia, new technologies were developed to mine, store, and analyze these massive data sets, while consumer internet giants such as Google demonstrated the business value of a data-driven approach to operations and innovation. A 2009 prediction by Google’s Chief Economist Hal Varian was particularly spot-on, with Varian telling McKinsey Quarterly, “I keep saying the sexy job in the next ten years will be statisticians…The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades.”
Four years later, this statement seems like a foregone conclusion, as big data has reached buzzword status in the media and has become fundamental to the operations of enterprise, academic, and government organizations. Awareness of the value of data science has leapt out of academia and the business world and into mass culture, largely thanks to the accuracy of Nate Silver’s projections during the 2012 elections and his bestselling book The Signal and the Noise. The discipline’s prominence and impact are set to increase considerably in the next decade, with the advent of the Internet of Things, the industrial internet, and the democratization of its tools and techniques, which will transform fields from healthcare to agriculture, journalism to civic life.
To learn more about the history of data science and its rise to prominence, check out Gil Press’s Short History of Data Science at Forbes.