The origin story of data science
Today, data science is defined as a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data. It emerged from the convergence of several factors: new ideas among academic statisticians, the spread of computer science across various fields, and a favorable economic context.
As the falling cost of hard drives allowed companies and governments to store ever more data, a need arose for new ways to extract value from it. This spurred the development of new systems, algorithms, and computing paradigms. Because data science was particularly well suited to learning from big data, and with the emergence of cloud computing, it spread quickly across many fields.
Note, however, that although the rising popularity of big data fueled the rapid growth of data science, data science is not limited to big data.
Along the way to becoming the field we know today, data science drew a lot of criticism from academics and journalists who saw no distinction between it and statistics, especially between 2010 and 2015. Without a statistician's background, the difference may not have been obvious to them. Here, we examine the origins of the field to better understand why it is a distinct academic discipline. Because this story is best understood through the people who shaped it, let's meet the four who pushed the boundaries of statistics: John Tukey, John Chambers, Leo Breiman, and Bill Cleveland.