A bit of background
"Many [statisticians] have expressed concern that these and other data-oriented initiatives have been or are being conceived on your campuses without involvement of or input from the department of statistics or similar unit. I’ve been told of university administrators who have stated their perceptions that statistics is relevant only to “small data” and “traditional” “tools” for their analysis, while data science is focused on Big Data, Big Questions, and innovative new methods. I’ve also heard about presentations on data science efforts by campus and agency leaders in which the word “statistics” was not mentioned. On the flip side, I have heard from statistics faculty frustrated at the failure of their departments to engage proactively in such efforts."
This concern is not new, but given the author and the source of publication, it has again risen to prominence among statisticians. The result is a renewed back-and-forth debate over whether statistics is data science, and whether statistics should engage with data science at all.
Interestingly, my perception has been that this debate is largely confined to academic statistics. Applied statisticians in industry tend to be very focused on their immediate objectives, are much more likely to cross disciplinary boundaries to accomplish those objectives, and are in general more "data sciency" than their academic colleagues. And with a few exceptions, the data scientists I know regard knowledge of statistics (and mathematics) as fundamental to doing data science "right." In other words, they don't perceive much of a schism at all, and I think many would argue that data science is making statistics more important, not less.
Nevertheless, times are changing and change requires adaptation.
But what are "Big Data" and "Data Science"?
Data science is a multidisciplinary field that involves the use and study of data for various purposes. As such, it is actually quite close to Webster’s definition of statistics: “a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data.” However, data science’s roots lie largely in computer science, and it is by no means limited to numerical data.
There have been numerous attempts to better define data science. A popular Venn diagram produced in 2010 depicted data science as the intersection of “math and statistics knowledge”, “hacking skills”, and “substantive expertise”. A more recent update by another blogger, pictured below, contends that data science is the union of these skills and possibly more. And while there has been much discussion of the elusive “data scientist” who earns hundreds of thousands of dollars per year, a consensus has lately been forming that data science is best performed by teams of experts from each of the involved disciplines.