- Given the importance of statistical thinking, why aren't statisticians involved in these initiatives?
- When thinking about the big data era, what are some statistical ideas we've already figured out?

I'd say that (1) is changing, if slowly. But (2) is a good message for non-statistical folks in the data science community. Statistics is a field that is both wide and deep. Many pressing data science problems have already been addressed in some fashion by someone in the statistics community. In many cases, we don't need to reinvent the wheel.
One area that I see as quite underdeveloped in data science is how it approaches time series data. For that, we should look to the econometricians as much as to the statisticians. (What's the difference between an econometrician and a statistician? About $15K a year. Boom.) I'm a fan of David Hendry's approach, and I think the data science community would like it as well. He calls it "general to specific" modeling, and I've seen a similar approach used to build machine-learning models.
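To give a flavor of the idea, here's a minimal sketch of "general to specific" model selection as backward elimination: start from a deliberately over-specified ("general") linear model and iteratively drop the least significant regressor until everything that remains is significant. This is an illustration of the spirit of the approach, not Hendry's actual algorithm (his GETS methodology also involves diagnostic testing and multiple search paths); the simulated data and the |t| > 2 rule are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 6
X = rng.normal(size=(n, k))
# Only columns 0 and 2 truly matter in the simulated data.
y = 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=n)

def ols_t_stats(X, y):
    """Return OLS coefficients and their t-statistics."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se

keep = list(range(k))           # start "general": all candidate regressors
while True:
    beta, t = ols_t_stats(X[:, keep], y)
    worst = int(np.argmin(np.abs(t)))
    if abs(t[worst]) > 2.0:     # every remaining term is significant: stop
        break
    keep.pop(worst)             # drop the least significant regressor

print(sorted(keep))             # the retained "specific" model
```

With strong signals like these, the two true regressors survive; noise columns are usually (though not always) eliminated.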
Oh, but I'm off topic...
Anyway, the title of Leek's post, "Why big data is in trouble: they forgot about applied statistics," is a bit melodramatic. Big data isn't in trouble because big data isn't going anywhere. (By big data, I mean the concept of a data-driven world.) As I said in an earlier post,
> It may be tempting to see [Google Flu] as justification that big data/data science is just media buzz. However, the technology that makes acquiring these data easy is here to stay. Reconciling statistical best practices and big data is actively being discussed in the data science and big data communities.
I look forward to the day that "data science" applications become mainstream in the statistics community. Then we'll really be cookin' with Crisco!