
I thought my ire at the term "data science" would have worn out by now, but it hasn't. To me it is an utterly meaningless term whose adoption in itself speaks volumes about the dynamics behind it.

As someone who has been doing "data science," including the programming, watching this trend has seemed to me to be mostly about hype and non-STEM types, especially in business management and similar areas, picking up on the importance of quantification.

I can think of two things that seem like legitimately very novel trends in my career in this area: deep learning, whose frameworks were largely abandoned in the preceding decades, and management of very large datasets. The first surprised me; the second I was talking about for years before it happened. The first seems so specialized to me, and came so long after the "data science" trend started, that the "data science" label seems unnecessary for it; the second is now usually discussed in terms of "data engineering," which I'm totally cool with.

There's a tendency to suggest that the data science label is justified because statistics is all theoretical and not enough about real-world data, but that's always seemed to me to be a strawman that people erected to justify business hype labels and further their careers. What it boils down to is playing off of business management's confusion that "statistics" = census numbers, counts, etc. It ignores the decades of computational statistics that were developing all along, and the fact that statisticians are forced to deal with data as part of the field.

I wish I could find more of the papers I've read that illustrate the frustrations of statisticians and other scientists with data science. This will probably suffice, although there are more cogent, heartfelt examples: http://magazine.amstat.org/blog/2015/11/01/statnews2015/

It's difficult to describe, but for me personally it goes something like this: for years, you use R, C/C++/Python, Lisp, etc. to solve really difficult stats problems, trying to be careful not to do anything irresponsible. You've done work on supercomputers, laptops, you name it. Then, all of a sudden, there's an explosion of blogs, etc. talking about R, Mahalanobis distances, and optimization routines as if they were discovered yesterday, by this brand new field of "data science" that's revolutionizing the world. All of a sudden you're treated as unqualified because you don't know Cassandra or Spark (even though you're familiar with a lot of the underlying concepts because you've had to manage large datasets) and don't have a comp sci degree.

I don't mean ill will toward the practitioners, but it's difficult to convey what it's like to watch your field get repackaged and resold because of other people's misunderstandings about what it's about.