Hacker News
by eanzenberg 3244 days ago
Wat? I don't think any DS is claiming to be better than any programmer and statistician. I think the anecdote you're referring to is that a DS is better at programming than a statistician and better at statistics than a programmer. This viewpoint holds up in my experience.

I thought my ire at the term "data science" would have worn off by now, but it hasn't. To me it is an utterly meaningless term whose adoption in itself speaks volumes about the dynamics behind it.

As someone who has been doing "data science," programming included, I have seen this trend as mostly about hype, and about non-STEM types, especially in business management and similar areas, picking up on the importance of quantification.

I can think of two things that seem like legitimately novel trends in my career in this area: deep learning, whose frameworks were largely abandoned in the preceding decades, and the management of very large datasets. The first surprised me; the second I had been talking about for years before it happened. The first seems too specialized, and came after the "data science" trend had already started, so the "data science" label seems unnecessary there; the second is now usually discussed as "data engineering," which I'm totally cool with.

There's a tendency to suggest that the data science label is justified because statistics is all theoretical and not enough about real-world data, but that has always seemed to me a straw man that people erected to justify business hype labels and further their careers. What it boils down to is playing off business management's confusion of "statistics" with census numbers, counts, etc. It ignores the decades of computational statistics that had been developing, and the fact that statisticians are forced to deal with data as part of the field.

I wish I could find more of the papers I've read that illustrate the frustrations of statisticians and other scientists with data science. This will probably suffice, although there are more cogent, heartfelt examples: http://magazine.amstat.org/blog/2015/11/01/statnews2015/

It's difficult to describe, but for me personally it goes something like this: for years, you use R, C/C++/Python, Lisp, etc. to solve really difficult stats problems, trying to be careful not to do anything irresponsible. You've done work on supercomputers, laptops, you name it. Then, all of a sudden, there's an explosion of blogs, etc. talking about R, Mahalanobis distances, and optimization routines as if they were discovered yesterday by this brand-new field of "data science" that's revolutionizing the world. All of a sudden you're treated as an outsider because you don't know Cassandra or Spark and don't have a comp sci degree, even though you're familiar with a lot of the underlying concepts because you've had to manage large datasets.

I don't mean ill will toward the practitioners, but it's difficult to convey what it's like to watch your field get repackaged and resold because of other people's misunderstandings of what it's about.

That's fine. How do you describe a software engineer? Someone who codes? Makes APIs and tools? Handles security? Handles servers? Implements UI/UX?

So do you also think the label "software engineer" is meaningless because it's broad?

Data science encompasses many, many, many different sub-fields and specializations, many of them not involving any science at all, but some of them do involve science (understanding structure through observation and experimentation).

Maybe you don't like us being called "Scientists"? I can go to a journal, read research articles, and point out ones with horrible statistical analysis. Are those authors more of a scientist than I am, because they are arbitrarily in "academia"?

Finally, a dirty little secret is that the more data you have, the less statistics you need. I bet even Google knows this, and their data dept. is probably the best academic statistics dept. I've ever met.
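One way to make the "more data, less statistics" claim concrete is sampling error: the standard error of a sample mean is roughly σ/√n, so each 100-fold increase in n shrinks it about tenfold. The Python sketch below is illustrative only; the normal distribution, μ = 10, σ = 2, the sample sizes, and the helper names `standard_error` and `demo` are assumptions chosen for the demo, not anything from the thread.

```python
import math
import random


def standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    n = len(sample)
    mean = sum(sample) / n
    # Unbiased sample variance (n - 1 in the denominator).
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return math.sqrt(var / n)


def demo(sizes, mu=10.0, sigma=2.0, seed=0):
    """Hypothetical demo: draw i.i.d. normal samples of each size and
    return (n, sample mean, estimated standard error) tuples."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    results = []
    for n in sizes:
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        results.append((n, sum(sample) / n, standard_error(sample)))
    return results


if __name__ == "__main__":
    for n, mean, se in demo([100, 10_000, 1_000_000]):
        print(f"n={n:>9,}  mean={mean:.4f}  standard error={se:.5f}")
```

The sketch only models random sampling error under independent draws; it does not model bias or dependence in how the data were collected.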