by perturbation 3115 days ago
I've been seeing nothing but negative, dismissive comments about data science on HN lately, which is really disappointing. There's definitely a lot of hype right now about DL, but almost all of my job does not deal with Big Data or Deep Learning, 'just' machine learning + stats + calc + scripting + data cleaning + deploying models.

I think most people don't have big data (Amazon has an x1 with 4 TB of RAM, after all!) but there's no shame in that. I'll use a big machine for grid search or other embarrassingly parallelizable stuff, but I can confirm that Spark is usually a bad tool for actual ML unless you use one of their out-of-the-box algos. Even then, tuning the cluster on EMR with YARN is a pain, especially for pyspark. There's a gap, I think, between the inflated expectations of "I'm going to get general AI in 5 years and CHANGE THE WORLD" and "this K-means clustering will be a good way to explore our reviews", but somewhere in the middle there is actual value.

(I also hate that "AI" is becoming the new hype train; I don't consider anything I do to be "AI", but you have people calling CNNs or even non-deep-learning models "AI".) This is only going to result in inflated expectations. DS practitioners have to communicate the value without hype, and also find a way to weed out charlatans.

3 comments

It's silly you're getting downvoted for this well-articulated and insightful comment.

I think much of the negativity towards DS from the programming community is because the Data Scientist is what the programmer used to be ~15 years ago. It's that nerdy thing for a select group of very smart people, whereas being a software developer/engineer/architect/whatever has become just another common job (at least outside of Silicon Valley).

Also, from my experience as the lone developer taking the first steps to implement machine learning techniques in my company - lots of developers also think DS/ML is a cool thing with value, but they simply, absolutely don't understand it (and don't want to put in the effort to learn). These techniques are not hard and not magic, but they require a completely different way to think about problems than "traditional" programming does. I've seen developers up and down the hierarchical ladder struggle with wrapping their heads around these concepts, and it's way easier to dismiss it all as "hype" instead of accepting the fact that these techniques will be a huge part of what software development will look like in the future.

> I've been seeing nothing but negative, dismissive comments about data science on HN lately, which is really disappointing. There's definitely a lot of hype right now about DL, but almost all of my job does not deal with Big Data or Deep Learning, 'just' machine learning + stats + calc + scripting + data cleaning + deploying models.

But, all those things people did in the '90s or even earlier. It was called "data warehousing" or "decision support" back then. The fundamental techniques - linear regression, logistic regression, k-means clustering - go back even earlier, to the OR community post-WW2. Banks have been doing credit scoring with these techniques for a loooong time. The manufacturing industry has been using these techniques for even longer. Engineering for even longer than that.

So you can see why people are quite cynical about the way old, established techniques are being presented as the hot new thing - and you can see why people who have been doing this stuff for 20+ years might be annoyed at 20-somethings who claim to have invented this new thing. What's wrong with someone calling themselves a "statistician" or an "applied mathematician"?

But this is by no means purely a DS thing; it seems no one is a programmer anymore either. They're all "senior certified enterprise solution architects" or some other grandiose title.

> But, all those things people did in the '90s or even earlier. It was called "data warehousing" or "decision support" back then.

I would say data warehousing is more concerned with things like OLAP, Star Schema, ETL, etc. than what people are calling 'data science' right now. The same thing with 'decision support', since data warehousing grew out of decision support systems. The most overlap here is with 'data mining' algorithms like association rules and clustering.

> The fundamental techniques - linear regression, logistic regression, k-means clustering - go back even earlier, to the OR community post-WW2.

Here I think you've got a stronger argument. OR has a long, proud history of using applied math for business objectives. But again, I would say most of OR deals with different problems and different techniques - it's more about prescriptive analytics, constrained optimization, linear programming, simulations, etc. than the type of predictive modeling in most data science.

I see data science as a separate field even though it's stitched together from a bunch of others. It's certainly not entirely new, and certainly overhyped in some annoyingly-breathless news reports. I could say the same thing about CS - was it entirely "new" when it started as a discipline? Isn't CS "just" applied math?

> seems no one is a programmer anymore either, they're all "senior certified enterprise solution architects"

To be fair, few of the "senior architects" I've worked with in big companies knew how to program very well.

I think their hype got even you a little bit. That is revealed by the word "even" in the phrase: 'people calling CNNs or even non-deep-learning models "AI"'...
What I mean by this is - I don't see how anyone could reasonably call a Random Forest "AI" with a straight face, whereas someone could (wrongly, but understandably) call a CNN / RNN / etc. AI if only because it has the word "neural" in it.

There are two groups:

- People who are overly enthusiastic about neural nets

- People who are cynically calling every ML algorithm "AI", up to and including linear regression

and I'm more annoyed at the latter.

To anyone non-technical, a decision is AI. 99% of the world is non-technical, so it's probably only going to continue to be this way.