by perturbation
3115 days ago
I've been seeing nothing but negative, dismissive comments about data science on HN lately, which is really disappointing. There's definitely a lot of hype right now about DL, but almost none of my job involves Big Data or Deep Learning; it's 'just' machine learning + stats + calc + scripting + data cleaning + deploying models. I think most people don't have big data (Amazon has an x1 with 4 TB of RAM, after all!), but there's no shame in that.
I'll use a big machine for grid search or other embarrassingly parallel stuff, but I can confirm that Spark is usually a bad tool for actual ML unless you use one of its out-of-the-box algos. Even then, tuning the cluster on EMR with YARN is a pain, especially for pyspark. There's a gap, I think, between the inflated expectations of "I'm going to get general AI in 5 years and CHANGE THE WORLD" and the modest "this K-means clustering will be a good way to explore our reviews", but somewhere in the middle there is actual value. (I also hate that "AI" is becoming the new hype train; I don't consider anything I do to be "AI", but you have people calling CNNs or even non-deep-learning models "AI".) This is only going to result in inflated expectations. DS practitioners have to communicate the value without hype, and also find a way to weed out charlatans.
|
I think much of the negativity towards DS from the programming community is because the Data Scientist is what the programmer used to be ~15 years ago. It's that nerdy thing for a select group of very smart people, whereas being a software developer/engineer/architect/whatever has become just another common job (at least outside of Silicon Valley).
Also, from my experience as the lone developer taking the first steps to implement machine learning techniques in my company: lots of developers also think DS/ML is a cool thing with value, but they simply don't understand it (and don't want to put in the effort to learn). These techniques are not hard and not magic, but they require a completely different way of thinking about problems than "traditional" programming does. I've seen developers up and down the hierarchy struggle to wrap their heads around these concepts, and it's way easier to dismiss it all as "hype" than to accept that these techniques will be a huge part of what software development looks like in the future.