
by michaelochurch 4217 days ago

The only reason I listed machine learning is that it is the Trojan horse that gets programmers to learn stats.

Ah, that makes perfect sense.

To me, the appeal of machine learning is that it's challenging and respected enough that programmers doing it get the autonomy to work in any subfield of computer science, from the high level to the low. If you're a data scientist and you say you want to use Clojure or Haskell (a high-level concern), or that you want to do GPU programming or dive deep into assembly (low-level work), you can. Machine learning, 10 years ago, was extremely appealing because software managers were figuring out that they needed it, but most admitted they knew little about it, so they gave a lot of autonomy to individual contributors. (That may change, and "data science" may become thoroughly commoditized.)

It's the Fundamental Theorem of Employment: you're usually hired either to do (a) something your boss can't do for himself or (b) something he doesn't want to do. With (a) you get respect and autonomy and high pay; with (b) you get treated like a commodity. "Data scientist" (or software "architect" vs. "engineer") is, often, a programmer who's managed to learn enough of "the hard stuff" to move himself over to (a). It's the (b) category of engineers who get stuck on "Scrum teams".

I feel like some of the crowding of "data science" (and, as you noted, not all of the "data scientists" know what they're talking about) comes from the way that "Agile"-style micromanagement has made the rest of programming so braindead. There are people like me who enjoy the hard mathematical aspect, but others who've just learned that if they call themselves "data scientists" they get more interesting projects and don't have to munch on Scrum tickets. For them, the math is an impediment rather than a challenge and an attraction.

I mentioned the deep dive into the machine learning techniques because, unfortunately, most of the programmers I meet who call themselves data scientists just aren't very good at stats.

There's a depth vs. breadth problem, because machine learning is a much, much bigger field than many people think. I've gone pretty deep on penalized regressions (e.g. ridge and Lasso with large numbers of features) but know only the basics about tree-based models. I can read the papers on neural network architectures (e.g. convolutional nets) and implement them, and I understand the theory that led to them, but I still lack some of the intuition (like why rectified linear units are more useful in image processing than ordinary logistic, i.e. sigmoid, units).
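For anyone who hasn't looked at penalized regression, here is a minimal sketch of how ridge and Lasso differ. It assumes the textbook special case of an orthonormal design (X^T X = I), where both have closed forms in terms of the ordinary least-squares coefficients. The function names, coefficients, and penalty value are made up for illustration, and real problems with many correlated features need an iterative solver instead.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge under X^T X = I, minimizing 0.5*||y - Xb||^2 + 0.5*lam*||b||^2.
    Every coefficient is scaled by 1 / (1 + lam)."""
    return [b / (1.0 + lam) for b in beta_ols]


def lasso_shrink(beta_ols, lam):
    """Lasso under X^T X = I, minimizing 0.5*||y - Xb||^2 + lam*||b||_1.
    Every coefficient is soft-thresholded: moved toward zero by lam,
    and set exactly to zero if its magnitude is at most lam."""
    out = []
    for b in beta_ols:
        mag = max(abs(b) - lam, 0.0)
        if mag == 0.0:
            out.append(0.0)
        else:
            out.append(mag if b > 0 else -mag)
    return out


# Hypothetical OLS coefficients and penalty, chosen only for illustration.
beta_ols = [3.0, -0.4, 0.1, -2.0]
lam = 0.5

ridge = ridge_shrink(beta_ols, lam)
lasso = lasso_shrink(beta_ols, lam)

print("ridge:", ridge)
print("lasso:", lasso)
```

In this special case ridge shrinks everything proportionally and never produces an exact zero, while Lasso zeroes out the small coefficients, which is why Lasso is often reached for when there are a lot of features.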

I feel like there are some people who pick up a lot of vocabulary and interview well (but not with you, because you actually know the field) but are really just playing around with parameters. I like the math, but "data science" is mostly bullshit and I hope the term will die; I want to see more machine learning and less business pomposity.