| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by zwaps 1983 days ago

I mean you have to think about cause and effect here. DS will contract because many/some data scientists simply aren't good enough, and most DS just doesn't do what it is supposed too.

First, like you said, there are the stray PhDs who do it since they know research and some statistical applications. Second, there are hordes and hordes of DS people who "learned" their skill with some bootcamps or online courses, which means they know enough to write notebooks and glue together functions. Their understanding of theory is often shallow. In either case, it is hard to "blame" someone for taking an attractive job. But it isn't good for the discipline.

The appeal of DS is clear for companies. But the problems it promises to solve are much more complex than we collectively recognize - or are willing to admit. In my opinion, doing causal inference is a difficult, unsolved, and deep topic and no single course would equip to you to tackle it. It takes domain knowledge and multiple years of stats/math/ML (all of them, not one of them). And yet, causal inference is what 90% of people want ML to be. A model that works on some dataset is not a model that is useful in light of the true latent DGP. Yet, when we want to sell T-Shirts, what do we really want?

Hence, when I look at the problems that ML is supposed to solve, I think that most people calling themselves DS on linkedin are not really equipped for it. And there is a case to be made that some fields where PhD researchers train to solve such causal inquiries indeed are better equipped to tackle the issue.

For example, if it's about selling shirts, I would take an econometrician with some data engineering skills over a coursera superstar any day of the week. I think if you do a PhD in ML/Stats/Biostats/Econometrics/etc., it is reasonable to pursue a career in DS. It's what statistics _is_ now.

If you have some other PhD and know some Anova, OLS and Stata - or if you have CS background but know some Jupyter and Keras - then it's essentially career change. It might work, but probably not without a hitch.

So I agree with you, but I'd reframe it: It's unclear to me whether we need a contraction, or whether we instead need a quality update.

I disagree with you in one point: I do not think we will make progress in DS (getting it to work in more use cases) by treating it like a solved problem, a skill like milling that needs talent and experience, but not academic education. If we do that, I think DS will contract because it will stagnate in usefulness.

My point here is not to accuse anyone of being a bad DS. I am sure there are many ways to become efficient. But even the theory of causal inference with simple linear models goes far, far beyond what I saw in ML hiring tests, online courses and so forth. And solving the problems it tackles is not accomplished by throwing more layers at it. For other ML algos, we aren't even close yet at understanding these issues on a similar level.

In the end, what we need are actual ML scientists. They should neither be pure statisticians, nor pure subject-matter experts, nor pure computer scientists - as we mostly have now. We also need more than the current ML programs that are mostly clobbered together from other areas. For example, people who publish in ML research are probably very useful in a company that has to deal with that exact problem. Any scientist knows, of course, that even a fairly adjacent question may already require tons of different knowledge. DS is, will remain, and probably should be an academic field, because there are more open than solved problems right now.