

by michaelochurch 4463 days ago

"Data scientist" is a mess of a job title. It seems to be as much a reaction against the commoditization of software engineering, which leaves the smartest 10% of programmers (who, by correlation, are usually also the most mathematically literate) ill-suited for the average software job, as it is a real distinction.

There are plenty of "data scientists" who use canned tools and play around with parameters because that's all "the business" thinks it needs.

You want to trim complexity for a reason that any data scientist worth his salt should already know (and there are plenty of celebrity engineers in SF making $500k who aren't worth their salt and don't know it): the bias-variance tradeoff (see also: underfitting and overfitting). If your model is too flexible or complex, it begins absorbing noise, which yields a model that performs extremely well on training data but fails miserably on unseen data. There are well-studied techniques for preventing this, such as regularization, cross-validation, and early stopping, but I'd guess that fewer than 20% of self-described or titled "data scientists" are familiar with them.
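To make the overfitting point concrete, here is a minimal sketch using k-nearest-neighbor regression as a stand-in for "model flexibility": small k is very flexible, large k is very rigid. Everything here is an illustrative assumption, not anything from the thread: the synthetic data (y = sin(x) plus Gaussian noise with standard deviation 0.3), the sample sizes, and the hypothetical helper names `make_data`, `knn_predict`, `mse`, and `choose_k`. With k=1 the model memorizes the noise and gets zero training error, yet does worse on fresh data than a moderate k; picking k on a held-out validation set is one of the simple, well-studied fixes.

```python
import math
import random


def make_data(n, noise, rng):
    """Hypothetical helper: sample n points of y = sin(x) + Gaussian noise, x uniform on [0, 2*pi]."""
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, noise) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x (ties broken by index order)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of k-NN predictions (fit on train_x/train_y) evaluated on (xs, ys)."""
    errs = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errs) / len(errs)


def choose_k(train_x, train_y, val_x, val_y, candidates):
    """Hold-out validation: pick the k with the lowest error on data the model never saw."""
    return min(candidates, key=lambda k: mse(train_x, train_y, val_x, val_y, k))


def main():
    rng = random.Random(0)  # fixed seed so the run is repeatable
    train_x, train_y = make_data(200, 0.3, rng)
    val_x, val_y = make_data(200, 0.3, rng)
    test_x, test_y = make_data(200, 0.3, rng)

    candidates = [1, 5, 15, 50, 200]
    for k in candidates:
        train_err = mse(train_x, train_y, train_x, train_y, k)
        test_err = mse(train_x, train_y, test_x, test_y, k)
        # k=1: zero training error (it memorizes noise) but inflated test error.
        # k=200: predicts the global mean everywhere, so it underfits.
        print(f"k={k:3d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")

    best = choose_k(train_x, train_y, val_x, val_y, candidates)
    print("k chosen on validation set:", best)


if __name__ == "__main__":
    main()
```

The gap between training and test error at k=1 is the "performs extremely well on training data but fails miserably on unseen data" symptom; the rising test error at k=200 is the underfitting end of the same tradeoff. Regularization and k-fold cross-validation follow the same idea with different machinery.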


eshvk 4463 days ago

> There are plenty of "data scientists" who use canned tools and play around with parameters because that's all "the business" thinks it needs.

As with software engineering, it is a role that means something different at every company, and each one defines it in its own way. This is not bad. It merely reflects market conditions in which many people are simultaneously bad at linear algebra, probability, and statistics, yet dangerous enough to write production code fast (your standard C.S.-grad SWE).
