| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by bertil 3443 days ago

Actually, I’ve noticed a meaningful distinction between people who learned statistics from machine learning (and are more likely to call each other data scientist) and statisticians (the least experimental of whom used to go by the title analyst): what to do when there is either too little, or too noisy data. Interestingly, those two are happy to be called Data scientist, but in my experience, they rarely meet.

A traditionally trained statistician would evoke negative result and decide not to use the model and support to maintain the pre-existing approach. A machine learning expert might not care, apply the coefficient out of the model as is because they are presumably closer than a guess and is more likely to be openly skeptical of human expertise.

That has lead to some frustrating situation for me: me arguing we should censor things like negative speeds, while I was told that there was no problem because the results were regularised anyway. Building and picking proper factors to use in regression is something that you can partially get away with when having larger databases, and back-propagation can take over; before that, insights still do matter.

I have not meet many who can articulate that transition effectively.

It seems that you’ve met mostly the second category; they are possibly the larger group, but not necessarily the most influential. There is a core of people who are meaningfully different. The linked article seems to be from someone in between but closer to the second group.