by qqqwwweeerrr 1442 days ago
In my opinion (MSc in math, PhD in applied statistics, currently a postdoc in an epidemiology department), the fundamentals are books like The Elements of Statistical Learning: https://hastie.su.domains/Papers/ESLII.pdf

I feel like the term data science is completely useless. Machine learning is an approach to "do AI" through statistics. Specifically, it is a branch of statistics where the sole focus is on prediction, compared to e.g. inference.
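To make the prediction vs. inference distinction concrete, here is a minimal sketch in plain Python. The data are made up for illustration and the helper name `fit_ols` is hypothetical. The same least-squares fit serves both goals: prediction only cares about the fitted value at a new point, while inference asks how certain we are about the slope itself.

```python
import math

# Toy data, invented for illustration: x = exposure, y = outcome.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]


def fit_ols(x, y):
    """Simple linear regression by ordinary least squares.

    Returns (intercept, slope, standard error of the slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r ** 2 for r in residuals) / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    return intercept, slope, se_slope


intercept, slope, se_slope = fit_ols(x, y)

# Prediction: what outcome do we expect at a new, unseen x?
y_new = intercept + slope * 7.0

# Inference: is the slope distinguishable from zero, and how precisely
# is it estimated? The t-statistic is the slope in units of its standard error.
t_stat = slope / se_slope

print(f"predicted y at x=7: {y_new:.2f}")
print(f"slope: {slope:.3f} (SE {se_slope:.3f}, t = {t_stat:.1f})")
```

A pure prediction workflow would stop at `y_new` and judge the model by held-out error; an inferential one would report the slope with its uncertainty and might never predict anything.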


The topics in that book are good, typical "study material", but 1) we can easily make anyone fail an interview by asking them to derive algorithms or formulas from it, and 2) all these topics predate deep learning and everything that comes with it.

I agree these are good "filtering" topics, among many others.

The subject matter in the book is more general than any specific model; the book is about the statistical underpinnings of ML as a whole.
I find these definitions funny: statisticians and computer scientists define machine learning differently, so you get two different perspectives. Everyone wants to claim AI for themselves. I think the statisticians are pissed off that DNNs work so well.

Anyway, lest we forget: neural networks came from cybernetics!

I never claimed that everyone wants to claim AI for themselves.

Artificial intelligence is of course a scientific field in its own right, and was one even before machine learning was a thing. I'm just saying that AI scientists have used concepts from statistics to create an approach to AI called machine learning. I'm not saying that ML is a subset of statistics, mind you, but its statistical underpinnings definitely are. ML is not _just_ statistics either.

Moreover, why would statisticians be pissed about the efficacy of a model?

Firstly, many problems/questions that I work on are not concerned with prediction.

Secondly, even if they were, I would love to use DNNs. It's just that I never have a use for them, considering I'm only looking at tabular data. Why bother with DNNs when, say, a random forest will do?

This is exactly how I tend to explain ML/AI simply to people. It's mostly stats paired with linear algebra, calculus, and computer science.
Thanks, will check that book out!