Hacker News
by PartiallyTyped 1261 days ago
I recommend against Goodfellow's Deep Learning. At this point it is pretty much outdated; in fact, any book specific to NNs is already outdated by the time it is released.

You'd need the following background:

- Linear Algebra

- Multivariate Calculus

- Probability theory && Statistics

Then you need a decent ML book to get the foundations of ML; you can't go wrong with any of these:

- Bishop's Pattern Recognition

- Murphy's Probabilistic ML

- Elements of statistical learning

- Learning from data

You can supplement Murphy's with the advanced book. Elements is a pretty tough book, so consider going through "An Introduction to Statistical Learning"[1]. Bishop and Murphy include foundational topics in mathematics.

LfD is a great introductory book and covers one of the most important aspects of ML, that is, model complexity and families of models. It can be supplemented with any of the other books.

I'd also recommend doing some abstract algebra, but it's not a prerequisite.

If you would like a top-down approach, I recommend getting the book "Mathematics for Machine Learning" and learning as needed.

For NN methods, some recommendations:

- https://paperswithcode.com/methods/category/regularization

- https://paperswithcode.com/methods/category/stochastic-optim...

- https://paperswithcode.com/methods/category/attention-mechan...

- https://paperswithcode.com/paper/auto-encoding-variational-b...

For something a little bit different but worth reading, given that you have the prerequisite mathematical maturity:

- Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges | https://arxiv.org/abs/2104.13478

[1] https://www.statlearning.com/

Many thanks to the user "mindcrime" for catching my error with Introduction to statistical learning.

3 comments

> consider going through "Introductions to Elements of statistical learning"

Was that supposed to be An Introduction to Statistical Learning[1] or maybe Introduction to Statistical Relational Learning[2]? I don't think there is a book titled Introduction to Elements of Statistical Learning?

[1]: https://www.statlearning.com/

[2]: https://www.cs.umd.edu/srl-book/

I referred to [1], thanks. I have corrected GP.

(I can't wait until the myth that you need linear algebra and calculus to do ML finally dies. It's like saying that you need to understand assembly to do programming. It helps, but it's far from a requirement.)

I disagree strongly. In your analogy, if the compiler broke down all the time, you would probably need to understand assembly to do programming. ML is amazing today, but still kinda sucks. In general you’ll have a bunch of failures on the way to a successful novel application, so it’s more critical to understand what’s going on under the hood in ML than in your programming analogy.

If you just want to apply well known things to well known things, sure you’re right. But as soon as things go wrong, I couldn’t imagine how much more inefficient my iteration cycles would be trying to do novel work without understanding linear algebra (for some kinds of novel work) or calc (for other kinds of novel work). I think you kinda get at this when you say it’s not necessary but it helps. It’s not necessary, but it helps a lot with anything off the beaten track.

We agree, I think!

And certainly, if you're one of those people who can pull it off, studying ML from first principles is probably an advantage. I just wince every time since I wouldn't have gotten into ML in the first place if I had to start with a big Calculus tome. There are probably a lot of people like me out there.

OP asked for foundational, and I provided _foundational_. In my opinion, everyone should start from some sound foundations in LinAlg and Calculus.

Here are a couple of errors that stem from a single foundational problem:

- a linear regressor can not be more than the number of datapoints

- dimensionality reduction when you have NxM with M > N is bogus and you need a bigger dataset to do anything meaningful other than clustering

- input dimension of output layer is larger than the number of samples

The underlying issue in all of these is the rank-nullity theorem, which is pretty foundational for ML, and yet many practitioners don't know about it or haven't made the connection.
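To make the connection concrete, here is a minimal sketch in plain Python. The 2x3 matrix X, the targets y, and the helper names rank and predict are made up for illustration and are not from any library. It shows that with more features than samples (M > N) the null space is non-trivial, so more than one weight vector fits the data exactly.

```python
from fractions import Fraction


def rank(matrix):
    """Rank via Gaussian elimination, using exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # Look for a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate column c from every row below the pivot row.
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def predict(X, w):
    """Linear model without a bias term: one dot product per sample."""
    return [sum(x * wi for x, wi in zip(row, w)) for row in X]


# Hypothetical data: N = 2 samples, M = 3 features, so M > N.
X = [[1, 2, 3],
     [4, 5, 6]]
y = [1, 2]

rk = rank(X)
nullity = len(X[0]) - rk  # rank-nullity: rank + nullity = number of columns

# One exact solution of X w = y (found by hand, third weight set to 0).
w_a = [Fraction(-1, 3), Fraction(2, 3), Fraction(0)]

# v lies in the null space of X, so adding any multiple of it keeps the fit.
v = [1, -2, 1]
w_b = [a + 5 * b for a, b in zip(w_a, v)]

print("rank:", rk, "nullity:", nullity)
print("fit with w_a:", predict(X, w_a) == y)
print("fit with w_b:", predict(X, w_b) == y)
```

Since the rank can never exceed the number of rows N, any X with M > N has nullity of at least M - N, so the data alone cannot pick out a single weight vector.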

I am not saying that you should have gone through Spivak or built everything bottom-up. There are books like Mathematics for ML that condense everything you need and give you a decent enough foundation.

Correction:

A linear regressor can not have more parameters than the number of data points.

> I can't wait until the myth that you need linear algebra and calculus to do ML finally dies.

This is such a dangerously absurd claim... but then, it speaks volumes about the abysmal state the non-research-heavy AI/ML field has fallen into.

As always on HN, the right answer is at the bottom.