Hacker News
by timerol 2359 days ago
I always find it interesting to see the different paths people take toward learning the same thing. When I first did multivariable calculus, I learned that the gradient points uphill, and the negative of the gradient points downhill. I'm definitely a spatial learner, and mostly thought of surfaces the way one walks over hills. The idea of using gradient descent to find a local minimum is the simplest part of neural networks to me.
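To make the hill-walking picture concrete, here is a minimal sketch in Python. The surface `f`, its minimum at (1, -2), the step size, and the iteration count are all made up for illustration and are not taken from the linked post. The loop simply keeps stepping against the gradient, i.e. downhill.

```python
def f(x, y):
    # Hypothetical surface: a bowl whose lowest point is at (1, -2).
    return (x - 1.0) ** 2 + 3.0 * (y + 2.0) ** 2


def grad_f(x, y):
    # Analytic gradient of f. It points in the direction of steepest ascent.
    return 2.0 * (x - 1.0), 6.0 * (y + 2.0)


def gradient_descent(x, y, step=0.1, iters=200):
    # Repeatedly move a small step along the negative gradient (downhill).
    # step and iters are illustrative choices, not tuned values.
    for _ in range(iters):
        gx, gy = grad_f(x, y)
        x -= step * gx
        y -= step * gy
    return x, y


x_min, y_min = gradient_descent(4.0, 3.0)
print(x_min, y_min, f(x_min, y_min))
```

On a bowl like this the walk settles at the single minimum. On a bumpier surface the same loop would only be expected to find a local minimum near where it started.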

It's interesting to see someone first write an article about nearest neighbor classifiers (a topic I really don't know much about), and then, 2-3 months later, figure out why we use gradient descent.

2 comments

Yeah, that's literally what the gradient is defined as: the rate of change. Not trying to be snarky, but to me this means not taking the time to learn the basics before jumping into far more advanced concepts. Sadly, this seems to be a pattern in a lot of machine learning curricula today.
When private schools offer a 1-year course to become a "Machine Learning Consultant" with no prior mathematics or programming knowledge required, you know that something has to be off ...
I mean, a quality program will include both numerical analysis and optimization methods classes, both of which will (hopefully) go through things like gradient descent rigorously.

And long before that, students will hopefully have a solid fundamental knowledge of what rate of change means, not just banging out derivatives on paper.

Thank you for taking the time to look through the post. When I learned this formally, it was introduced as steepest descent (Cauchy's variant). Like you, I didn't find it concrete until I looked at the surface plots. I agree that it's interesting to see the different paths people take toward learning the same thing.
I too, like the commenter above, find it fascinating how various concepts are quickly accessible (or not!) for different people. Usually I blame it on the teacher's presentation (for my part, I am completely unable to explain pointers (in, say, C) to people who haven't already been able to get it), but sometimes it is also the building blocks that the learner already has in place.

So - maybe a rude question - but had you taken Calculus, Differential Equations or Vector Calculus (div, grad and curl and all that) before you started in on this more integrated material?

Not rude at all! Thanks for the question. My background is in statistics, not machine learning (that came later), so I covered these topics without reference to ML applications. I suppose I never learned to connect these ideas when I was learning about ML; it was only when I went back to my old material that I realised how they relate. I agree with your point about the teacher's presentation: when I was learning about ML, it was kept separate from raw calculus.

The short answer to your question: yes, I took those courses before I started looking at the more integrated material.