by seanharr11 3251 days ago
I am a programmer trying to learn math, and so are my intended audience members.

That said, I should include facts related to convergence, and maybe even speed compared to SGD.

As to the reciprocal -> inverse generalization, do you have any resources you could point me towards to better understand this?

Additionally, a concrete answer to "Why would following the tangent repeatedly be a good idea?" has been hard to come by for me. I am able to visualize this, but if you have resources that explain this well please share.

1 comments

In general, it’s not a good idea. And in general, Newton’s method won’t converge.

Newton’s method boils down to replacing your function by its first-order approximation (the tangent) and taking the zero of that model function as the next iterate. For a differentiable function, that model is a good approximation in a small neighbourhood(!), by definition. So if the zero of the original function lies in that neighbourhood, the zero of the model function will be very close to it.
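
To make the "small neighbourhood" point concrete, here is a minimal sketch in Python. The helper name `newton_1d` and both test functions are my own picks for illustration, not something from the blog post. Each step jumps to the zero of the tangent-line model; started near the root of x^2 - 2 this settles quickly, while on arctan a start of 1.5 is already too far out and the iterates grow.

```python
import math

def newton_1d(f, df, x0, steps=8):
    """Return the Newton iterates x0, x1, ..., x_steps for a scalar function."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        # Zero of the tangent-line model m(t) = f(x) + df(x) * (t - x).
        xs.append(x - f(x) / df(x))
    return xs

# Start close to the root sqrt(2): the first-order model is accurate there.
near = newton_1d(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)

# Same iteration on arctan (root at 0), started too far away: each step
# overshoots to the other side and lands further from the root.
far = newton_1d(math.atan, lambda x: 1.0 / (1.0 + x * x), x0=1.5, steps=4)

print(near[-1], far)
```

The second call is the "in general, it won't converge" case: nothing is wrong with the formula, the tangent is simply a poor model of arctan that far from the point of tangency.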

PS: I did not expect the poster and author to be the same person, otherwise I would’ve phrased my criticism differently. A Show HN would have helped.

PPS: basically the whole reciprocal/inverse confusion only arises because you start the multidimensional case from your iteration formula. If you go back to its derivation and start again from there, you can avoid that.
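
A sketch of what starting from the derivation looks like, in my own notation (the symbols h, F and J_F are assumptions, not taken from the blog post). In both cases you linearise at the current iterate, set the model to zero, and solve for the step h:

```latex
% 1-D: linearise f at x_n and set the model to zero
f(x_n + h) \approx f(x_n) + f'(x_n)\,h = 0
\quad\Longrightarrow\quad
h = -\frac{f(x_n)}{f'(x_n)}

% n-D: the same step, with the Jacobian J_F in place of f'
F(x_n + h) \approx F(x_n) + J_F(x_n)\,h = 0
\quad\Longrightarrow\quad
J_F(x_n)\,h = -F(x_n), \qquad x_{n+1} = x_n + h
```

Seen this way there is no reciprocal to generalise. You solve a linear equation for h, and in one dimension solving it happens to be a division. For minimising a loss, F is the gradient of the loss and J_F is its Hessian.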

> In general, it’s not a good idea. And in general, Newton’s method won’t converge.

Right, but this blog post isn't about the general case of using Newton's method to find roots. It's about using Newton's method to solve logistic regression, for which it is perfectly suited, though there are of course better methods as well.

Newton's method with a line search is the go-to algorithm for convex optimisation if the dimension of the problem is not too large.
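
For anyone who wants to see that combination spelled out, here is a minimal sketch in plain Python, assuming a model with one feature plus an intercept so the Hessian is 2x2 and can be solved by hand. The toy data, the function names and the Armijo constant 1e-4 are my own choices for illustration, not taken from the blog post.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, xs, ys):
    """Negative log-likelihood of the model p(y=1|x) = sigmoid(w[0] + w[1]*x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = w[0] + w[1] * x
        # Fine for small |z|; a production version would guard against overflow.
        total += math.log(1.0 + math.exp(z)) - y * z
    return total

def grad_hess(w, xs, ys):
    """Gradient (g0, g1) and symmetric Hessian entries (h00, h01, h11)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w[0] + w[1] * x)
        r, s = p - y, p * (1.0 - p)
        g0 += r
        g1 += r * x
        h00 += s
        h01 += s * x
        h11 += s * x * x
    return (g0, g1), (h00, h01, h11)

def newton_line_search(xs, ys, iters=25, tol=1e-8):
    w = (0.0, 0.0)
    for _ in range(iters):
        (g0, g1), (h00, h01, h11) = grad_hess(w, xs, ys)
        if math.hypot(g0, g1) < tol:
            break
        # Newton direction: solve H d = -g (Cramer's rule for the 2x2 case).
        det = h00 * h11 - h01 * h01
        d0 = (-g0 * h11 + g1 * h01) / det
        d1 = (-g1 * h00 + g0 * h01) / det
        # Backtracking line search: halve the step until the Armijo
        # sufficient-decrease condition holds.
        f0, slope, t = loss(w, xs, ys), g0 * d0 + g1 * d1, 1.0
        while t > 1e-8 and loss((w[0] + t * d0, w[1] + t * d1), xs, ys) > f0 + 1e-4 * t * slope:
            t *= 0.5
        w = (w[0] + t * d0, w[1] + t * d1)
    return w

# Made-up, non-separable toy data, so the optimum is finite.
xs = [-2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
w_hat = newton_line_search(xs, ys)
print(w_hat)
```

The loss here is convex, so the Newton direction is a descent direction and the line search only has to pick how far to go along it. That is what removes the dependence on starting "close enough".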