Hacker News
by JadeNB 3251 days ago
Since the author is reading, a few small typos, followed by one slightly more substantial comment: 'simgoid' should be 'sigmoid' (S-shaped); `x y = log(x) + log(y)` should be `log(x y) = log(x) + log(y)`; 'guarentee' should be 'guarantee'; 'recipricol' should be 'reciprocal'.

I would like to see some mention of the fact that the division by the gradient is a meaningless, purely formal motivation for the correct step (inverting the Hessian) that follows.

3 comments

The motivation for the Hessian is the same as for dividing by the second derivative. Suppose we want to solve f(x) = 0. Taylor expand f(x) to first order around the current iterate a:

    f(x) =~ f(a) + f'(a)(x-a)
We want f(x) = 0, so:

    0 = f(a) + f'(a)(x-a)
Multiply both sides by the inverse of f'(a):

    0 = f'(a)^-1 f(a) + x-a
So:

    x = a - f'(a)^-1 f(a)
This is the update equation for Newton's method where a is the current iterate and x is the next iterate.
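
A minimal Python sketch of the scalar update above, in case it helps. The function name `newton_root`, the tolerance, the iteration cap and the test function x^2 - 2 are all my own illustrative choices, not anything from the article.

```python
def newton_root(f, fprime, a, tol=1e-12, max_iter=50):
    """Approximate a root of f by iterating x = a - f'(a)^-1 f(a)."""
    for _ in range(max_iter):
        step = f(a) / fprime(a)  # in one dimension, f'(a)^-1 f(a) is a plain division
        a = a - step             # the next iterate x becomes the new current iterate a
        if abs(step) < tol:      # stop once the update is negligible
            break
    return a


# Illustrative problem (an assumption): solve x^2 - 2 = 0 starting from a = 1.
root = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
print(root)
```

The sketch assumes f'(a) is never zero along the way and that the starting point is close enough to the root; it has no safeguards for either case.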

If f is a multidimensional function f : R^n -> R^n, then the derivative f'(a) is the Jacobian, and inversion becomes matrix inversion.
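
Here is the same iteration sketched for n = 2 in plain Python, with the division replaced by solving a linear system against the Jacobian. The helper names `solve_2x2` and `newton_system`, and the example system (x^2 + y^2 = 4, x y = 1), are assumptions made up for illustration.

```python
def solve_2x2(J, b):
    """Solve J d = b for a 2x2 matrix J using Cramer's rule."""
    (p, q), (r, s) = J
    det = p * s - q * r
    return ((b[0] * s - q * b[1]) / det, (p * b[1] - r * b[0]) / det)


def newton_system(F, jac, a, tol=1e-12, max_iter=50):
    """Iterate x = a - J(a)^-1 F(a) for F : R^2 -> R^2."""
    for _ in range(max_iter):
        d = solve_2x2(jac(a), F(a))      # d = J(a)^-1 F(a), without forming the inverse
        a = (a[0] - d[0], a[1] - d[1])
        if max(abs(d[0]), abs(d[1])) < tol:
            break
    return a


# Illustrative system (an assumption): x^2 + y^2 = 4 and x*y = 1.
F = lambda v: (v[0] ** 2 + v[1] ** 2 - 4.0, v[0] * v[1] - 1.0)
jac = lambda v: ((2.0 * v[0], 2.0 * v[1]), (v[1], v[0]))
sol = newton_system(F, jac, (2.0, 0.5))
print(sol, F(sol))
```

Solving J d = F rather than computing J^-1 explicitly is the usual choice in practice. Note that J^-1 multiplies F from the left, so the order matters once these are matrices and vectors.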

When we use Newton's method for minimisation of a function g we solve g'(x) = 0, so we pick f(x) = g'(x). Since the formula above contains f' we get a second derivative. The second derivative in multiple dimensions is the Hessian.
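
To make the minimisation case concrete, a small sketch in plain Python: f is the gradient of g, f' is the Hessian, and each step solves H d = grad. The objective g(x, y) = exp(x) + exp(y) + (x - y)^2 - x - y is a convex toy function I picked because its minimum is at (0, 0); it is not the article's log-likelihood, and the names `grad`, `hess` and `newton_minimise` are hypothetical.

```python
import math


def solve_2x2(H, b):
    """Solve H d = b for a 2x2 matrix H using Cramer's rule."""
    (p, q), (r, s) = H
    det = p * s - q * r
    return ((b[0] * s - q * b[1]) / det, (p * b[1] - r * b[0]) / det)


def grad(v):
    """Gradient of the toy objective g(x, y) = exp(x) + exp(y) + (x - y)^2 - x - y."""
    x, y = v
    return (math.exp(x) + 2.0 * (x - y) - 1.0, math.exp(y) - 2.0 * (x - y) - 1.0)


def hess(v):
    """Hessian of the same toy objective (symmetric, positive definite everywhere)."""
    x, y = v
    return ((math.exp(x) + 2.0, -2.0), (-2.0, math.exp(y) + 2.0))


def newton_minimise(grad, hess, a, tol=1e-12, max_iter=50):
    """Iterate x = a - H(a)^-1 grad(a), i.e. Newton's method applied to grad g = 0."""
    for _ in range(max_iter):
        d = solve_2x2(hess(a), grad(a))  # H^-1 times the gradient, never gradient "divided by" H
        a = (a[0] - d[0], a[1] - d[1])
        if max(abs(d[0]), abs(d[1])) < tol:
            break
    return a


theta = newton_minimise(grad, hess, (1.0, -0.5))
print(theta)
```

For maximising a log-likelihood the update has the same form; only the objective changes. Plain Newton steps like these carry no guarantee of convergence from a poor starting point.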

I'd also like to note that the Hessian matrix elements should have a \partial{\partial{l}} in each numerator, not a single \partial{l} [1].

If you aren't using LaTeX for formatting, then think partial^2 of l. FWIW: I just found this [2], which would make it (using the physics package) even simpler to represent.

[1] https://en.wikipedia.org/wiki/Hessian_matrix

[2] https://tex.stackexchange.com/questions/225523/how-to-write-...

Thanks for the feedback, a co-worker just jabbed me regarding the log property mistake also...

As to the motivation for the correct step: can you point me to a resource that explains this? Not sure I follow...

> As to the motivation for the correct step: can you point me to a resource that explains this? Not sure I follow...

You write an equation involving division by the gradient. This is an illegal operation (one cannot divide by a vector), and your final recipe doesn't do it. As far as I can tell, you are writing down the incorrect, illegally-vector-inverting formula as motivation for the correct formula involving the (inverse of the) Hessian. All I am suggesting is that you say explicitly something like "Of course, this formula as written is not literally correct; one cannot actually divide by a vector. The correct procedure is explained below."

(Incidentally, speaking of inverses, another poster (https://news.ycombinator.com/item?id=14881265) has mentioned that it may be a bit confusing to speak of the reciprocal of a matrix rather than the inverse, since (as I interpret that other poster's point) the reciprocal of a matrix is just its inverse. I might prefer to say something like "We write $H_{\ell(\theta)}^{-1}\nabla\ell(\theta)$ rather than $\frac{\nabla\ell(\theta)}{H_{\ell(\theta)}}$ to emphasise that we are inverting a matrix, not a scalar, so that the order of multiplication matters.")

Ahh I got it. Understood, definitely worth clarifying, will update. Thanks.

Sorry to say it, but I got the impression that the author was unaware it was nonsensical, not that it was a clever motivation.

Bishop has a nice treatment of Newton's method in "Pattern Recognition and Machine Learning". Good book to have on your shelf if you are learning this stuff.