Hacker News new | ask | show | jobs
by antoine-levitt 625 days ago
I just skimmed the article and didn't watch the video, but the bit about backpropagation is just wrong. Backpropagation doesn't compute an inverse of the jacobian, it computes its transpose. (although a similar idea to backpropagation could possibly be used to compute an inverse of several reversible layers, but that's not typically how neural networks work)
4 comments

Backprop itself doesn't invert the computation, but it does give you the direction for an incremental move towards the inverse (a 'nudge' as the article puts it). That is, given a sufficiently nice function f and an appropriate loss ||f(x) - y*||^2, gradient descent wrt x will indeed recover the inverse x* = f^{-1}(y*) since that is what minimizes the loss. I assume this what the article is pointing at.

If you want to be picky, it's true that the direct analogue of continuous optimization would be discrete optimization (integer programming, TSP, etc) rather than decision problems like SAT. But there are straightforward reductions between the two so it's common to speak of optimization problems as being in P or NP even though that's not entirely accurate.

"So, for example, if you can solve some problem \Pi by running a SAT solver ten times, this doesn’t mean that you have reduced that problem to SAT— in reduction, you can only run the SAT solver once."

This is also not true.

Hi, the author of the blog post here.

The section was written horribly -- while I was talking about backpropagation, I was thinking about "what can be done in polynomial time" and there's a mismatch as you explained. Thanks and shame on me, I rewrote it.

In any case, I would recommend watching the video first, the article is just accompanied stream of consciousness for those few who really liked the video.

This is one of the many comments I see on HN where I genuinely have no idea whether it’s satire or just high-level technical talk. I assume the latter, but I don’t know enough to disprove the former
It's not satire. If you have some function with several inputs, the Jacobian is a matrix of all the partial derivatives of this function with respect to each input variable[1]. Since derivatives give the slope of a function, if you think about your function as being like a bumpy surface with the height at each point being the output, this matrix tells you which way (and how far) to change any input if you want to go "up" or "down" in the output.

Backpropogation is a way to optimise a neural network. You want to know how best to nudge the weights of the network to optimise some loss function, so what you do is compute the gradient (ie partial derivative) of that function with respect to each of the weights. This allows you to then tweak the weights of the function so your network gets better at whatever task you're trying to get it to learn. See [2] to understand how this works and and [3] to understand how this relates to the Jacobian, but generally if you're trying to go "downhill" in your loss function it's easy to see intuitively that knowing which way the function slopes (ie the effect of tweaking each of the weights) is important and that's what the Jacobian tells you.

The inverse of a matrix[4] and its transpose[5] are two different operations in linear algebra. Transpose turns rows into columns and columns into rows and the inverse of a matrix is a little harder to grasp maybe without background, but you could think of multiplying one matrix by the inverse of another as a little like division (since you can't actually divide matrices).[6]

[1] https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinan...

[2] https://www.youtube.com/watch?v=Ilg3gGewQ5U

[3] https://www.youtube.com/watch?v=tIeHLnjs5U8

[4] https://math.libretexts.org/Workbench/1250_Draft_4/07%3A_Mat...

[5] https://math.libretexts.org/Bookshelves/Linear_Algebra/Funda...

[6] algebraists please don't shoot me for that.