by sdenton4 38 days ago
Taking a quick look at the paper...

Their claim isn't that the brain uses gradient descent, but that the direction of updates has (on average) positive inner product with the gradient. I expect this would also be true for (say) simulated annealing, yet we don't say that simulated annealing is gradient descent.

Also missing is a discussion of loss functions and how they relate to the update. As far as I know, there's still no great notion of how the brain picks a global loss function, and no known mechanism for backprop in the brain. In this paper, looking at a specific learning task, you can define a loss function extrinsically, which lets us talk about the gradient, but how that relates to what actually happens in the brain is a big, big mystery.

1 comment

Why would this be true for simulated annealing?
Because it improves the loss!

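Strictly speaking, the gradient points toward the fastest increase in loss, so read "gradient" here as the descent direction (the negative gradient): that is the direction in which loss improves the fastest. Moving in a direction with a positive dot product with it just means that you're (locally) improving the loss.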
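
For anyone who wants to poke at this, here is a rough sketch of the simulated annealing point. It is only a toy: the quadratic loss, the fixed temperature, the step size, and the names `loss`, `grad`, and `anneal` are all my own assumptions, not anything from the paper. It proposes random steps, accepts them with the usual Metropolis rule, and averages the dot product between each accepted step and the negative gradient. To first order, the change in loss for a step d is grad(x)·d, so a positive average here means the accepted steps tend to point downhill even though the algorithm never computes a gradient to choose them.

```python
import math
import random


def loss(x):
    # Toy convex quadratic bowl, chosen only for illustration (an assumption).
    return 0.5 * sum(xi * xi for xi in x)


def grad(x):
    # Gradient of the quadratic loss above. Used only for measurement,
    # never to pick a step.
    return list(x)


def anneal(steps=5000, dim=10, step_size=0.1, temp=0.01, seed=0):
    """Fixed-temperature Metropolis-style annealing on the toy loss.

    Returns (mean dot product of accepted steps with the negative gradient,
             number of accepted steps, initial loss, final loss).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    initial_loss = loss(x)
    dots = []
    for _ in range(steps):
        # Propose a random step; no gradient information is used here.
        d = [step_size * rng.gauss(0.0, 1.0) for _ in range(dim)]
        candidate = [xi + di for xi, di in zip(x, d)]
        delta = loss(candidate) - loss(x)
        # Metropolis rule: always accept improvements, sometimes accept
        # steps that make the loss worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            g = grad(x)
            dots.append(-sum(gi * di for gi, di in zip(g, d)))
            x = candidate
    return sum(dots) / len(dots), len(dots), initial_loss, loss(x)


if __name__ == "__main__":
    mean_dot, accepted, first, last = anneal()
    print("accepted steps:", accepted)
    print("mean dot with negative gradient:", mean_dot)
    print("loss:", first, "->", last)
```

On this toy problem the average comes out positive, which is the sense in which simulated annealing would also pass a "positive inner product with the gradient" test without being gradient descent. Other losses or temperature schedules may behave differently.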