by dawnofdusk
568 days ago
Are there any results about the "optimality" of backpropagation? Can one show that it emerges naturally from some Bayesian optimality criterion or a dynamic programming principle? This is a significant advantage that the "free energy principle" people have. For example, let's say instead of gradient descent you want to use Newton's method. Then maybe there's a better way to compute the needed weight updates besides backprop?
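To make the gradient-descent vs. Newton contrast concrete, here is a minimal sketch on a toy quadratic loss. The matrix `A`, the vector `b`, the learning rate, and all function names are made-up illustration values, not taken from any real model. It only shows what the two updates are; it does not address how to compute a Newton direction efficiently in a deep network.

```python
# Toy loss: f(w) = 0.5 * w^T A w - b^T w, with hand-picked (hypothetical) A and b.
A = [[3.0, 1.0], [1.0, 2.0]]  # symmetric positive definite, so f has a unique minimum
b = [1.0, 1.0]

def grad(w):
    """Gradient of f at w, which is A w - b."""
    return [A[0][0] * w[0] + A[0][1] * w[1] - b[0],
            A[1][0] * w[0] + A[1][1] * w[1] - b[1]]

def solve2x2(M, v):
    """Solve M x = v for a 2x2 matrix M using Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(v[0] * M[1][1] - M[0][1] * v[1]) / det,
            (M[0][0] * v[1] - v[0] * M[1][0]) / det]

def gradient_step(w, lr=0.1):
    """Plain gradient descent: move against the gradient, scaled by lr."""
    g = grad(w)
    return [w[0] - lr * g[0], w[1] - lr * g[1]]

def newton_step(w):
    """Newton update: rescale the gradient by the inverse Hessian (here, A)."""
    g = grad(w)
    d = solve2x2(A, g)  # the Hessian of this quadratic is A everywhere
    return [w[0] - d[0], w[1] - d[1]]

w0 = [0.0, 0.0]
w_gd = gradient_step(w0)
w_newton = newton_step(w0)
```

On a quadratic like this, the Newton step lands on the minimizer in one go, while the gradient step only moves part of the way. For a real network the Hessian is far too large to form and invert like this, which is where the question about alternatives to backprop comes in.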
The important thing is that backprop does work, so we're just scaling it up to absurd levels to get good results. Sooner or later someone is going to find a big step change where training gets a lot better. Maybe the trick only works past some threshold, say for models with lots of parameters, and that's why nobody has stumbled on it yet. But if evolution can do it, so can researchers.