by friendly_aixi
2199 days ago
The result here is stronger, in the sense that typical NN universality results are statements about capacity alone (and not about how you optimise the network). Here, the result holds with respect to both capacity and the choice of a suitable no-regret online convex optimisation algorithm (e.g. online gradient descent). Of course, this is just one desirable property of a general-purpose learning algorithm.
Global convergence to any arbitrarily weird function at rate O(sqrt(T)) seems amazing, almost too good to be true, and I'm wondering what the catch is. Maybe it's just a moderately nice property rather than an extraordinary one? Maybe there are some horrible constants hiding in there?
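For a rough sense of what an O(sqrt(T)) guarantee looks like for plain online gradient descent, here is a minimal sketch. It is not the method from the linked work, just the textbook setting: linear losses f_t(w) = g_t * w on the interval [-radius, radius], with step size D / (G * sqrt(t)). The function name `ogd_regret` and its parameters are made up for illustration. In this setting the standard bound is regret <= 1.5 * G * D * sqrt(T), where D is the diameter of the feasible set and G bounds the gradient size, so the constants are explicit. Note that the bound is on cumulative regret against the best fixed point in hindsight, not on distance to a target function.

```python
import math
import random


def ogd_regret(grads, radius=1.0, grad_bound=1.0):
    """Projected online gradient descent on linear losses f_t(w) = g_t * w.

    The feasible set is the interval [-radius, radius]. Returns the regret
    against the best fixed point in hindsight. Assumes |g_t| <= grad_bound.
    """
    diameter = 2.0 * radius
    w = 0.0
    learner_loss = 0.0
    for t, g in enumerate(grads, start=1):
        learner_loss += g * w                      # loss is charged before the update
        eta = diameter / (grad_bound * math.sqrt(t))
        w = w - eta * g                            # gradient step
        w = max(-radius, min(radius, w))           # project back onto the interval
    # For linear losses the best fixed point sits at an endpoint of the interval.
    best_fixed_loss = -radius * abs(sum(grads))
    return learner_loss - best_fixed_loss


random.seed(0)
for T in (100, 1000, 10000):
    grads = [random.uniform(-1.0, 1.0) for _ in range(T)]
    bound = 1.5 * 1.0 * 2.0 * math.sqrt(T)         # 1.5 * G * D * sqrt(T)
    print(T, round(ogd_regret(grads), 2), "<=", round(bound, 2))
```

Since the regret grows like sqrt(T), the average regret per round shrinks like 1/sqrt(T). The constants G and D are where problem-specific difficulty can hide.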