|
|
|
|
|
by currymj
2194 days ago
|
|
Do you know of any other kinds of universal function approximators that also have a good regret bound like this, or is yours the first one? global convergence to any arbitrarily weird function at rate O(sqrt(T)) seems amazing, almost too good to be true, and I’m wondering what the catch is. Maybe it’s just a moderately nice property but not extraordinary? Maybe there are some horrible constants hiding in there? |
|
2) There are some restrictions on the class of density functions it can model, so arbitrarily weird is a bit strong, but the model class is very general.
3) The weights needed to model any function in this class although finite, can be arbitrarily large. The regret of a single neuron has a dependence on the diameter of the convex set your weights reside in, so there is a nasty constant of sorts in there, and this will also carry over when you analyse the regret of a complete network. With such a general statement, it's unavoidable sadly.
4) The universality result on its own is just a nice property. See it as the first stepping stone to a more meaningful analysis. What you really want is for the model class to grow as you add more neurons, using weights within a realistic range, and that the method performs well in practice on some problems people care about -- we provide empirical evidence that the capacity grows on par with deep relu networks with our capacity experiments, and show a bunch of results where the method works, but we don't have a theoretical characterisation of the class of density functions it can model well (i.e. if the function has some nice structural property, then a network of reasonable size is guaranteed to learn a good approximation quickly). Such a result would be extraordinary in my eyes. Because the network is composed of simple and well understood building blocks, I am optimistic that such an analysis will be possible in the future.