by ComplexSystems
638 days ago
"Just because we can parameterize an arbitrarily flexible class of distributions doesn't mean we have an algorithm to learn the optimal set of parameters." This is equally mangled, if not more, than what Altman is saying. We don't need to learn "the optimal" set of parameters. We need to learn "a good" set of parameters that approximates the original distribution "well enough." Gradient methods and large networks with lots of parameters seem to be capable of doing that without overfitting to the data set. That's a much stronger statement than the universal approximation theorem. |