by ansk
638 days ago
> humanity discovered an algorithm that could really, truly learn any distribution of data (or really, the underlying “rules” that produce any distribution of data)

He's hand-waving around the idea presented in the Universal Approximation Theorem, but he's mangled it to the point of falsehood by conflating representation and learning. Just because we can parameterize an arbitrarily flexible class of distributions doesn't mean we have an algorithm to learn the optimal set of parameters. He digs an even deeper hole by claiming that this algorithm actually learns 'the underlying “rules” that produce any distribution of data', which is essentially a totally unfounded assertion that the functions learned by neural nets will generalize in some particular manner.

> I find that no matter how much time I spend thinking about this, I can never really internalize how consequential it is.

If you think the Universal Approximation Theorem is this profound, you haven't understood it. It's about as profound as the notion that you can approximate a polynomial by splicing together an arbitrarily large number of linear pieces.
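To make the piecewise linear comparison concrete, here is a rough sketch in plain Python. The function names, the example cubic, and the interval [-1, 1] are all made up for illustration. Note that nothing is learned here: the knot values are read straight off the target function, which is the representation half of the argument and says nothing about whether a training procedure would find them.

```python
def piecewise_linear(f, lo, hi, n_segments):
    """Return a function that linearly interpolates f between equally spaced knots."""
    width = (hi - lo) / n_segments
    knots = [lo + i * width for i in range(n_segments + 1)]
    values = [f(k) for k in knots]  # read off the target directly, no fitting

    def approx(x):
        # Pick the segment containing x, clamped so x == hi uses the last one.
        i = min(int((x - lo) / width), n_segments - 1)
        t = (x - knots[i]) / width
        return (1 - t) * values[i] + t * values[i + 1]

    return approx


def max_error(f, approx, lo, hi, n_test=10_001):
    """Largest absolute gap between f and approx on an evenly spaced test grid."""
    xs = [lo + (hi - lo) * j / (n_test - 1) for j in range(n_test)]
    return max(abs(f(x) - approx(x)) for x in xs)


def cubic(x):
    # Hypothetical target polynomial, chosen only for illustration.
    return x**3 - 0.5 * x


errors = {}
for n in (4, 16, 64, 256):
    errors[n] = max_error(cubic, piecewise_linear(cubic, -1.0, 1.0, n), -1.0, 1.0)
    print(f"{n:4d} segments: max error {errors[n]:.2e}")
```

The error shrinks as segments are added, in line with the standard interpolation bound of h^2/8 times the maximum of |f''| (here 3/n^2 for n segments), but only because the right values were handed to the approximator up front.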
|
This is as mangled as what Altman is saying, if not more so. We don't need to learn "the optimal" set of parameters. We need to learn "a good" set of parameters that approximates the original distribution "well enough." Gradient methods and large networks with lots of parameters seem to be capable of doing that without overfitting to the data set. That's a much stronger statement than the universal approximation theorem.