by adam_arthur
1189 days ago
Approximating continuous functions is likely much the same as what people do. You think there isn’t some mathematical model under the hood of how the brain works? That it doesn’t break down into functions with interpretable results? Is it spiritual or mystical in your mind? These takes are so bad and pervasive on here, honestly. This is what I mean by grandiose thinking. A machine that approximates functions, and that is otherwise indistinguishable from a human, is effectively intelligent like a human. Incentives, wants, desires, and the ability to conduct our own training are the only differences at that point.
|
No, they're just saying there are continuous functions, and then there are discrete functions, and neural nets can't approximate discrete functions, while humans certainly can (e.g. integer addition). And that even when it comes to approximating any continuous function, neural nets can do that in principle, but we don't know how to do it in practice, just like we know time travel, stable wormholes and the Alcubierre drive are feasible in principle, but we can't realise them in practice.
So please don't say it's "spiritual and mystical" in the other person's mind just because it's not very clear in yours.
Also, what the OP didn't say is that a Transformer architecture is not the kind of architecture used to show the universality of neural nets. That was shown for a multi-layer perceptron (MLP) with one hidden layer, not a deep neural net like a Transformer, and certainly not a network with attention heads. If you wanted to be all theoretical about it and claim that because there's that old proof, someone will eventually find out how to do it in practice, then the Transformer architecture has already taken a wrong turn and is moving away from the target.
There aren't any universality results for Transformers. I mean, that would be the day! The reason that proof was derived for an MLP with one hidden layer is that this makes the proof much, much easier than if you wanted to show the same for another architecture.
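
To make the "in principle" part concrete, here's a rough sketch of what a one-hidden-layer net approximating a continuous function looks like. Assumptions on my part: I'm using ReLU units and setting the weights by hand from a piecewise-linear interpolation, which is not the sigmoidal construction in the classic proof and involves no training at all. The function names are made up for illustration. It only shows that suitable weights exist for one 1-D function on a closed interval. It says nothing about whether gradient descent would find them, and nothing about discrete functions or Transformers.

```python
import math


def relu(z):
    """Rectified linear unit: max(0, z)."""
    return z if z > 0.0 else 0.0


def build_one_hidden_layer_net(f, a, b, n_hidden):
    """Hand-pick weights so the net interpolates f at n_hidden + 1 evenly spaced points on [a, b].

    Hidden unit i computes relu(x - knots[i]); its output weight is the change
    in slope of the piecewise-linear interpolant at that knot.
    """
    step = (b - a) / n_hidden
    knots = [a + i * step for i in range(n_hidden)]
    values = [f(a + i * step) for i in range(n_hidden + 1)]
    slopes = [(values[i + 1] - values[i]) / step for i in range(n_hidden)]
    weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_hidden)]
    output_bias = values[0]
    return output_bias, knots, weights


def net(x, params):
    """Forward pass: output bias plus a weighted sum of the hidden ReLU units."""
    output_bias, knots, weights = params
    return output_bias + sum(w * relu(x - k) for w, k in zip(weights, knots))


def max_error(f, params, a, b, samples=1000):
    """Largest absolute gap between f and the net on an evenly spaced grid over [a, b]."""
    grid = (a + (b - a) * j / samples for j in range(samples + 1))
    return max(abs(f(x) - net(x, params)) for x in grid)


if __name__ == "__main__":
    # More hidden units give a tighter fit to sin on [0, pi].
    for n in (8, 64):
        params = build_one_hidden_layer_net(math.sin, 0.0, math.pi, n)
        print(n, max_error(math.sin, params, 0.0, math.pi))
```

The error shrinks as you add hidden units, which is all the universality result promises: the weights exist. Finding them by training, for functions you can't just sample and interpolate, is the part nobody gets for free.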