>> Is it spiritual or mystical in your mind?

No, they're just saying that there are continuous functions and there are discrete functions, and that neural nets can't approximate discrete functions, while humans certainly can (e.g. integer addition). And that even when it comes to approximating an arbitrary continuous function, neural nets can do it in principle, but we don't know how to do it in practice, just as we know time travel, stable wormholes and the Alcubierre drive are feasible in principle but can't realise them in practice. So please don't say it's "spiritual and mystical" in the other person's mind just because it's not very clear in yours.

Also, what the OP didn't say is that the Transformer is not the kind of architecture used to show the universality of neural nets. That was shown for a multi-layer perceptron (MLP) with one hidden layer, not for a deep neural net like a Transformer, and certainly not for a network with attention heads. If you wanted to be all theoretical about it and claim that, because there's that old proof, someone will eventually work out how to do it in practice, then the Transformer architecture has already taken a wrong turn and is moving away from the target. There aren't any universality results for Transformers. I mean, that would be the day! The reason that proof was derived for an MLP with one hidden layer is that this makes the proof much, much easier than it would be for another architecture.
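To make the "in principle vs. in practice" point about one-hidden-layer universality concrete, here is a minimal Python sketch. It is an illustration, not the classic proof: that theorem is usually stated for sigmoid-type activations and proved non-constructively, whereas this swaps in ReLU units with hand-set weights that piecewise-linearly interpolate a known function. The helper names (`build_one_hidden_layer_net`, `max_error`) are hypothetical. Note that the construction needs `f` itself, which is exactly what you don't have when learning from data.

```python
import math

def relu(z: float) -> float:
    return max(0.0, z)

def build_one_hidden_layer_net(f, a: float, b: float, n: int):
    """Hypothetical helper: hand-construct (no training) a 1-hidden-layer ReLU
    net that linearly interpolates f at n+1 evenly spaced knots on [a, b]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(n)]
    # Hidden unit i computes relu(x - xs[i]); its output weight is the
    # change in slope at that knot, so the active weights sum to the local slope.
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n)]
    knots = xs[:-1]
    bias = ys[0]

    def net(x: float) -> float:
        return bias + sum(w * relu(x - k) for w, k in zip(out_weights, knots))

    return net

def max_error(f, net, a: float, b: float, samples: int = 10_000) -> float:
    """Largest |f(x) - net(x)| over a dense grid on [a, b]."""
    grid = (a + (b - a) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - net(x)) for x in grid)

if __name__ == "__main__":
    for n in (4, 16, 64):  # more hidden units -> smaller error
        net = build_one_hidden_layer_net(math.sin, 0.0, 2 * math.pi, n)
        print(n, max_error(math.sin, net, 0.0, 2 * math.pi))
```

Adding hidden units shrinks the error, which is the flavour of the universality result; nothing here says how gradient descent would find such weights from samples.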
It gets some math wrong because it doesn't understand the "systemic" aspect of math, but who's to say that, with minor training tweaks or a larger dataset, it wouldn't be able to infer the system? Humans infer systems from language all the time. To say you need some specialized form of training beyond language inference is obviously wrong when you look at how humans train, learn and understand. All of life is the ingestion of information via language, which produces systemic understanding.
I can play digital audio that's indistinguishable from acoustic sound, even though in practice it isn't a smooth function. Similarly, a sufficiently advanced neural net can produce intellect-like results, even if there are aspects of its structure that, as you say, may keep it from being intellect.
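The audio analogy can be made concrete with a small Python sketch: a sampled, quantized sine is a staircase rather than a smooth curve, yet its deviation from the smooth signal stays below half a quantization step. This is a simplification, assuming a symmetric grid of `2**(bits-1) - 1` levels per side (real 16-bit PCM has one extra negative level), and `quantize` / `max_quantization_error` are hypothetical helpers, not any audio library's API.

```python
import math

def quantize(x: float, bits: int) -> float:
    """Round a sample in [-1, 1] to the nearest level of a symmetric
    signed grid with 2**(bits-1) - 1 steps per side (a simplification)."""
    steps = 2 ** (bits - 1) - 1  # 32767 for 16-bit
    return round(x * steps) / steps

def max_quantization_error(bits: int, freq_hz: float = 440.0,
                           rate_hz: int = 44_100, n_samples: int = 44_100) -> float:
    """Worst-case |x - quantize(x)| over one second of a sampled sine tone."""
    worst = 0.0
    for i in range(n_samples):
        x = math.sin(2 * math.pi * freq_hz * i / rate_hz)
        worst = max(worst, abs(x - quantize(x, bits)))
    return worst

if __name__ == "__main__":
    for bits in (8, 16):
        print(bits, max_quantization_error(bits))
```

With more bits the staircase hugs the smooth curve ever more closely, which is the sense in which a non-smooth representation can be practically indistinguishable from a smooth one.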
Honestly, the perception you and many others seem to hold is that if something is mathematically explainable in a way that lets you "trivialize" its operation, then it isn't intelligence. But you hold "intelligence" in too high a regard.