Hacker News
by whimsicalism 1543 days ago
It would be trivial for a network of this size to code general rules for multiplication.

At a certain point, when you have enough data, finding the actual rule becomes an easier solution than memorizing each data point. This is the key insight of deep learning.


Really? Better inform all the researchers working on this that they're wasting their time then: https://arxiv.org/abs/2001.05016

More fundamentally, any finite neural net is either constant or linear outside the training sample, depending on the activation function, unless you design special neurons like in the paper above. That solves this specific problem for arithmetic, but not the general problem of extrapolation.
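
To make the "constant or linear" point concrete, here is a minimal sketch with a toy one-input net. The weights (`W1`, `B1`, `W2`, `B2`) are hand-picked and hypothetical, nothing here is trained, and the helper names are mine. It only checks what happens far from the origin: with ReLU the output should have zero curvature (a straight line), and with tanh it should have zero slope (a constant).

```python
import math

# Hypothetical hand-picked weights: 1 input, 3 hidden units, 1 output.
W1 = [1.0, -1.0, 0.5]
B1 = [0.0, 1.0, -2.0]
W2 = [1.0, -2.0, 0.5]
B2 = 0.1


def relu(z):
    return max(z, 0.0)


def net(x, act):
    """One-hidden-layer net with the given activation function."""
    hidden = [act(w * x + b) for w, b in zip(W1, B1)]
    return sum(w * h for w, h in zip(W2, hidden)) + B2


def first_difference(f, x, step=1.0):
    """Finite-difference slope of f between x and x + step."""
    return f(x + step) - f(x)


def second_difference(f, x, step=1.0):
    """Finite-difference curvature of f around x."""
    return f(x + step) - 2.0 * f(x) + f(x - step)


def relu_net(x):
    return net(x, relu)


def tanh_net(x):
    return net(x, math.tanh)


# Points far from where any hidden unit switches on or saturates.
far = [50.0, 100.0, 500.0]

relu_curvature = [second_difference(relu_net, x) for x in far]
relu_slope = [first_difference(relu_net, x) for x in far]
tanh_slope = [first_difference(tanh_net, x) for x in far]

print("ReLU curvature far out:", relu_curvature)
print("ReLU slope far out:", relu_slope)
print("tanh slope far out:", tanh_slope)
```

So this toy ReLU net keeps going in a straight line forever and the tanh one flatlines. Neither can follow something like x*y out there, at least not in this toy setup.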

> any finite neural net is either constant or linear outside the training sample

Hence why the structure of our bodies has to include the capacity for imagination. Our brain structure does not record everything that has happened. It permits us to imagine an infinite number of things which might happen.

We do not come to understand the world by having a brain-structure isomorphic to world structure -- this is nonsense for, at least, the above reason. But also, there really isn't anything like "world structure" to be isomorphic to. I.e., brains aren't HDDs.

They are, at least, simulators. I don't think we'll find anything in the brain like "leaves are green" because that is just a generated public representation of a latent-simulating-thought. There isn't much to be learned about the world from these; they only make sense to us.

That all the text of human history has associations between words is the statistical coincidence that modern NLP uses for its smoke-and-mirrors. As a theory of language it's madness.

Isn't that per-layer?
No, no matter how many piecewise linear functions you compose, the result is still piecewise linear.
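
A rough sketch of what composition does, assuming a made-up toy layer `|x| - 1` built from two ReLUs (the layer and the helper names are hypothetical, picked so the arithmetic is exact on a 0.25 grid). Stacking the layer adds kinks, but between kinks every piece is still a straight line, which shows up as a zero second difference.

```python
def layer(x):
    """Toy piecewise linear layer: |x| - 1, written as relu(x) + relu(-x) - 1."""
    return max(x, 0.0) + max(-x, 0.0) - 1.0


def compose(depth):
    """Return the function obtained by applying `layer` depth times."""
    def f(x):
        for _ in range(depth):
            x = layer(x)
        return x
    return f


def kink_locations(f, lo=-8.0, hi=8.0, step=0.25):
    """Grid points where the second difference of f is non-zero.

    On a straight segment the second difference is exactly zero, so the
    points returned are the places where the slope changes.
    """
    n = int(round((hi - lo) / step))
    xs = [lo + i * step for i in range(n + 1)]
    return [
        x for x in xs[1:-1]
        if f(x + step) - 2.0 * f(x) + f(x - step) != 0.0
    ]


kinks_by_depth = {d: kink_locations(compose(d)) for d in (1, 2, 3)}

for depth, kinks in kinks_by_depth.items():
    print(depth, kinks)
```

With this toy layer the kink count goes 1, 3, 5 as depth goes 1, 2, 3: more pieces, but never anything curved, and past the outermost kink it is one straight line.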
Well sure, but neurons are still universal approximators. Any CPU is a sum of piecewise linear functions. I don't see where this meaningfully limits the capabilities of an AI, since once we're multilayer there's no 1:1 relation between training samples and piece placement in the output.
I just don't see how that's relevant. Nobody uses one-hidden-layer networks anymore. Whatever GPT is doing, it has nothing to do with approximating a collection of samples by assembling piecewise functions, except in the way that Microsoft Word is based on the transistor.