
by yldedly 1543 days ago

The structure of having X apples in Y buckets is the same as the structure in the expression "X * Y", as long as the expression exists in a context that can parse it using the rules of arithmetic, such as a human, or a calculator.

These language models lack context, not just for arithmetic, but for everything. They can't parse "X * Y" for any X and Y, they've just associated the expression with the right answer for so many values of X and Y, that we get fooled into thinking they know the rules.
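To make the distinction concrete, here is a minimal sketch contrasting an association table with a rule. It is a caricature of both positions, not a claim about what any particular network does internally; the names `seen`, `memorized` and `rule` are made up for illustration.

```python
# A table of "expression -> answer" associations for small operands only.
seen = {(x, y): x * y for x in range(10) for y in range(10)}

def memorized(x, y):
    """Answer by recall; returns None for any pair never seen."""
    return seen.get((x, y))

def rule(x, y):
    """Answer by applying a rule: X apples in each of Y buckets.
    Assumes y is a non-negative integer."""
    total = 0
    for _ in range(y):
        total += x
    return total

print(memorized(3, 4))    # inside the table
print(memorized(12, 34))  # outside the table: no answer
print(rule(12, 34))       # the rule does not care about the range
```

The table looks as good as the rule on any pair it has stored, so the two can only be told apart by probing outside the stored range.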

We get fooled into thinking they've learned the structure of the world. But they've only learned the structure of text.

2 comments

whimsicalism 1543 days ago

It would be trivial for a network of this size to code general rules for multiplication.

At a certain point, when you have enough data, finding the actual rule is an easier solution than memorizing each data point. This is the key insight of deep learning.

yldedly 1543 days ago

Really? Better inform all the researchers working on this that they're wasting their time then: https://arxiv.org/abs/2001.05016

More fundamentally, any finite neural net is either constant or linear outside the training sample, depending on the activation function. Unless you design special neurons like in the paper above, which solves this specific problem for arithmetic, but not the general problem of extrapolation.
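A small sketch of the ReLU case, assuming a hand-picked toy network (the weights below are made up, not trained, and not taken from the linked paper). A finite ReLU net has finitely many kinks; past the outermost one it is a single straight line, so it cannot keep tracking something like x * x. Where the kinks end up relative to the training data depends on training, so treat this as an illustration of the mechanism only.

```python
def relu(v):
    return v if v > 0 else 0

def net(x):
    """Toy two-hidden-layer ReLU network with integer weights (hypothetical)."""
    # hidden layer 1
    h1 = relu(x)
    h2 = relu(1 - x)
    h3 = relu(2 * x - 3)
    # hidden layer 2
    g1 = relu(h1 - h3 + 1)
    g2 = relu(h2 + h3 - 2)
    # linear output layer
    return g1 - 2 * g2

def slope(f, x):
    """Finite-difference slope over a unit step."""
    return f(x + 1) - f(x)

# All kinks of this network lie in [-1, 4]; outside that interval the slope is fixed.
right_slopes = [slope(net, x) for x in (10, 100, 1000)]
left_slopes = [slope(net, x) for x in (-10, -100, -1000)]
print(right_slopes)  # [-4, -4, -4]
print(left_slopes)   # [2, 2, 2]

# A quadratic target keeps bending, so the gap grows without bound.
square_slopes = [slope(lambda t: t * t, x) for x in (10, 100, 1000)]
print(square_slopes)  # [21, 201, 2001]
```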

mjburgess 1543 days ago

> any finite neural net is either constant or linear outside the training sample

Hence why the structure of our bodies has to include the capacity for imagination. Our brain structure does not record everything that has happened. It permits us to imagine an infinite number of things which might happen.

We do not come to understand the world by having a brain-structure isomorphic to world structure -- this is nonsense for, at least, the above reason. But also, there really isn't anything like "world structure" to be isomorphic to. I.e., brains aren't HDDs.

They are, at least, simulators. I don't think we'll find anything in the brain like "leaves are green" because that is just a generated public representation of a latent-simulating-thought. There isn't much to be learned about the world from these, they only make sense to us.

That all the text of human history has associations between words is the statistical coincidence that modern NLP uses for its smoke-and-mirrors. As a theory of language it's madness.

FeepingCreature 1543 days ago

Isn't that per-layer?

yldedly 1543 days ago

No, no matter how many piecewise linear functions you compose, the result is still piecewise linear.
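A quick way to check this on a toy case, assuming two hand-picked piecewise linear functions (both written with ReLUs; the names are illustrative). Composing them adds kinks, but between kinks the result is still a straight line, which shows up as a zero second difference. Exact fractions are used so the zero test is not blurred by floating point.

```python
from fractions import Fraction

def relu(v):
    return v if v > 0 else 0

def f(x):
    """|x|, written as a sum of two ReLUs: one kink at 0."""
    return relu(x) + relu(-x)

def g(x):
    """|x - 1|, written the same way: one kink at 1."""
    return relu(x - 1) + relu(1 - x)

def composed(x):
    """g(f(x)) = ||x| - 1|."""
    return g(f(x))

def kinks_on_grid(fn, lo, hi, step):
    """Grid points where the second difference is non-zero, i.e. where fn bends."""
    found = []
    x = lo + step
    while x < hi:
        if fn(x - step) - 2 * fn(x) + fn(x + step) != 0:
            found.append(x)
        x += step
    return found

step = Fraction(1, 4)
kinks = kinks_on_grid(composed, Fraction(-3), Fraction(3), step)
print([str(k) for k in kinks])  # ['-1', '0', '1']
```

The composition has three kinks where each part had one, so depth buys more pieces, but every piece is still linear.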

FeepingCreature 1543 days ago

Well sure, but neurons are still universal approximators. Any CPU is a sum of piecewise linear functions. I don't see where this meaningfully limits the capabilities of an AI, since once we're multilayer there's no 1:1 relation between training samples and piece placement in the output.

yldedly 1543 days ago

https://medium.com/analytics-vidhya/you-dont-understand-neur...

hackinthebochs 1543 days ago

>We get fooled into thinking they've learned the structure of the world. But they've only learned the structure of text.

To what degree does the structure of text correspond to structure of the world, in the limit of a maximally descriptive text corpus? Nearly complete if not totally complete, as far as I can tell. What is left out? The subjective experience of being embodied in the world. But this subjective experience is orthogonal to the structure of the world. And so this limitation does not prevent an understanding of the structure.

yldedly 1543 days ago

The point is that not only is it impossible to infer the structure of the world from text, deep learning is incapable of learning about or even representing the world.

The reason language makes sense to us is that it triggers the right representations. It does not make sense intrinsically, it's just a sequence of symbols.

Learning about the world requires at least causal inference, modular and compact representations such as programming languages, and much smarter learning algorithms than random search or gradient descent.

hackinthebochs 1543 days ago

I don't know why you think this. There is much structural regularity in a large text corpus that is descriptive of relationships in the world. Eventually the best way to predict this regularity is just to land in a portion of parameter space that encodes the structure. But again, in the limit of a maximally descriptive text corpus, the best way to model this structure is just to encode the structure of the world. You have given no reason to think this is inherently impossible.

yldedly 1543 days ago

>There is much structural regularity in a large text corpus that is descriptive of relationships in the world.

Sure, there is a lot. But let's say we want to learn what apples are. So we look at occurrences of "apple" in the text corpus, and learn that apples can be eaten, they can be sweet, sometimes they are sour, red, sometimes green, and so on.

Can apples spontaneously change size? Hmm, no idea, no mention of that in the text. Can they be used as a knife? Dunno. If I had an apple in New York 4 minutes ago, can someone else be eating the same apple in Hong Kong now? Dunno. Did apples exist in France two million years ago? Dunno. Can you drive to Saturn in an apple? Dunno.

In short, there's no actual model of what an apple is, as an object in space, connected to other objects by various relationships. If there were, the model could figure out the answers to the questions above by inheritance.

Maybe these particular questions happen to be answered correctly by PaLM. Maybe not, but the next LLM will include this comment in the training corpus.

But the reason GPT-3 and other models tend to make no sense is because their output is not constrained by reality. The text in the training corpus tends to conform to reality, but when you prompt the model with questions that nobody would ever write text about, the illusion falls away:

Q: How many eyes does a giraffe have? A: A giraffe has two eyes.

Q: How many eyes does my foot have? A: Your foot has two eyes.

Q: How many eyes does a spider have? A: A spider has eight eyes.

Q: How many eyes does the sun have? A: The sun has one eye.

Q: How many eyes does a blade of grass have? A: A blade of grass has one eye.

Q: How do you sporgle a morgle? A: You sporgle a morgle by using a sporgle.

Q: How many bonks are in a quoit? A: There are three bonks in a quoit.

Q: How many rainbows does it take to jump from Hawaii to seventeen? A: It takes two rainbows to jump from Hawaii to seventeen.

Q: Which colorless green ideas sleep furiously? A: Ideas that are colorless, green, and sleep furiously are the ideas of a sleep furiously.

Q: Do you understand these questions? A: I understand these questions.

(from https://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.h...)

hackinthebochs 1543 days ago

>In short, there's no actual model of what an apple is, as an object in space, connected to other objects by various relationships.

I don't know why you think language models are fundamentally unable to deduce the knowledge of the points you mention. Much knowledge isn't explicitly stated, but is implicit and can be deduced from a collection of explicit facts. For example, apples are food, food is physical matter, physical matter is fixed in size, cannot be in two places at once, maintains its current momentum unless acted on by a force, etc. Categorization and deducing properties from an object's category is in parameter space of language models. There's no reason to think that a sufficiently large model will not land on these parameters.
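For reference, the kind of deduction being described looks like this when written out symbolically. This is only a sketch of property inheritance over an is-a chain, using the facts listed above; the dictionaries and the `lookup` helper are made up for illustration, and nothing here says whether or how a language model would encode the same thing in its parameters.

```python
# Explicit facts: category membership and per-category properties.
IS_A = {"apple": "food", "food": "physical matter"}
PROPERTIES = {
    "food": {"edible": True},
    "physical matter": {"fixed size": True, "one place at a time": True},
}

def lookup(thing, prop):
    """Walk up the is-a chain until some category defines prop.
    Returns None when no category on the chain says anything about it."""
    while thing is not None:
        if prop in PROPERTIES.get(thing, {}):
            return PROPERTIES[thing][prop]
        thing = IS_A.get(thing)
    return None

# Never stated about apples directly, but deducible from their categories.
print(lookup("apple", "fixed size"))
print(lookup("apple", "one place at a time"))
# Nothing on the chain covers this, so there is no answer either way.
print(lookup("apple", "drivable to Saturn"))
```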

>But the reason GPT-3 and other models tend to make no sense is because their output is not constrained by reality.

The issue isn't what GPT-3 can or cannot do, it's about what autoregressive language models as a class are capable of. Yes, there are massive holes in GPT-3's ability to maintain coherency across wide ranges of contexts. But GPT-3's limits do not imply a limit to autoregressive language models more generally.

yldedly 1542 days ago

>I don't know why you think language models are fundamentally unable to deduce the knowledge of the points you mention.

Because the knowledge is not there in the text, the models are not able to represent it, and as seen in the demonstration above, they don't have it.

FeepingCreature 1543 days ago

It sounds like you're arguing that GPT doesn't work because it cannot work. However, it does work.

So how does PaLM understand causal chains and explain jokes that it has never seen before?

yldedly 1543 days ago

It doesn't. It's pattern matching, and you're seeing cherry picked examples. The pattern matching is enough to give the illusion of understanding. There are plenty of articles where more thorough testing reveals the difference. Here are two: https://medium.com/@melaniemitchell.me/can-gpt-3-make-analog...

But you could also just try one of these models, and see for yourself. It's not exactly subtle.

https://www.technologyreview.com/2020/08/22/1007539/gpt3-ope...

FeepingCreature 1543 days ago

GPT-3 was specifically worse at jokes, which is why PaLM being good at this so impresses me. At any rate, I don't care if it only works one in ten times. To me, this is equivalent to complaining that the dog has bad marks in high school. (PaLM could probably explain that one to you: "The speaker is complaining that the dog is only getting C's. For a human a C is a quite bad mark. However getting even a C is normally impossible for a dog.")

"It's pattern matching" just sounds like an excuse for why it working "doesn't really count". At this point, you are asking me to disbelieve plain evidence. I have played with these models, people I know have played with these models, I have some impression of what they're capable of. I'm not disagreeing it's "just pattern matching", whatever that means, I am asserting that "pattern matching" is Turing-complete, or rather, cognition-complete, so this is just not a relevant argument to me.

What do you think a neuron does?

yldedly 1543 days ago

>At any rate, I don't care if it only works one in ten times

>you are asking me to disbelieve plain evidence