by yldedly 1543 days ago
>There is much structural regularity in a large text corpus that is descriptive of relationships in the world.

Sure, there is a lot. But let's say we want to learn what apples are. So we look at occurrences of "apple" in the text corpus, and learn that apples can be eaten, they can be sweet, sometimes they are sour, red, sometimes green, and so on.

Can apples spontaneously change size? Hmm, no idea, no mention of that in the text. Can they be used as a knife? Dunno. If I had an apple in New York 4 minutes ago, can someone else be eating the same apple in Hong Kong now? Dunno. Did apples exist in France two million years ago? Dunno. Can you drive to Saturn in an apple? Dunno.

In short, there's no actual model of what an apple is, as an object in space, connected to other objects by various relationships. If there were, the model could figure out the answers to the questions above by inheritance.

Maybe these particular questions happen to be answered correctly by PaLM. Maybe not, but the next LLM will include this comment in the training corpus.

But the reason GPT-3 and other models tend to make no sense is because their output is not constrained by reality. The text in the training corpus tends to conform to reality, but when you prompt the model with questions that nobody would ever write text about, the illusion falls away:

Q: How many eyes does a giraffe have? A: A giraffe has two eyes.

Q: How many eyes does my foot have? A: Your foot has two eyes.

Q: How many eyes does a spider have? A: A spider has eight eyes.

Q: How many eyes does the sun have? A: The sun has one eye.

Q: How many eyes does a blade of grass have? A: A blade of grass has one eye.

Q: How do you sporgle a morgle? A: You sporgle a morgle by using a sporgle.

Q: How many bonks are in a quoit? A: There are three bonks in a quoit.

Q: How many rainbows does it take to jump from Hawaii to seventeen? A: It takes two rainbows to jump from Hawaii to seventeen.

Q: Which colorless green ideas sleep furiously? A: Ideas that are colorless, green, and sleep furiously are the ideas of a sleep furiously.

Q: Do you understand these questions? A: I understand these questions.

(from https://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.h...)

1 comments

>In short, there's no actual model of what an apple is, as an object in space, connected to other objects by various relationships.

I don't know why you think language models are fundamentally unable to deduce the knowledge of the points you mention. Much knowledge isn't explicitly stated, but is implicit and can be deduced from a collection of explicit facts. For example, apples are food, food is physical matter, and physical matter is fixed in size, cannot be in two places at once, maintains its current momentum unless acted on by a force, etc. Categorizing an object and deducing its properties from its category is within the parameter space of language models. There's no reason to think that a sufficiently large model will not land on these parameters.
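To make the "deduce properties from the category" step concrete, here's a toy sketch. It's a hand-written symbolic lookup, not a claim about how a language model represents anything internally, and the names (`PARENT`, `PROPERTIES`, `lookup`) are made up for illustration:

```python
# Hypothetical toy knowledge base: each category points to its parent category.
PARENT = {"apple": "food", "food": "physical matter"}

# Each property is stated explicitly at exactly one level of the chain.
PROPERTIES = {
    "apple": {"edible": True},
    "physical matter": {
        "fixed_size": True,
        "one_place_at_a_time": True,
    },
}

def lookup(category, prop):
    """Walk up the category chain until some ancestor states the property."""
    while category is not None:
        if prop in PROPERTIES.get(category, {}):
            return PROPERTIES[category][prop]
        category = PARENT.get(category)
    return None  # nothing in the chain states this property

# Never stated about apples directly, but inherited from "physical matter".
print(lookup("apple", "fixed_size"))
# Not stated anywhere in the chain, so the lookup comes back empty.
print(lookup("apple", "can_drive_to_saturn"))
```

Nothing about apples and size is written down directly; the answer comes from walking apple -> food -> physical matter. Whether a trained model ends up encoding something functionally like this is exactly what's in dispute in this thread.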

>But the reason GPT-3 and other models tend to make no sense is because their output is not constrained by reality.

The issue isn't what GPT-3 can or cannot do; it's what autoregressive language models as a class are capable of. Yes, there are massive holes in GPT-3's ability to maintain coherence across wide ranges of contexts. But GPT-3's limits do not imply a limit on autoregressive language models more generally.

>I don't know why you think language models are fundamentally unable to deduce the knowledge of the points you mention.

Because the knowledge is not there in the text, the models are not able to represent it, and as seen in the demonstration above, they don't have it.

The demonstration is irrelevant. The issue isn't what GPT-3 can or cannot do, but what this class of models can do.

Reduce knowledge to particular kinds of information. Gradient descent discovers information by finding parameters that satisfy the training criteria. Given a large enough data set that is sufficiently descriptive of the world, the "shape" of the world described by the data admits better and worse structures for predicting the data. The organization and association of information that we call knowledge is a part of the parameter space of LLMs. There is no reason to think such a learning process cannot find this part of the parameter space.
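For anyone who hasn't seen the mechanism being appealed to here, a minimal sketch of gradient descent recovering a rule from data alone. This is a two-parameter toy, assuming a made-up "world" rule y = 2x + 1; it shows the mechanism only and says nothing about whether the same thing scales to world knowledge in an LLM:

```python
# Toy data generated by a hidden rule y = 2x + 1 (an assumption for illustration).
data = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]

def loss_and_grads(w, b):
    """Mean squared error of the model w*x + b, with its gradients."""
    n = len(data)
    loss = gw = gb = 0.0
    for x, y in data:
        err = (w * x + b) - y
        loss += err * err / n
        gw += 2 * err * x / n
        gb += 2 * err / n
    return loss, gw, gb

def fit(steps=2000, lr=0.01):
    """Plain gradient descent starting from w = b = 0."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        _, gw, gb = loss_and_grads(w, b)
        w -= lr * gw
        b -= lr * gb
    return w, b

w, b = fit()
# The fitted parameters end up very close to the hidden rule's 2 and 1.
print(round(w, 3), round(b, 3))
```

The optimizer is never told the rule; it only sees (x, y) pairs and a loss, and the parameters that predict the data best happen to be the ones that describe how the data was generated.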