Hacker News
by hackinthebochs 1543 days ago
>In short, there's no actual model of what an apple is, as an object in space, connected to other objects by various relationships.

I don't know why you think language models are fundamentally unable to deduce the knowledge of the points you mention. Much knowledge isn't stated explicitly; it is implicit and can be deduced from a collection of explicit facts. For example: apples are food, food is physical matter, and physical matter is fixed in size, cannot be in two places at once, maintains its current momentum unless acted on by a force, and so on. Categorizing objects and deducing their properties from their categories lies within the parameter space of language models. There's no reason to think that a sufficiently large model will not land on these parameters.
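
To make the "implicit knowledge deduced from explicit facts" point concrete, here is a toy sketch of deducing properties from category membership. It is deliberately symbolic: the `IS_A` and `PROPERTIES` tables and the helper names are hypothetical, and the sketch only shows the logical relationship being described (facts stated for a category apply to its members), not how a language model would encode that relationship in its parameters.

```python
# Hypothetical toy facts: each entity's immediate category.
IS_A = {
    "apple": "food",
    "food": "physical matter",
}

# Hypothetical toy facts: properties stated only for the most general category.
PROPERTIES = {
    "physical matter": {
        "is fixed in size",
        "cannot be in two places at once",
        "maintains its current momentum unless acted on by a force",
    },
}


def categories(entity: str) -> list[str]:
    """Walk the is-a chain upward, returning every category the entity belongs to."""
    chain = []
    current = entity
    while current in IS_A:
        current = IS_A[current]
        chain.append(current)
    return chain


def inferred_properties(entity: str) -> set[str]:
    """Collect properties inherited from all categories, none stated for the entity itself."""
    props = set(PROPERTIES.get(entity, set()))
    for category in categories(entity):
        props |= PROPERTIES.get(category, set())
    return props


print(categories("apple"))  # ['food', 'physical matter']
print(sorted(inferred_properties("apple")))
```

Nothing about apples is stated directly in the toy data; every property printed for "apple" is inherited through the category chain.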

>But the reason GPT-3 and other models tend to make no sense is because their output is not constrained by reality.

The issue isn't what GPT-3 can or cannot do; it's what autoregressive language models as a class are capable of. Yes, there are massive holes in GPT-3's ability to stay coherent across a wide range of contexts. But GPT-3's limits do not imply limits on autoregressive language models more generally.


>I don't know why you think language models are fundamentally unable to deduce the knowledge of the points you mention.

Because the knowledge is not there in the text, the models are not able to represent it, and as seen in the demonstration above, they don't have it.

The demonstration is irrelevant. The issue isn't what GPT-3 can or cannot do, but what this class of models can do.

Knowledge reduces to particular kinds of information. Gradient descent discovers information by finding parameters that satisfy the training criterion. Given a large enough data set that is sufficiently descriptive of the world, the "shape" of the world described by the data makes some internal structures better than others at predicting the data. The organization and association of information that we call knowledge is part of the parameter space of LLMs. There is no reason to think such a learning process cannot find that region of parameter space.
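
A deliberately tiny illustration of "gradient descent discovers information by finding parameters": a two-parameter model is fit by plain gradient descent on mean squared error to data generated from a hidden linear rule. The rule, the data points, the learning rate and the step count are all hypothetical choices for the sketch, and a linear fit is nowhere near an LLM; it only shows the mechanism, in which the parameters move toward whatever structure best predicts the data.

```python
# Hypothetical "world": a hidden rule that generates the data.
TRUE_W, TRUE_B = 2.0, 1.0
data = [(x, TRUE_W * x + TRUE_B) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]


def mse_and_grads(w: float, b: float) -> tuple[float, float, float]:
    """Mean squared prediction error and its gradient with respect to w and b."""
    n = len(data)
    loss = grad_w = grad_b = 0.0
    for x, y in data:
        err = (w * x + b) - y
        loss += err * err / n
        grad_w += 2 * err * x / n
        grad_b += 2 * err / n
    return loss, grad_w, grad_b


def gradient_descent(steps: int = 500, lr: float = 0.05) -> tuple[float, float]:
    """Start with no knowledge of the rule and repeatedly step against the gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        _, grad_w, grad_b = mse_and_grads(w, b)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = gradient_descent()
print(f"learned w={w:.3f}, b={b:.3f}")  # learned w=2.000, b=1.000
```

The rule itself never appears in the training loop; it is recovered only because those parameter values predict the data better than any others.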