by mike_hearn
757 days ago
They explain their true goal in the introduction:

> With positions resolved, we can study the logical extrapolation ability of transformers

They are interested in how well a neural net can logically extrapolate outside its training set once encoding barriers are removed. They show that even quite small language models can do this successfully once we stop confusing them with bad encodings.

This seems like fundamental work. It was only a few years ago that Google employees were arguing LLMs were nothing more than "stochastic parrots". Well, that take will go down in history as one of the worst takes on AI ever. By 2024 I don't think anyone really doubted that it was false, but the huge and opaque datasets meant people could always argue that a given answer wasn't an example of logical reasoning or extrapolation, and that maybe the model had just seen that specific question before.

But this work shows in a controlled environment that the model can learn the principles of addition and extrapolate to much larger numbers. It's not just repeating answers it has seen in its dataset. It should kill off the parrot meme for good.
|
No, because it's given hand-engineered embeddings that act as a strong inductive bias that is specific to addition. It's like addition is programmed right in.