by yldedly
1543 days ago
The structure of having X apples in each of Y buckets is the same as the structure in the expression "X * Y", as long as the expression exists in a context that can parse it using the rules of arithmetic, such as a human or a calculator. These language models lack context, not just for arithmetic, but for everything. They can't parse "X * Y" for arbitrary X and Y; they've just associated the expression with the right answer for so many values of X and Y that we get fooled into thinking they know the rules. We get fooled into thinking they've learned the structure of the world. But they've only learned the structure of text.
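
To make the distinction concrete, here is a rough toy sketch in Python. It is only an analogy, not a claim about how any actual language model works internally. The names `seen`, `memorizer`, and `rule` are made up for illustration: one function has only stored answers for pairs it has seen, the other applies the rule of multiplication as repeated addition.

```python
from itertools import product

# Hypothetical "training data": every product with operands in 0..9.
seen = {(x, y): x * y for x, y in product(range(10), repeat=2)}


def memorizer(x, y):
    """Return the stored answer, or None for a pair that was never seen."""
    return seen.get((x, y))


def rule(x, y):
    """Multiply by repeated addition (assumes y is a non-negative integer)."""
    total = 0
    for _ in range(y):
        total += x
    return total


# Inside the seen range, the two are indistinguishable.
print(memorizer(7, 8), rule(7, 8))

# Outside it, only the rule still produces an answer.
print(memorizer(12, 34), rule(12, 34))
```

Judged only on pairs from the seen range, the two look identical; the difference only shows up on inputs outside it.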
|
At a certain point, when you have enough data, finding the actual rule becomes an easier solution than memorizing each data point. This is the key insight of deep learning.