by simonw
979 days ago
"three plus five apples is a total of..." is a really interesting example, because it doesn't actually require arithmetic at all. A language model trained on enough text will be able to complete this just from having encountered the pattern "three plus five SOMETHING is a total of..." enough times in its training data. This becomes even more apparent when you work with smaller models, such as the 7B-parameter models that can run on a laptop. They can often solve small arithmetic problems like this while having no chance at all with larger numbers that they have never encountered in their training data. I really like using those smaller models as tools to better understand how this technology works.
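A toy sketch of the point above, in Python. This is not a language model and not anything from the comment itself: it is a deliberately crude lookup table, built from a made-up corpus, that "completes" the prompt purely by recalling a pattern it has seen. The names `build_corpus`, `train`, and `complete` are hypothetical, and the corpus (sums of number words from one to five, always about pears) is an assumption chosen for illustration.

```python
import re

# Number words the toy corpus can use (index == value).
WORDS = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]

# Matches "<a> plus <b> <noun> is a total of [<answer>]".
# The noun is matched but never captured, so it plays no role in recall.
PATTERN = re.compile(r"(\w+) plus (\w+) \w+ is a total of(?: (\w+))?")


def build_corpus(max_operand=5):
    """Made-up training text: every small sum, always about pears."""
    return [
        f"{WORDS[a]} plus {WORDS[b]} pears is a total of {WORDS[a + b]}"
        for a in range(1, max_operand + 1)
        for b in range(1, max_operand + 1)
    ]


def train(corpus):
    """Memorize (a, b) -> answer for each sentence seen. No arithmetic here."""
    table = {}
    for line in corpus:
        match = PATTERN.fullmatch(line)
        if match and match.group(3):
            table[(match.group(1), match.group(2))] = match.group(3)
    return table


def complete(table, prompt):
    """Return the memorized continuation, or None if the pattern is unseen."""
    match = PATTERN.fullmatch(prompt.rstrip(". "))
    if match is None:
        return None
    return table.get((match.group(1), match.group(2)))


table = train(build_corpus())
print(complete(table, "three plus five apples is a total of..."))
print(complete(table, "forty plus sixty apples is a total of..."))
```

The first prompt is answered even though the corpus never mentions apples, because only the number words are used for recall. The second gets no answer, since that pair of numbers never appeared in the corpus. A real model generalizes far more smoothly than a lookup table, so treat this only as an illustration of recall without computation.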