by craftit
959 days ago
Because the tokenizer often breaks numbers up into arbitrary groups of digits. For example, if 1984 appears many times in the source text, it will become a single token, while a less common number gets split into several pieces. With so many different, semi-random groups of digits, it is hard for the LLM to learn any consistent rules. I believe there are papers showing that if you structure numbers more consistently, LLMs have no problem with this kind of arithmetic.
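To make the point concrete, here is a rough sketch in Python. It is a toy, not how any real tokenizer works: the vocabulary `VOCAB` is made up for illustration, and greedy longest-match is a simplification of what BPE-style tokenizers actually do. It only shows how a frequent number like 1984 can end up as one token while its neighbours get split unevenly, compared with a consistent digit-by-digit split.

```python
# Hypothetical vocabulary: single digits plus a few multi-digit chunks that
# we pretend were frequent in the training text. Not taken from a real model.
VOCAB = {str(d) for d in range(10)} | {"1984", "19", "84", "200", "00"}


def greedy_tokenize(text, vocab=VOCAB):
    """Toy tokenizer: repeatedly take the longest vocabulary entry that matches."""
    max_len = max(len(token) for token in vocab)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


def digit_tokenize(text):
    """Consistent alternative: every digit is its own token."""
    return list(text)


for number in ["1984", "1985", "1983", "2000"]:
    print(number, greedy_tokenize(number), digit_tokenize(number))
```

With this made-up vocabulary, 1984 is a single token but 1985 comes out as `19`, `8`, `5`, so two adjacent numbers look nothing alike to the model. The digit-by-digit version gives every four-digit number the same shape.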
|
With those structured numbers, will the LLMs be 100% accurate on new prompts, or will they just be better than chance (even significantly better than chance)?
It's one thing for the model to learn the structure and then produce probabilities based on the data, but does that mean it's actually learning the underlying algorithm for addition, for example, or is it just getting better probabilities because the possibilities have been narrowed? If it can indeed learn underlying algorithms like this, that's super interesting. This also matters because if it _can't_ learn them, you can never trust the answer unless you check it, but that's sort of a side point.