by torginus
757 days ago
I just wonder whether LLMs would be much better at arithmetic if numbers were written right to left. You can 'predict' the least significant digit first and then reuse the digits already written as the computation continues, but to generate the most significant digit first, you generally need to do the entire computation in one go.
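A minimal Python sketch of that intuition, assuming plain schoolbook addition on decimal strings (`add_lsd_first` is a hypothetical helper for illustration, not from any library or paper). Emitting digits least-significant first, each step needs only the two input digits at that position plus the carry from the step before. The most significant digit is different: it can depend on a carry chain running through every lower position.

```python
def add_lsd_first(a: str, b: str) -> list[int]:
    """Hypothetical helper: emit the digits of a + b least-significant first.

    Each output digit depends only on the input digits at that position
    and the carry produced by the previous step, so a left-to-right
    generator never has to look ahead.
    """
    x, y = a[::-1], b[::-1]  # reverse so index 0 is the ones place
    carry = 0
    out = []
    for i in range(max(len(x), len(y))):
        dx = int(x[i]) if i < len(x) else 0
        dy = int(y[i]) if i < len(y) else 0
        s = dx + dy + carry
        out.append(s % 10)  # this digit is final as soon as it is computed
        carry = s // 10     # the only state passed to the next position
    if carry:
        out.append(carry)
    return out


def to_number_string(digits_lsd_first: list[int]) -> str:
    """Flip the digit list back into the usual most-significant-first order."""
    return "".join(str(d) for d in reversed(digits_lsd_first))


# Changing only the ones digit of one operand flips the *leading* digit
# (and the length of the answer), which is why writing the most
# significant digit first requires the whole computation up front.
print(to_number_string(add_lsd_first("499", "500")))  # 999
print(to_number_string(add_lsd_first("499", "501")))  # 1000
```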
> We first demonstrate that conventional training data is not the most effective for arithmetic learning, and simple formatting changes can significantly improve accuracy. This leads to sharp phase transitions as a function of training data scale, which, in some cases, can be explained through connections to low-rank matrix completion. Building on prior work, we then train on chain-of-thought style data that includes intermediate step results. Even in the complete absence of pretraining, this approach significantly and simultaneously improves accuracy, sample complexity, and convergence speed. We also study the interplay between arithmetic and text data during training and examine the effects of few-shot prompting, pretraining, and model scale. Additionally, we discuss length generalization challenges.
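The abstract doesn't spell out the exact formats, so this is only a hypothetical sketch of what "simple formatting changes" and "chain-of-thought style data that includes intermediate step results" could look like for addition. The function names and layouts are my own assumptions, not the paper's.

```python
def format_plain(a: int, b: int) -> str:
    # Conventional layout: the answer is written most-significant digit first.
    return f"{a}+{b}={a + b}"


def format_reversed(a: int, b: int) -> str:
    # Hypothetical reversed layout: the answer is written least-significant
    # digit first, so the digits come out in the order the carries are resolved.
    return f"{a}+{b}={str(a + b)[::-1]}"


def format_scratchpad(a: int, b: int) -> str:
    # Hypothetical chain-of-thought layout: one step per digit position,
    # showing the digit sum and the carry, then the final answer.
    xa, xb = str(a)[::-1], str(b)[::-1]
    carry = 0
    steps = []
    for i in range(max(len(xa), len(xb))):
        da = int(xa[i]) if i < len(xa) else 0
        db = int(xb[i]) if i < len(xb) else 0
        s = da + db + carry
        steps.append(f"{da}+{db}+{carry}={s}, digit {s % 10}, carry {s // 10}")
        carry = s // 10
    return f"{a}+{b}: " + "; ".join(steps) + f"; answer {a + b}"


print(format_plain(123, 459))     # 123+459=582
print(format_reversed(123, 459))  # 123+459=285
print(format_scratchpad(57, 68))
# 57+68: 7+8+0=15, digit 5, carry 1; 5+6+1=12, digit 2, carry 1; answer 125
```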