Hacker News
by jiggawatts 786 days ago
I recently tried a Fermi estimation problem on a bunch of LLMs and they all failed spectacularly. The problem crossed too many orders of magnitude, and all the zeroes muddled them up.

For example, the right way to work with numbers like a “trillion trillion” is to concentrate on the powers of ten rather than writing the number out in full.
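
To make that concrete, here is a rough Python sketch of the bookkeeping I have in mind. The `multiply` helper and the (mantissa, exponent) tuple representation are just my own illustration, not anything from a library, and it assumes both inputs are already normalised so that 1 <= mantissa < 10.

```python
def multiply(a, b):
    """Multiply two (mantissa, exponent) pairs, each meaning mantissa * 10**exponent.

    Assumes both inputs are normalised (1 <= mantissa < 10).
    """
    mantissa = a[0] * b[0]   # small numbers, easy to track
    exponent = a[1] + b[1]   # multiplying powers of ten means adding exponents
    # Renormalise so the result also satisfies 1 <= mantissa < 10.
    while mantissa >= 10:
        mantissa /= 10
        exponent += 1
    return mantissa, exponent


TRILLION = (1.0, 12)  # 10**12 (short scale)

trillion_trillion = multiply(TRILLION, TRILLION)
print(trillion_trillion)  # (1.0, 24), i.e. 10**24
```

The point is that the only arithmetic is 12 + 12 = 24. Nobody ever has to count 24 zeroes, which is exactly where the written-out form goes wrong.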

1 comment

Predicting the next character alone cannot achieve this kind of compression. The probability distribution learned in training is tied to the corpus, and multi-scale compression and alignment cannot be fully learned through backpropagation in this kind of model.