by lossolo
240 days ago
With temp = 0, if the model is off by one bit at step k, all subsequent steps are deterministically wrong. Your previous example shows the best case, which is that a model can sometimes follow a textual recipe for long multiplication on short inputs. That's not the same as learning a length-generalizing, bit-exact algorithm. Basically, what you've shown is that the model can describe the algorithm. It doesn't show that it can execute it at scale. Without writable state and bit-exact ops, errors grow with length, and "focus more" only slows that failure; it doesn’t eliminate it.
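To put rough numbers on the compounding argument, here is a minimal sketch. It assumes (a simplification for illustration, not something measured) that each step fails independently with a fixed probability `p`, and that any single failure ruins the final answer, as it would under greedy decoding. The function name is hypothetical.

```python
def exact_run_probability(p: float, n: int) -> float:
    """Chance that all n steps are correct when each step fails
    independently with probability p (an illustrative assumption)."""
    return (1.0 - p) ** n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        base = exact_run_probability(0.01, n)
        # "Focus more" is modeled here as halving the per-step error rate.
        focused = exact_run_probability(0.005, n)
        print(f"n={n:5d}  p=1%: {base:.4f}  p=0.5%: {focused:.4f}")
```

Under these assumptions, halving `p` delays the decay, but for any `p` above zero the probability of a fully correct run still goes to zero as `n` grows.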
Well, modern LLM coding agent products (e.g. Claude Code) are able to store state in files in the current repository. So you could have the model keep the "CPU state", with the files in the repository acting as the "RAM".
Also, could this https://arxiv.org/html/2402.17764v1 possibly reduce errors when doing inference? There are no floating-point operations.
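A minimal sketch of the "state in files" idea, assuming a hypothetical `cpu_state.json` file and a deterministic `step` function standing in for whatever does the work each turn. It is not how Claude Code or any other agent product actually stores state. Each call reads the full state from disk, performs one bit of a binary addition, and writes the state back, so nothing has to be carried in the caller's memory between steps.

```python
import json
import tempfile
from pathlib import Path


def init_state(path: Path, a: int, b: int) -> None:
    """Write the initial 'registers' for adding two non-negative integers."""
    state = {"a": a, "b": b, "pos": 0, "carry": 0, "result": 0, "done": False}
    path.write_text(json.dumps(state))


def step(path: Path) -> bool:
    """Load state from disk, add one bit position, save state; True when done."""
    s = json.loads(path.read_text())
    if s["done"]:
        return True
    bit_a = (s["a"] >> s["pos"]) & 1
    bit_b = (s["b"] >> s["pos"]) & 1
    total = bit_a + bit_b + s["carry"]
    s["result"] |= (total & 1) << s["pos"]
    s["carry"] = total >> 1
    s["pos"] += 1
    # Finished once both operands are exhausted and no carry is pending.
    if (s["a"] >> s["pos"]) == 0 and (s["b"] >> s["pos"]) == 0 and s["carry"] == 0:
        s["done"] = True
    path.write_text(json.dumps(s))
    return s["done"]


def run(a: int, b: int) -> int:
    """Drive step() to completion using a state file in a temporary directory."""
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "cpu_state.json"  # hypothetical file name
        init_state(path, a, b)
        while not step(path):
            pass
        return json.loads(path.read_text())["result"]
```

Calling `run(3, 1)` steps through three bit positions and returns 4. The sketch only shows that externalized state makes each step small and resumable; it does not address whether a model executes each step bit-exactly.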