by Kuinox 303 days ago
This is a simple sum of two whole numbers; the numbers are just big.

Most of the time they produce a correct summation table but fail to copy the sum correctly into the final result. That is not a tokenisation problem (you can change the output format to rule that out). I have a separate benchmark that tests exactly this: when the input is too large, the LLMs fail to accurately copy the correct token. I suppose the positional embeddings are not perfectly learned, and that sometimes causes a mistake.
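To make the copy step concrete, here is a minimal Python sketch of a column-by-column summation table and a check of whether a final answer matches it. The row layout and the names `summation_table`, `answer_from_table` and `copy_errors` are assumptions for illustration, not the benchmark's actual format. In a real run, both the rows and the final answer would come from the model's structured output; here the table is computed exactly.

```python
def summation_table(a: str, b: str) -> list[tuple[int, int, int, int, int]]:
    """Column-by-column addition, least significant digit first.

    Each row is (column, digit_a, digit_b, carry_in, result_digit).
    """
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    rows = []
    carry = 0
    for col in range(width):
        da, db = int(a[-1 - col]), int(b[-1 - col])
        total = da + db + carry
        rows.append((col, da, db, carry, total % 10))
        carry = total // 10
    if carry:
        # The final carry becomes one extra, most significant digit.
        rows.append((width, 0, 0, carry, carry))
    return rows


def answer_from_table(rows: list[tuple[int, int, int, int, int]]) -> str:
    """The "copy" step: read result digits from most to least significant."""
    return "".join(str(row[4]) for row in reversed(rows)).lstrip("0") or "0"


def copy_errors(rows: list[tuple[int, int, int, int, int]], final_answer: str) -> list[int]:
    """Columns (0 = units) where the final answer disagrees with the table."""
    expected = answer_from_table(rows)
    width = max(len(expected), len(final_answer))
    exp, got = expected.zfill(width), final_answer.zfill(width)
    return [width - 1 - i for i in range(width) if exp[i] != got[i]]


rows = summation_table("987654321", "123456789")
print(answer_from_table(rows))          # 1111111110
print(copy_errors(rows, "1111111110"))  # []
print(copy_errors(rows, "1111121110"))  # [4]
```

A correct table paired with a final answer that differs in a single column is what the copy failure described above would look like in this format.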

The prompt is quite short and uses structured output, and I can generate a nice graph of the percentage of correct responses across question difficulty (which is just the total digit count of the input numbers).
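A benchmark along these lines might be driven like the sketch below, where difficulty is the total digit count of the two operands. `ask_model` is a hypothetical placeholder for the real structured-output call, and `fake_model` is an invented stand-in that is exact up to 30 total digits and then miscopies the last digit, only so the sketch runs offline.

```python
import random
from collections import defaultdict
from typing import Callable


def make_problem(total_digits: int, rng: random.Random) -> tuple[int, int]:
    """Two positive integers whose digit counts sum to total_digits (>= 2)."""
    len_a = rng.randint(1, total_digits - 1)
    len_b = total_digits - len_a
    a = rng.randint(10 ** (len_a - 1), 10 ** len_a - 1)
    b = rng.randint(10 ** (len_b - 1), 10 ** len_b - 1)
    return a, b


def accuracy_by_difficulty(
    ask_model: Callable[[int, int], str],
    difficulties: range,
    trials: int = 20,
    seed: int = 0,
) -> dict[int, float]:
    """Fraction of exactly correct sums at each difficulty (total digit count)."""
    rng = random.Random(seed)
    correct: dict[int, int] = defaultdict(int)
    for d in difficulties:
        for _ in range(trials):
            a, b = make_problem(d, rng)
            if ask_model(a, b).strip() == str(a + b):
                correct[d] += 1
    return {d: correct[d] / trials for d in difficulties}


def fake_model(a: int, b: int) -> str:
    """Hypothetical stand-in for an LLM call, used only to exercise the harness."""
    answer = str(a + b)
    if len(str(a)) + len(str(b)) <= 30:
        return answer
    # Past the made-up cutoff, miscopy the last digit.
    return answer[:-1] + str((int(answer[-1]) + 1) % 10)


acc = accuracy_by_difficulty(fake_model, range(2, 41, 2), trials=10)
for d, p in acc.items():
    # One text "bar" per difficulty; a real run would plot this instead.
    print(f"{d:3d} digits  {p:6.1%}  {'#' * round(p * 20)}")
```

Fixing the seed keeps the generated problems identical across models, so their curves are compared on the same questions.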

LLMs have a 100% success rate on these sums until they reach a frontier; past it, their accuracy collapses at different rates depending on the model.
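One way to pin down that frontier and how fast accuracy collapses past it, sketched in Python. The input is assumed to be a mapping from difficulty (total digits) to accuracy. The 50% floor used to measure collapse speed is an arbitrary choice, and the numbers in the usage line are made up to exercise the code, not measurements.

```python
from typing import Optional


def frontier(acc: dict[int, float]) -> Optional[int]:
    """Largest difficulty such that it and every easier one scored 100%."""
    last = None
    for d in sorted(acc):
        if acc[d] < 1.0:
            break
        last = d
    return last


def collapse_width(acc: dict[int, float], floor: float = 0.5) -> Optional[int]:
    """Digits between the frontier and the first difficulty at or below `floor`.

    A small width means a sharp collapse, a large one a gradual decline.
    """
    f = frontier(acc)
    if f is None:
        return None
    for d in sorted(acc):
        if d > f and acc[d] <= floor:
            return d - f
    return None


# Made-up accuracies, only to show the calls.
example = {10: 1.0, 20: 1.0, 30: 1.0, 40: 0.9, 50: 0.4, 60: 0.1}
print(frontier(example), collapse_width(example))  # 30 20
```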

2 comments

This is close to what the Apple paper [1] also found on constraint satisfaction problems. For example, on Tower of Hanoi, accuracy collapses past a frontier.
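For a sense of scale: the optimal Tower of Hanoi solution for n disks takes 2^n - 1 moves, so the output that must be produced without a single slip doubles with each extra disk. The sketch below generates that solution and checks a proposed move list, the way a model's answer might be graded. The grading rule (report the first illegal move, or flag an unsolved end state) is an assumption here, not necessarily what the paper did.

```python
from typing import Optional

Move = tuple[int, int]  # (from_peg, to_peg), pegs numbered 0..2


def hanoi(n: int, src: int = 0, aux: int = 1, dst: int = 2) -> list[Move]:
    """Optimal move list for n disks: 2**n - 1 moves."""
    if n == 0:
        return []
    return hanoi(n - 1, src, dst, aux) + [(src, dst)] + hanoi(n - 1, aux, src, dst)


def first_bad_move(n: int, moves: list[Move]) -> Optional[int]:
    """Index of the first illegal move, len(moves) if every move is legal but
    the puzzle is unsolved, or None if the moves solve it."""
    pegs = [list(range(n, 0, -1)), [], []]  # largest disk at the bottom
    for i, (s, d) in enumerate(moves):
        # Illegal: taking from an empty peg, or putting a disk on a smaller one.
        if not pegs[s] or (pegs[d] and pegs[d][-1] < pegs[s][-1]):
            return i
        pegs[d].append(pegs[s].pop())
    return None if pegs[2] == list(range(n, 0, -1)) else len(moves)


for n in (3, 10, 15):
    print(n, len(hanoi(n)))  # 7, 1023, 32767 moves
```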

Even when the algorithm's steps are laid out precisely, the models cannot follow them. Perhaps LLMs should be trained on Turing machine specs and given a tape, lol.

Constraint satisfaction and combinatorics are areas where the search space is exponential and the techniques are not formalized (there is not enough data in the training set), so they remain hard for machines, as seen with Problem 6 of the IMO, which LLMs could not solve. I suspect there is an aspect of human intelligence that is not yet captured in LLMs.
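As an illustration of the exponential search spaces in question (N-Queens is not an example from the thread, just a classic constraint satisfaction problem), here is a plain backtracking solver. It counts both the solutions and the partial placements it had to explore; the explored count grows very quickly with n even though violated constraints prune branches early.

```python
def n_queens(n: int) -> tuple[int, int]:
    """Return (number of solutions, number of partial placements explored)."""
    cols: set[int] = set()
    diag: set[int] = set()  # row - col is constant along one diagonal
    anti: set[int] = set()  # row + col is constant along the other
    solutions = 0
    nodes = 0

    def place(row: int) -> None:
        nonlocal solutions, nodes
        if row == n:
            solutions += 1
            return
        for c in range(n):
            if c in cols or row - c in diag or row + c in anti:
                continue  # constraint violated: prune this branch
            nodes += 1
            cols.add(c)
            diag.add(row - c)
            anti.add(row + c)
            place(row + 1)
            cols.remove(c)
            diag.remove(row - c)
            anti.remove(row + c)

    place(0)
    return solutions, nodes


for n in range(4, 11):
    print(n, *n_queens(n))
```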

[1] - https://machinelearning.apple.com/research/illusion-of-think...

Have you tried greedy decoding (temperature 0) in AI Studio?

The default temperatures of 0.7 to 1.0 are not designed for reconstructing context with perfect accuracy.
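To illustrate why temperature matters for exact copying, here is a sketch of temperature-scaled softmax over hypothetical next-token logits, with temperature 0 treated as greedy argmax (real APIs handle the 0 case in their own ways). If the correct token is favored but not certain at each position, sampling can still pick a wrong one, and over a long answer those small per-token chances compound.

```python
import math
import random


def token_probs(logits: list[float], temperature: float) -> list[float]:
    """Softmax of logits / temperature; temperature 0 means greedy (argmax)."""
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    probs = token_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Made-up logits where token 0 is the correct digit to copy.
logits = [4.0, 1.0, 0.5]
for t in (0, 0.7, 1.0):
    p = token_probs(logits, t)[0]
    # Chance of copying 200 such tokens without a single miss.
    print(f"T={t}: p(correct)={p:.4f}, p(200 in a row)={p ** 200:.4f}")
```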

I always use the lowest temperature I can set. But GPT-5 doesn't support changing the temperature; you'll get something like:

  {
    "error": {
      "message": "Unsupported value: 'temperature' does not support 0.0 with this model. Only the default (1) value is supported.",
      "type": "invalid_request_error",
      "param": "temperature",
      "code": "unsupported_value"
    }
  }
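If a harness has to cover models like this, one option is to request the lowest temperature first and fall back to the model's default when the API rejects it with the error above. The sketch below does that by inspecting the `code` and `param` fields shown in that response. `send` is a hypothetical transport function (not a real SDK call) that returns a success flag and a response body, and the request fields are placeholders.

```python
import json
from typing import Callable

Send = Callable[[dict], tuple[bool, str]]  # hypothetical: request -> (ok, body)


def rejects_param(error_body: str, param: str) -> bool:
    """True if the body is an 'unsupported_value' error for `param`."""
    try:
        payload = json.loads(error_body)
    except json.JSONDecodeError:
        return False
    err = payload.get("error") if isinstance(payload, dict) else None
    return (
        isinstance(err, dict)
        and err.get("code") == "unsupported_value"
        and err.get("param") == param
    )


def call_with_lowest_temperature(send: Send, request: dict) -> tuple[dict, str]:
    """Try temperature 0 first; on rejection, retry with the model default."""
    greedy = {**request, "temperature": 0.0}
    ok, body = send(greedy)
    if ok:
        return greedy, body
    if rejects_param(body, "temperature"):
        ok, body = send(request)  # no temperature field: use the default
        if ok:
            return request, body
    raise RuntimeError(body)


# Fake transport that reproduces the error above, so the sketch runs offline.
ERROR = json.dumps({"error": {
    "message": "Unsupported value: 'temperature' does not support 0.0 with this model. "
               "Only the default (1) value is supported.",
    "type": "invalid_request_error",
    "param": "temperature",
    "code": "unsupported_value",
}})


def fake_send(req: dict) -> tuple[bool, str]:
    return (False, ERROR) if "temperature" in req else (True, "ok")


used, body = call_with_lowest_temperature(fake_send, {"model": "example-model"})
print(used, body)
```

Returning the request that actually succeeded makes it easy to record which models ran at their default temperature, so their accuracy curves are not silently compared against greedy runs.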