Hacker News
by mckirk 1145 days ago
The problem is that these models have no working memory they could use to carry out such tasks, which sit on a meta-level from a language perspective. They can only go with their 'gut instinct' when selecting the next word; they can't 'consider and ponder the problem internally' first.
2 comments

The problem is that the input is tokenized before the model sees it. The model does not see the individual letters "t" + "o"; it gets one single token, #1462. The word "toe" is another single token, #44579. Maybe over time it could learn from context that inputs starting with #44579 also satisfy the constraint of starting with #1462, but that's a lot to learn, and it's not going to happen for all combinations of letters.
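To make that concrete, here is a rough sketch with a toy tokenizer. Everything in it is made up for illustration: `TOY_VOCAB`, `toy_tokenize`, and the two helper functions are hypothetical, the IDs for "to" and "toe" are just the ones quoted above (not checked against any real tokenizer), and the single-letter IDs are arbitrary.

```python
# Toy vocabulary. 1462 and 44579 are the IDs quoted in the comment above;
# the single-letter IDs are arbitrary placeholders. None of this is taken
# from a real tokenizer.
TOY_VOCAB = {"to": 1462, "toe": 44579, "t": 83, "o": 78, "e": 68}


def toy_tokenize(text):
    """Greedy longest-match tokenization over TOY_VOCAB."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shorter ones.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


def starts_with_letters(word, prefix):
    # What a human checks: a character-level prefix.
    return word.startswith(prefix)


def starts_with_tokens(word, prefix):
    # What is visible at the token level: do the ID sequences share a prefix?
    w, p = toy_tokenize(word), toy_tokenize(prefix)
    return w[:len(p)] == p


print(toy_tokenize("to"))                # [1462]
print(toy_tokenize("toe"))               # [44579]
print(starts_with_letters("toe", "to"))  # True
print(starts_with_tokens("toe", "to"))   # False
```

At the character level "toe" obviously starts with "to", but the two ID sequences have nothing in common, so the relationship has to be learned rather than read off the input.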
Perhaps prompting the model to first describe its approach to answering the question would help. This type of chain-of-thought technique can yield better results.
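As a minimal sketch of what that prompt might look like: `build_prompt` is a hypothetical helper, the wording of the instruction and the sample question are my own assumptions, and no model is actually called here.

```python
def build_prompt(question, plan_first=False):
    """Return a prompt string.

    With plan_first=True, the prompt asks the model to describe its
    approach before answering (a chain-of-thought style instruction).
    The exact wording is an illustrative assumption.
    """
    if not plan_first:
        return question
    return (
        "First, describe step by step how you would approach the question "
        "below. Then give your final answer on a new line starting with "
        "'Answer:'.\n\n"
        f"Question: {question}"
    )


# Hypothetical example question.
question = 'List five words that start with "to".'
print(build_prompt(question))
print()
print(build_prompt(question, plan_first=True))
```

The idea is that the written-out approach becomes part of the context the model conditions on, which gives it something like scratch space before it commits to an answer.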