Hacker News new | ask | show | jobs
by kingstnap 254 days ago
It can't internally rewise. The last generation produces a distribution and sometimes the wrong answer gets sampled.

There is no "backspace" token, although it would be cool and fancy if we had that.

The more interesting thing is why does it revise its mistakes. The answer to that is having training examples of fixing your own mistakes in the training data plus some RL to bring out that effect more.

1 comments

There's been a few attempts at training a backspace token, though.

e.g.:

https://arxiv.org/abs/2502.04404

https://arxiv.org/abs/2306.05426