by Tomte 4 hours ago
In a parallel universe, LLMs have learned that (a) the training material contains many different orthographic errors and (b) humans follow a non-obvious pattern when "deciding" which error to make, so their generated output contains such errors as well.

In our universe LLMs seem to have learned that those errors do not follow patterns in the aggregate and that they should not be emulated.

1 comments

The raw pretrained models make the errors, I believe -- we then reinforcement-learn them out.
That's interesting! Do you have a paper or blog post at hand that shows examples of raw and RL'ed output?