Hacker News new | ask | show | jobs
by magicalhippo 109 days ago
I got the impression it's due to the reinforcement learning step, where it's taught to follow instructions, which rewards the model for such writing.