Hacker News
by magicalhippo, 109 days ago
I got the impression it's due to the reinforcement learning step, where the model is taught to follow instructions and ends up being rewarded for such writing.