Hacker News
by euclaise 109 days ago
Maybe RL? Just like similar corrections in reasoning traces. You can train non-'thinking' models the same way (though if you're naive about it then you might end up with responses that are similarly rambly), and I'd expect it to have been