by euclaise
109 days ago
Maybe RL? Just like similar corrections in reasoning traces. You can train non-'thinking' models the same way (though if you're naive about it, you might end up with responses that are similarly rambly), and I'd expect it to have been