


	by euclaise 109 days ago
	Maybe RL? Just like similar corrections in reasoning traces. You can train non-'thinking' models the same way (though if you're naive about it then you might end up with responses that are similarly rambly), and I'd expect it to have been