| HN Mirror


	by viraptor 490 days ago
	This training is less about learning how to reason and more about conditioning the LLM to self-evaluate automatically. You could probably reproduce the effect yourself by sticking a paper reminder in front of you that says "after every small step, spend 2 minutes considering whether it's right and whether it works in the context of the task so far; evaluate alternatives". (Which, yes, would likely improve reasoning.)