Hacker News
by viraptor 490 days ago
This training is less about learning how to reason and more about conditioning the LLM to run self-evaluations automatically. You could probably reproduce the effect yourself by sticking a paper note in front of you that says "after every small step, spend 2 minutes considering whether it's right and whether it works in the context of the task so far; evaluate alternatives". (Which, yes, could likely improve reasoning.)