Hacker News new | ask | show | jobs
by currymj 471 days ago
that was “bitter lesson” in action.

for example there are clever ways of rewarding all the steps of a reasoning process to train a network to “think”. but deepseek found these don’t work as well as much simpler yes/no feedback on examples of reasoning.