| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by currymj 471 days ago
	that was “bitter lesson” in action. for example there are clever ways of rewarding all the steps of a reasoning process to train a network to “think”. but deepseek found these don’t work as well as much simpler yes/no feedback on examples of reasoning.