by currymj
471 days ago
that was the “bitter lesson” in action. for example, there are clever ways of rewarding every step of a reasoning process to train a network to “think”, but DeepSeek found these don’t work as well as much simpler yes/no feedback on whole examples of reasoning.
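
to make the contrast concrete, here is a minimal sketch of the two reward styles. it is a toy illustration, not DeepSeek's actual training code: the function names (`outcome_reward`, `process_reward`, `toy_step_scorer`) and the exact-match check are all assumptions made up for this example.

```python
from typing import Callable, List


def outcome_reward(final_answer: str, reference: str) -> float:
    """Simple yes/no feedback: 1.0 if the final answer matches, else 0.0."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0


def process_reward(steps: List[str], score_step: Callable[[str], float]) -> float:
    """Step-level feedback: average a score assigned to each reasoning step.

    `score_step` stands in for a learned or hand-written step grader
    (hypothetical here); building a good one is the "clever" part.
    """
    if not steps:
        return 0.0
    return sum(score_step(s) for s in steps) / len(steps)


def toy_step_scorer(step: str) -> float:
    """Hypothetical grader: rewards any step that looks like an equation."""
    return 1.0 if "=" in step else 0.0


steps = ["2 + 3 = 5", "5 * 4 = 20"]
r_outcome = outcome_reward("20", "20")             # looks only at the end result
r_process = process_reward(steps, toy_step_scorer)  # looks at every step
```

the outcome version needs nothing beyond a way to check the final answer, while the process version needs a step grader that can itself be wrong or gamed.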