by Linello
89 days ago
Scaffolding is all you need. I am absolutely certain about that.
It's about finding good ways to approximate, at inference time, the reward function used during post-training. A reward general enough to score candidates well will inevitably improve the abilities of LLMs when put inside scaffolds.
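To make the idea concrete, here is a minimal sketch of the simplest such scaffold, best-of-N selection: generate several candidates, score each with a reward function, keep the top one. The names `best_of_n` and `toy_reward` are hypothetical, and the toy reward is only a stand-in assumption for whatever approximation of the post-training reward you actually have.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], reward: Callable[[str], float]) -> str:
    """Return the candidate that the reward function scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=reward)


def toy_reward(answer: str) -> float:
    # Hypothetical stand-in for a learned reward model, for the prompt "2 + 2":
    # credit for containing the right digit, minus a small length penalty.
    correctness = 1.0 if "4" in answer else 0.0
    return correctness - 0.001 * len(answer)


if __name__ == "__main__":
    # In a real scaffold these would be N samples drawn from the LLM.
    samples = [
        "The answer is 5.",
        "The answer is 4.",
        "I think it is 4, but I am not sure about it at all.",
    ]
    print(best_of_n(samples, toy_reward))
```

The scaffold itself is trivial; everything interesting lives in how well `reward` tracks the real objective.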