	by Linello 89 days ago
	Scaffolding is all you need. I am absolutely certain of that. It's about finding good ways to approximate, at inference time, the reward function that was used during post-training. A reward that is general enough to score candidates well will inevitably improve the abilities of LLMs once they are put inside scaffolds.
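
	As a rough illustration of the idea, here is a minimal sketch assuming the simplest possible scaffold: best-of-N sampling, where the model produces several candidates and a reward function picks the winner. The names `best_of_n`, `generate`, `reward`, and `toy_reward` are hypothetical and not taken from any library; `generate` stands in for an LLM call and `toy_reward` for a learned approximation of the post-training reward.

```python
from typing import Callable


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    n: int = 8,
) -> str:
    """Sample n candidates and return the one the reward scores highest.

    `generate` and `reward` are placeholders (assumptions): in a real
    scaffold they would wrap an LLM call and an inference-time reward model.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt) for _ in range(n)]
    # max() keeps the first candidate on ties, so the result is deterministic
    # for a fixed sequence of candidates.
    return max(candidates, key=lambda c: reward(prompt, c))


def toy_reward(prompt: str, candidate: str) -> float:
    """Hypothetical reward: prefers numeric answers close to 12."""
    return -abs(int(candidate) - 12)


if __name__ == "__main__":
    # A canned "model" that returns fixed candidates in order.
    canned = iter(["3", "11", "19", "12", "0"])
    print(best_of_n("What is 7 + 5?", lambda _p: next(canned), toy_reward, n=5))
```

	The scaffold itself is trivial here; all of the leverage comes from how well `reward` ranks the candidates, which is the point of the comment.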