| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by popinman322 779 days ago
	Also, similar to Orca-Math but without a teacher model. They also followed an iterative DPO/KTO scheme, but with no length normalized NLL loss term.

1 comments

If we had a magical (fast) oracle for grading responses, have people done search/expert iteration for LLMs?

Specifically for codegen, i am playing with an iterative interpreter that can quickly (re)evaluate a tree of similar responses