by popinman322 779 days ago
Also similar to Orca-Math, but without a teacher model. They also followed an iterative DPO/KTO scheme, but with no length-normalized NLL loss term.
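
For anyone unfamiliar with the term being referred to, here is a rough sketch of a per-pair DPO loss with an optional length-normalized NLL term on the chosen response. This is not taken from either paper: the function name, the `nll_weight` knob, and normalizing by the chosen response's token count are all assumptions for illustration. Setting `nll_weight=0.0` gives plain DPO, i.e. the "no NLL term" case.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, len_w,
             beta=0.1, nll_weight=0.0):
    """Per-pair loss from summed sequence log-probs (hypothetical helper).

    logp_w / logp_l: policy log-probs of the chosen / rejected response.
    ref_logp_*: the same quantities under the frozen reference model.
    len_w: token count of the chosen response (assumed normalizer).
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably
    if margin > 0:
        dpo = math.log1p(math.exp(-margin))
    else:
        dpo = -margin + math.log1p(math.exp(margin))
    # Length-normalized negative log-likelihood of the chosen response
    nll = -logp_w / len_w
    return dpo + nll_weight * nll
```

The NLL term just keeps pushing up the likelihood of the chosen response on top of the preference margin; whether it helps is an empirical question.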
1 comment

If we had a magical (fast) oracle for grading responses, has anyone tried search/expert iteration for LLMs?
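
To make the question concrete, here is a toy sketch of the loop I have in mind. Everything in it is a stand-in: the "policy" is just a weight table over a fixed candidate set, `toy_oracle` is a made-up grader, and the "training" step is bumping a weight rather than fine-tuning a model.

```python
import random

def toy_oracle(candidate):
    """Hypothetical fast grader: higher is better, 7 is the ideal answer."""
    return -abs(candidate - 7)

def expert_iteration(candidates, oracle, rounds=5, samples=8, seed=0):
    rng = random.Random(seed)
    # Uniform toy "policy": one weight per candidate response
    weights = {c: 1.0 for c in candidates}
    for _ in range(rounds):
        pool = list(weights)
        # Sample a batch of responses from the current policy
        drawn = rng.choices(pool, weights=[weights[c] for c in pool], k=samples)
        # Search step: the oracle picks the best sampled response
        best = max(drawn, key=oracle)
        # "Distill" step: upweight what the oracle preferred
        weights[best] += 1.0
    return weights

weights = expert_iteration(range(10), toy_oracle)
```

With a real model, the sampling would be generation, and the upweighting would be SFT or a preference update on the oracle-selected samples.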

Specifically for codegen, I am playing with an iterative interpreter that can quickly (re)evaluate a tree of similar responses.
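
Roughly the idea, as a toy sketch rather than the actual interpreter: responses that share a prefix share the work done for that prefix. The stack-machine instruction set and the `PrefixCache` class here are made up for illustration; the state is an immutable tuple so that snapshots are free.

```python
def step(stack, op):
    """Apply one toy instruction to an immutable stack (a tuple)."""
    kind, *args = op
    if kind == "push":
        return stack + (args[0],)
    if kind == "add":
        return stack[:-2] + (stack[-2] + stack[-1],)
    if kind == "mul":
        return stack[:-2] + (stack[-2] * stack[-1],)
    raise ValueError(f"unknown op: {kind}")

class PrefixCache:
    """Hypothetical evaluator that reuses results for shared program prefixes."""

    def __init__(self):
        self.states = {(): ()}  # program prefix -> stack after running it
        self.steps_run = 0      # instructions actually executed

    def run(self, program):
        program = tuple(program)
        # Find the longest prefix that has already been evaluated
        n = len(program)
        while program[:n] not in self.states:
            n -= 1
        stack = self.states[program[:n]]
        # Execute only the new suffix, caching each intermediate state
        for i in range(n, len(program)):
            stack = step(stack, program[i])
            self.steps_run += 1
            self.states[program[:i + 1]] = stack
        return stack

cache = PrefixCache()
a = cache.run([("push", 2), ("push", 3), ("add",)])
b = cache.run([("push", 2), ("push", 3), ("mul",)])  # reuses the two pushes
```

Real code has side effects and mutable state, so the snapshotting is the hard part; the toy version dodges that entirely.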