| HN Mirror



	by Uninen 368 days ago
	This is wild! "when assessed by Claude 3.5 Sonnet’s production-grade RM, our unsupervised assistant policy wins 60% of head-to-head comparisons against the policy trained with the human-supervised RM." So now the models can even post-train new models better than a human can.

1 comment

cma 368 days ago

Every top model in ARC-AGI used a test-time fine-tuning approach. They had only one example pair, though, and would usually apply transformations (color, mirroring, etc.) to it for the fine-tuning, and that might have been coded by hand.
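To make the augmentation idea concrete, here is a minimal sketch of turning one example pair into several by mirroring and recoloring. It is not any team's actual code: the function names, the choice of transforms, and the assumption that grids are lists of integer color codes 0-9 are all illustrative.

```python
import random

# Assumption: a grid is a list of rows, each row a list of color codes 0-9.

def mirror_horizontal(grid):
    """Flip the grid left to right."""
    return [row[::-1] for row in grid]

def mirror_vertical(grid):
    """Flip the grid top to bottom."""
    return [row[:] for row in grid[::-1]]

def identity(grid):
    """Return an unchanged copy of the grid."""
    return [row[:] for row in grid]

def permute_colors(grid, mapping):
    """Relabel every cell using a color -> color mapping."""
    return [[mapping[c] for c in row] for row in grid]

def augment_pair(inp, out, n_color_perms=2, seed=0):
    """Hypothetical helper: build extra (input, output) pairs from one pair.

    The same transform is applied to both sides so the underlying
    rule stays consistent across the augmented pairs.
    """
    rng = random.Random(seed)
    pairs = []
    # Geometric variants, hand-picked as in the comment above.
    for transform in (identity, mirror_horizontal, mirror_vertical):
        pairs.append((transform(inp), transform(out)))
    # Color variants: a random relabeling of the 10 color codes.
    for _ in range(n_color_perms):
        colors = list(range(10))
        shuffled = colors[:]
        rng.shuffle(shuffled)
        mapping = dict(zip(colors, shuffled))
        pairs.append((permute_colors(inp, mapping),
                      permute_colors(out, mapping)))
    return pairs
```

Each augmented pair would then be used as a fine-tuning example at test time. Whether a given transform actually preserves the task's rule depends on the task, which is presumably why this part might have been coded by hand.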
