| HN Mirror



	by padolsey 80 days ago
	Knowing the nature of a test ahead of time, and building out your capabilities and tooling before entering the exam hall when your peers don't have that advantage, makes you a cheater.

2 comments

BoorishBears 80 days ago

Lots of people are doing the same with extra steps (generating synthetic data from test questions with the LLM, then training on it).

I wish we'd move past public test sets for LLM benchmarks: publish a plain-English explanation of the tasks, allow questions and clarifications, but never release a single question from the test set verbatim.

Public test sets made sense back when models needed to be fine-tuned on the task to even answer reliably. If we're saying this is the path to AGI, we should be able to rely on the generalization of the model to get it right.


ting0 80 days ago

You have a problem with generating synthetic data from test questions? Humans simulate experiences in their minds. What's the problem?


BoorishBears 80 days ago

Models don't generalize as well as humans.

Synthetic data is fine. Synthetic data on very similar questions generated based on the description is typically fine. But once the shape of what you're training on gets too close to the actual holdout questions, you're getting an uplift that's not realistic for unseen tasks.
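
To make "too close to the actual holdout questions" a bit more concrete, here is a rough sketch of one way a benchmark maintainer might flag synthetic examples that overlap heavily with holdout items. Everything in it is an assumption for illustration: the function names, the word-trigram overlap measure, and the 0.5 threshold are all hypothetical, and real contamination checks are usually more involved than this.

```python
def ngrams(text, n=3):
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap(candidate, holdout, n=3):
    """Fraction of the candidate's n-grams that also appear in the holdout item."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0  # candidate is shorter than n words, nothing to compare
    return len(cand & ngrams(holdout, n)) / len(cand)


def too_close(candidate, holdout_questions, n=3, threshold=0.5):
    """Flag a synthetic example whose overlap with any holdout question
    reaches the (hypothetical) threshold."""
    return any(overlap(candidate, q, n) >= threshold for q in holdout_questions)


# Made-up example strings, not taken from any real benchmark.
holdout = ["rotate the red square by ninety degrees and fill the gap"]
near_copy = "rotate the red square by ninety degrees and fill the hole"
unrelated = "count the blue circles in the grid"

print(too_close(near_copy, holdout))
print(too_close(unrelated, holdout))
```

A surface check like this only catches near-verbatim copies. Paraphrased questions with the same underlying structure would pass it, which is part of why keeping the test questions private is the more robust option.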


GorbachevyChase 80 days ago

Humans who have played games should also not be allowed to test in ARC-AGI. Cavemen only.
