| HN Mirror



	by Leynos 39 days ago
	Look at the tasks in the benchmark (see §2 https://arxiv.org/html/2503.14499v3)


MadxX79 39 days ago

Yeah, what about them? As far as I can tell, the tasks are fixed. The AI companies should know the tasks by now and will have overfitted their models to the tests, in the same way I'm implying I overfitted my model to reproduce Harry Potter.

Leynos 38 days ago

You can choose to believe that.
