Hacker News
by Leynos 39 days ago
Look at the tasks in the benchmark (see §2 https://arxiv.org/html/2503.14499v3)
1 comment

Yeah, what about them? As far as I can tell, the tasks are fixed. The AI companies should know the tasks by now and will have overfitted their models to them, in the same way I'm implying I overfitted my model to reproduce Harry Potter.
You can choose to believe that.