by
Leynos
39 days ago
Look at the tasks in the benchmark (see §2 of https://arxiv.org/html/2503.14499v3).
MadxX79
39 days ago
Yeah, what about them? As far as I can tell, the tasks are fixed. The AI companies should know the tasks by now and will have overfitted their models to the tests, in the same way I'm implying I overfitted my model to reproduce Harry Potter.
Leynos
38 days ago
You can choose to believe that.