Hacker News
by
gpt5
28 days ago
ARC-AGI isn't perfect, but it helps demonstrate the gap. I'm sure all companies optimize their models for this benchmark given its dominance.
1 comment
snemvalts
28 days ago
What about other benchmarks? Benchmarks where the contents are freely available have become useless for evaluating models.