| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by gpt5 28 days ago
	ARC-AGI isn't perfect, but it helps demonstrates the gap. I'm sure all companies optimize their models for this benchmark given its dominance.

1 comments

What about other benchmarks? Benchmarks where the contents are freely available have become useless for evaluating models.