	by Corence 83 days ago
	Note that the scoring function is significantly different for ARC-AGI-3. It isn't the percentage of tests passed, as in previous versions. It's the square of the efficiency ratio: the number of steps the second-best human needed divided by the number of steps the model needed. So if a model can solve every question but takes 10x as many steps as the second-best human, it will get a score of (1/10)^2 = 1%.
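
	To make the arithmetic concrete, here is a minimal sketch of the scoring rule as described in this comment. The function name `efficiency_score` and its parameters are hypothetical, and capping the ratio at 1 (so that beating the human baseline does not score above 100%) is my assumption, not something stated above. This is not the official ARC-AGI-3 implementation.

```python
def efficiency_score(model_steps: int, human_steps: int) -> float:
    """Squared efficiency ratio, per the description in the comment.

    human_steps: steps taken by the second-best human (the baseline).
    model_steps: steps the model took to solve the same task.
    Assumption: the ratio is capped at 1.0, so a model that uses fewer
    steps than the baseline scores at most 100%.
    """
    if model_steps <= 0 or human_steps <= 0:
        raise ValueError("step counts must be positive")
    ratio = min(1.0, human_steps / model_steps)  # efficiency ratio
    return ratio ** 2


# A model that needs 10x the baseline's steps: (1/10)^2
print(f"{efficiency_score(model_steps=1000, human_steps=100):.0%}")  # 1%
```

	The squaring is what makes the penalty steep: under this sketch, taking 2x the baseline's steps already drops the score to 25%.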