| HN Mirror

	by crackrook 539 days ago
	No individual Kaggle solution achieved 81%; that score came from an ensemble of models: https://x.com/fchollet/status/1865865271728390515 In my (possibly flawed) interpretation, o3's scores appear to be an achievement because they were attained by a single model, but the benchmark itself needs refinement before it can claim to measure AGI as it set out to, since one can brute-force one's way to similar results.