	by versteegen 76 days ago
	The dataset miscomparison is a big problem. The prompt is super specific to ARC-AGI-3, which is perfectly fine to do, but skimming it I saw nothing that appears specific to the 25 games in the dataset, especially considering they've had only one day to overfit. There could still be quite subtle leakage, though.