Hacker News


	by nikisweeting 109 days ago
	We can definitely make harder evals. The problem is that a good eval set is indistinguishable from good training data or a market edge, so no one is incentivized to share their best eval sets publicly.