HN Mirror



	by stared 60 days ago
	SWE-bench Verified is, at this point, contaminated: https://openai.com/index/why-we-no-longer-evaluate-swe-bench... So it is hard to tell how much of a model's gain is due to skill and how much is due to overfitting.