Hacker News

	by Bjorkbat 547 days ago
	If I recall correctly, the authors of the benchmark did mention on Twitter that, for certain issues, models will submit an answer that technically passes the test but is kind of questionable. So yeah, good point.