


	by snemvalts 15 days ago
	Most benchmarks can be trained for as well, so they overstate a model's engineering skills. The whole nature of a benchmark is to collapse qualitative work (a software engineering task, an architecture choice, code quality) into a quantitative score, and a score can be optimized for.