	by tadamcz 44 days ago
	I agree it's a potentially big problem, affecting almost any benchmark out there. We discuss it briefly in "Appendix A: Contamination and memorization" https://epoch.ai/blog/mirrorcode-preliminary-results#appendi.... Ideally one would do these benchmarks with held-out proprietary software, but that comes with many practical concerns.