
Hacker News


	by driverdan 954 days ago
	This is a big problem with independent LLM testing. You need to make sure your test set isn't included in the training set, which isn't easy with closed-source models. It reminds me of how hardware manufacturers optimize for benchmarks. Vendors of closed-source LLMs could deliberately include likely test data in their training sets to artificially inflate results. I'm not saying they are doing that now, but they could.