| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by jprete 841 days ago
	Thats what a benchmark is, and they're all gamed by everyone training models, even if they don't intend to, because the benchmarks are in the training data. The advantage of a personal set of questions is that you might be able to keep it out of the training set, if you don't publish it anywhere, and if you make sure cloud-accessed model providers aren't logging the conversations.