| HN Mirror



	by this_steve_j 250 days ago
	What are some ways to avoid common methodological pitfalls when automatically generating test cases for "groundedness" benchmarks? Confirmation bias is one obvious pitfall, but I also wonder how reproducibility can be achieved when the input is stochastic.