by this_steve_j
250 days ago
What are some ways to avoid common methodological pitfalls when using automation to generate test cases for "groundedness" benchmarks? Confirmation bias is one obvious pitfall, but I also wonder how reproducibility can be achieved when the input is stochastic.