by lewdwig
391 days ago
Well-designed benchmarks have a public sample set and a private testing set. Models are free to train on the public set, but they can't game the benchmark or overfit the samples that way because they're only assessed on performance against examples they haven't seen. Not all benchmarks are well-designed.
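To make the public/private idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `split_benchmark`, `score`, the toy addition questions, and the two toy "models" are hypothetical names, not any real benchmark's API. It only shows why memorizing the public samples does not help when scoring happens on held-out items.

```python
import random

def split_benchmark(items, private_fraction=0.5, seed=0):
    """Shuffle a copy of the items and split them into (public, private)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_private = int(len(shuffled) * private_fraction)
    return shuffled[n_private:], shuffled[:n_private]

def score(model, items):
    """Fraction of items where the model's answer matches exactly."""
    if not items:
        raise ValueError("cannot score on an empty item set")
    correct = sum(1 for it in items if model(it["question"]) == it["answer"])
    return correct / len(items)

# Toy benchmark (hypothetical): 20 unique addition questions.
items = [{"question": f"{i}+{i}", "answer": str(2 * i)} for i in range(20)]
public, private = split_benchmark(items)

# A "model" that only memorized the public samples.
lookup = {it["question"]: it["answer"] for it in public}
def memorizer(question):
    return lookup.get(question)

# A "model" that actually learned the task.
def adder(question):
    return str(sum(int(x) for x in question.split("+")))

public_score_memorizer = score(memorizer, public)    # perfect on seen items
private_score_memorizer = score(memorizer, private)  # no seen items to lean on
private_score_adder = score(adder, private)          # generalizes to unseen items
```

In this sketch the memorizer looks perfect if you grade it on the public set, and only the private set separates it from the model that actually does the task. The guarantee holds only as long as the private items stay unseen.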
|
so effectively you can only guarantee a single use stays private