|
|
|
|
|
by StevenWaterman
412 days ago
|
|
No, even if the benchmarks are private, it's still an issue. Because you can overfit to the benchmark by trying X random variations of the model, and picking the one that performs best on the benchmark It's similar to how I can pass any multiple-choice exam if you let me keep attempting it and tell me my overall score at the end of each attempt - even if you don't tell me which answers were right/wrong |
|