by 0xab
546 days ago
Datasets need to stop shipping with any training sets at all! And their licenses should forbid anyone from using the test set to update the parameters of any model. We did this with ObjectNet (https://objectnet.dev/) years ago. It's only a test set; no training set is provided at all. Back then it was very controversial and we were given a hard time for it initially. Now it's more accepted. Time to make this idea mainstream. No more training sets. Everything should be out of domain.
|
This gives closed-source models an enormous advantage over open-source models.
The FrontierMath dataset has this same problem[1].
It's a shame, because creating these benchmarks is time-consuming and expensive.
I don't know of a way to fix this. One partial fix might be to use reward models to evaluate results on random questions instead of fixed datasets, but that would bring a lot of reproducibility problems.
Still, I'm not sure how to overcome this.
[1]: https://news.ycombinator.com/item?id=42494217
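To make the "random questions plus a reward model" idea concrete, here is a rough Python sketch. Everything in it is hypothetical and not from any real benchmark: `sample_questions`, `toy_reward`, and `evaluate` are made-up names, the questions are trivial arithmetic, and `toy_reward` is only a stand-in for a learned reward model. It assumes the evaluator publishes a seed instead of a fixed question list.

```python
import random
from typing import Callable, List


def sample_questions(seed: int, n: int) -> List[str]:
    """Hypothetical generator: n fresh arithmetic questions from a seeded RNG."""
    rng = random.Random(seed)
    return [
        f"What is {rng.randint(100, 999)} * {rng.randint(100, 999)} ?"
        for _ in range(n)
    ]


def toy_reward(question: str, answer: str) -> float:
    """Stand-in for a reward model: scores an answer given only the question.

    A real reward model would be learned and possibly noisy; this one just
    recomputes the product so the sketch stays runnable.
    """
    parts = question.split()  # ["What", "is", a, "*", b, "?"]
    expected = int(parts[2]) * int(parts[4])
    return 1.0 if answer.strip() == str(expected) else 0.0


def evaluate(model: Callable[[str], str], seed: int, n: int = 50) -> float:
    """Mean reward of `model` over n questions drawn from `seed`."""
    questions = sample_questions(seed, n)
    return sum(toy_reward(q, model(q)) for q in questions) / n


def perfect_model(question: str) -> str:
    """Toy 'model' that parses the question and multiplies."""
    parts = question.split()
    return str(int(parts[2]) * int(parts[4]))


if __name__ == "__main__":
    print(evaluate(perfect_model, seed=0))
    print(evaluate(lambda q: "0", seed=0))
```

Publishing the seed only pins down which questions were asked. It does nothing about the reward model itself being noisy or changing between versions, which is where I'd expect most of the reproducibility problems to come from.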