HN Mirror



	by bilekas 436 days ago
	> This company has a track record of cheating on benchmarks by training on test datasets.

	That's a really bold statement to make without any references. Could you by any chance add some references for those of us not in the know?

yxchng 436 days ago

Mostly anecdotal, and also based on the large gap between benchmark performance and actual performance in personal use.

earthnail 436 days ago

I think saying "track record" is then factually wrong. Merriam-Webster defines track record as: "a record of past performance often taken as an indicator of likely future performance". In that case, you should be able to point to the record.

I assume you will still stand behind the essence of your comment. In that case, it would be better to say "Based on my experience playing with their models, I have strong reasons to believe that they continuously cheat on benchmarks by training on test datasets." You can then also add that this matches what you hear from others in the field.

itchyjunk 435 days ago

Hmm, even if the model performed poorly on real-world tasks vs. the benchmark, it doesn't necessarily imply they trained on the benchmarks themselves, right? Suppose they did the train/test split properly and didn't train on the test set, but the benchmark itself was bad at representing real-world tasks. If so, it seems pretty wild to accuse a company of training on test data. Maybe this is "vibe commenting" and I'm just out of the loop.
