by yxchng 436 days ago
This company has a track record of cheating on benchmarks by training on test datasets. Take it with a pinch of salt and try the model yourself.
2 comments

> This company has a track record of cheating on benchmarks by training on test datasets.

That's a really bold statement to make without any references. Could you add some references for those of us not in the know?

Mostly anecdotal, and also based on the large gap between benchmark performance and actual performance in personal use.

I think saying "track record" is then factually wrong. Merriam-Webster defines track record as "a record of past performance often taken as an indicator of likely future performance". In that case, you should be able to point to the record.

I assume you will still stand behind the essence of your comment. In that case, it would be better to say "Based on my experience playing with their models, I have strong reasons to believe that they continuously cheat on benchmarks by training on test datasets." You can then also add that this matches what you hear from others in the field.

Hmm, even if the model performed poorly on real-world tasks compared with the benchmark, that doesn't necessarily imply they trained on the benchmarks themselves, right? Maybe they did the train/test split properly and didn't train on the test set, but the benchmark itself was bad at representing real-world tasks? If so, it seems pretty wild to accuse a company of training on test data. Maybe this is "vibe commenting" and I'm just out of the loop.

> This company has a track record of cheating on benchmarks by training on test datasets.

That's like all of them, bro. Don't hate the player, hate the game.