Hacker News
by itchyjunk 435 days ago
Hmm, even if the model performed poorly on real-world tasks compared to the benchmark, that doesn't necessarily imply they trained on the benchmark itself, right? They could have done the train/test split properly and never trained on the test set, and the benchmark could simply be bad at representing real-world tasks. If so, it seems pretty wild to accuse a company of training on test data. Maybe this is "vibe commenting" and I'm just out of the loop.