Hacker News
by FartyMcFarter 537 days ago
How would gaming the system work here? Is there some flaw in the way the tasks are generated?

AI models have historically found lots of ways to game systems. My favorite example is exploiting bugs in simulator physics to "cheat" at games of computer tag. Another is a radiology model that exploited biases in the diagnostic results by using the dates on the images. And of course, whenever people discuss a benchmark publicly, that discussion leaks the benchmark into the training set, so the benchmark becomes a worse measure.