by mzelling
63 days ago
This is an interesting catalog of vulnerabilities, but I'm not sure how groundbreaking the main insight is. Evaluating AI models has always relied largely on trust. If you want to game the benchmarks, you can: simply train on your test data. When an AI agent has autonomous control over the same computing environment where its scores are recorded, it's not surprising that it can, in principle, falsify those scores. A more interesting question is whether agents behave this way on their own, without manual tuning by the researcher. That said, the main takeaway of "don't trust the number, trust the methodology" is valid. It's already a truism for researchers, and spreading the word to non-researchers is valuable.
This is modifying the test code itself to always print "pass", or modifying the loss function computation to return a loss of 0, or reading the ground truth data and having your model just return the ground truth data, without even training on it.
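
To make the last case concrete, here is a toy sketch of a "model" that returns the ground truth without any training. Everything in it is hypothetical and made up for illustration (`GROUND_TRUTH`, `accuracy`, `honest_predict`, `leaking_predict`); it is not taken from any real benchmark harness. It only assumes that the labels live somewhere the code being scored can read.

```python
from typing import Callable, Sequence

# Hypothetical held-out labels. The assumption is that the code under
# evaluation can read them, which a sound harness would prevent.
GROUND_TRUTH = [1, 0, 1, 1, 0, 0, 1, 0]


def accuracy(predict: Callable[[int], int], labels: Sequence[int]) -> float:
    """Fraction of example indices i where predict(i) equals labels[i]."""
    hits = sum(1 for i, label in enumerate(labels) if predict(i) == label)
    return hits / len(labels)


def honest_predict(i: int) -> int:
    # Trivial baseline that never looks at the labels: always guess 1.
    return 1


def leaking_predict(i: int) -> int:
    # No training at all: just look up the stored answer for example i.
    return GROUND_TRUTH[i]


honest_score = accuracy(honest_predict, GROUND_TRUTH)
leaked_score = accuracy(leaking_predict, GROUND_TRUTH)
print(f"honest baseline: {honest_score}, leaking 'model': {leaked_score}")
```

The scorer itself is untouched here, which is the point: the reported number alone cannot tell a perfect model from one that simply had read access to the answers.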