|
|
|
|
|
by jmalicki
63 days ago
|
|
I think the point of the paper is to prod benchmark authors to at least try to make them a little more secure and hard to hack... Especially as AI is getting smart enough to unintentionally hack the evaluation environments itself, when that is not the authors intent. |
|