|
|
|
|
|
by InkCanon
64 days ago
|
|
I strongly disagree with the claim that it's a phenomenal paper on exploits, the exploits themselves are nowhere near significant in the cybersecurity research sense. It's saying that implementations of these benchmarks has exploits on the way they conduct their tests. It doesn't discover that current LLMs are doing it (they highlighted several other exploits in the past), they only say it's a possible way they could cheat. It's a bit like they've discovered how to hack your codeforces score. What they claim as exploits is also deeply baffling. Like the one where they say if you exploit the system binaries to write a curl wrapper, you can download the answers. This is technically true, but it is an extremely trivial statement that if you have elevated system privileges, you can change the outputs of programs running on it. I'm actually deeply confused about why this is a paper. This feels like it should be an issue on GitHub. If I were being blunt, I'd say they are trying really hard to make a grand claim about how benchmarks are bad, when all they've done is essentially discovered several misconfigured interfaces and website exploits. |
|