| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by InkCanon 64 days ago

I strongly disagree with the claim that it's a phenomenal paper on exploits, the exploits themselves are nowhere near significant in the cybersecurity research sense. It's saying that implementations of these benchmarks has exploits on the way they conduct their tests. It doesn't discover that current LLMs are doing it (they highlighted several other exploits in the past), they only say it's a possible way they could cheat. It's a bit like they've discovered how to hack your codeforces score.

What they claim as exploits is also deeply baffling. Like the one where they say if you exploit the system binaries to write a curl wrapper, you can download the answers. This is technically true, but it is an extremely trivial statement that if you have elevated system privileges, you can change the outputs of programs running on it.

I'm actually deeply confused about why this is a paper. This feels like it should be an issue on GitHub. If I were being blunt, I'd say they are trying really hard to make a grand claim about how benchmarks are bad, when all they've done is essentially discovered several misconfigured interfaces and website exploits.

1 comments

zero_k 64 days ago

Yes, agree. At the same time, it's what these top-tier universities are known for: presenting something relatively simple as if it was ground-breaking, but in a way that the average person can (or has a better chance to) understand it. I am still unsure whether the communication quality has such added value. But people seem to like it, so here we are.

link

pxc 64 days ago

There's a difference between a reliable hunch and really knowing something. What is obvious is not always (or even usually) easy to prove. And the process of proving the obvious sometimes turns up useful little surprises.

link

InkCanon 64 days ago

I do think there's value in science communication, but it does take an intelligent understanding of it on a case by case basis as to whether it's genuine or hype marketing.

Side note: talking to someone from such a "elite" university, I discovered many labs in these unis have standing orders by PIs to tweet their papers/preprints when published. Varies by field, in AI it is by far the most common.

link