| More than one comment here asserts that the authors should have run a parallel comparison study against humans on the same question bank, as if the authors had set out to investigate whether humans or LLMs reason better in this situation. The authors do claim that humans would immediately disregard this information. Maybe some would and some wouldn't; that could be debated, and seemingly is being debated in this thread. But I think the thrust of the conclusion is the following: "This work underscores the need for more robust defense mechanisms against adversarial perturbations, particularly, for models deployed in critical applications such as finance, law, and healthcare." We need to move past the humans vs. AI discourse; it's getting tired. This is a paper about a pitfall LLMs currently have, one that should be addressed with further research if they are going to be mass deployed in society. |
You want a moratorium on comparing AI to other forms of intelligence because you think it's tired? If I'm understanding you correctly, that's one of the worst takes on AI I think I've ever seen. The whole point of AI is to create an intelligence modeled on humans and to compare it to humans.
Most people who talk about AI have no idea what the psychological baseline is for humans. As a result, their understanding is poorly informed.
In this particular case, they evaluated models that do not have SOTA context window sizes. I.e. they have small working memory. The AIs are behaving exactly like human test takers with working memory, attention, and impulsivity constraints [0].
Their conclusion -- that we need to defend against adversarial perturbations -- is obvious. I don't see anyone taking the opposite view, and I don't see how this really moves the needle. If you can MITM the chat, there's a lot of harm you can do.
This isn't like some major new attack. Science.org covered it along with peacocks being lasers because it's lightweight fun stuff for their daily roundup. People like talking about cats on the internet.
[0] for example, this blog post https://statmedlearning.com/navigating-adhd-and-test-taking-...