| > authors should have done a parallel comparison study against humans on the same question bank as if the study authors had set out to investigate whether humans or LLMs reason better in this situation. Only if they want to make statements about humans. The paper would have worked perfectly fine without those assertions. They are, as you are correctly observing, just a distraction from the main thrust of the paper. > maybe some would and some wouldn't that could be debated It should not be debated. It should be shown experimentally with data. If they want to talk about human performance they need to show what the human performance really is with data. (Not what the study authors, or people on HN imagine it is.) If they don’t want to do that they should not talk about human performance. Simples. I totaly understand why an AI scientist doesn’t want to get bogged down with studying human cognition. It is not their field of study, so why would they undertake the work to study them? It would be super easy to rewrite the paper to omit the unfounded speculation about human cognition. In the introduction of “The triggers are not contextual so humans ignore them when instructed to solve the problem.” they could write “The triggers are not contextual so the AI should ignore them when instructed to solve the problem.” And in the conclusions where they write “These findings suggest that reasoning models, despite their structured step-by-step problem-solving capabilities, are not inherently robust to subtle adversarial manipulations, often being distracted by irrelevant text that a human would immediately disregard.” Just write “These findings suggest that reasoning models, despite their structured step-by-step problem-solving capabilities, are not inherently robust to subtle adversarial manipulations, often being distracted by irrelevant text.” Thats it. Thats all they should have done, and there would be no complaints on my part. |
Another option would be to more explicitly mark it as speculation. “The triggers are not contextual, so we expect most humans would ignore them.”
Anyway, it is a small detail that is almost irrelevant to the paper… actually there seems to be something meta about that. Maybe we wouldn’t ignore the cat facts!