by thethirdone 394 days ago
Based on the data in Table 3, I would attribute most of the difference to length of advice. The LLMs' average word count (29.4) is more than double the human word count (13.25). Most other measures do not have a significant ratio. "Difficult word count" would be the only other one with a ratio higher than 2, but that is inherited from total word count.
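As a quick check of the "more than double" claim, here is a minimal sketch of the arithmetic. The two averages are the Table 3 figures quoted in the comment; the variable names are purely illustrative and nothing else from the paper is assumed.

```python
# Average advice length in words, as quoted from Table 3 in the comment above.
llm_avg_words = 29.4
human_avg_words = 13.25

# Ratio of LLM advice length to human advice length.
ratio = llm_avg_words / human_avg_words

print(f"LLM/human word-count ratio: {ratio:.2f}")  # prints 2.22
```

A ratio of about 2.2 means any measure that scales with total length (such as a raw count of difficult words) would be expected to show a similar ratio without telling us anything new.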

I think it would be difficult to truly convince me to change my answer on a test with 14 words, whereas 30 would leave enough space to actually convey an argument.

I would be very interested to see the test rerun while limiting LLM response length or encouraging long responses from humans.


If you think writing more words will be more persuasive, just... write more words?

The test already incentivises being persuasive! If writing more words would do that, and the incentivised human persuaders don't write more words and the LLMs do, then I think it's fair to say that LLMs are more persuasive than incentivised human persuaders.

Sure. I am not contesting that LLMs are more persuasive in this context. That basic result comes through very clearly in the paper. It's not as clear how relevant this is to other situations, though. I think it's quite likely that humans given the instruction to increase word count might outperform LLMs. People are very unlikely to have practiced the specific task of giving advice on multiple-choice tests, whereas LLMs have likely gotten RLHF training, which likely helps in this situation.

I always try to pick out as many tidbits as possible from papers that might be applicable in other situations. I think the main difference of word count may be overshadowing other insights that may be more relevant to longer form argumentation.

> I would be very interested to see the test rerun while limiting LLM response length or encouraging long responses from humans.

I don’t know if that would have the effect you want. And if you’re more likely to have hallucinations at lower word counts, that matters for those who are scrupulous, but many people trying to convince you of something believe the ends justify the means, and that honesty or correspondence to reality are not necessary, just nice to have.

Asking chatbots for short answers can increase hallucinations, study finds - https://news.ycombinator.com/item?id=43950684 - May 2025 (1 comment)

which is reporting on this post:

Good answers not necessarily factual answers: analysis of hallucination in LLMs - https://news.ycombinator.com/item?id=43950678 - May 2025 (1 comment)

I'm not sure what effect you think I want. The suggestion was just to increase the "interestingness" of the study. It seems to me like the main difference shown between LLMs and humans was length of response. Controlling for that variable and rerunning the experiment would help show other differences.

I do think it's distinctly possible that LLMs will be much less convincing due to increased hallucinations at a low word count. I also think that may have less of an effect for dishonest suggestions. Simply stating a lie confidently is relatively effective.

I would prefer advising humans to increase length rather than restricting LLMs because of the cited effects.

> I would prefer advising humans to increase length rather than restricting LLMs because of the cited effects.

I would advise the opposite to humans, as your advice is playing to the strengths of AI/LLMs and away from the strengths of humans versus AI/LLMs.

Advising the opposite to humans does not make sense. 13 words is already tiny for convincing someone. The choices I was thinking of were restricting LLM word count and increasing human word count. The goal is specifically to make them more comparable.

The given study does not show any strength of humans over LLMs. Both goal metrics (truthful and deceptive) are better for LLMs than humans. If you are misinterpreting my advice as general advice for people not under the study's conditions, I would want to see the results of the proposed rerun before suggesting that.

However, if length of text is legitimately convincing regardless of content, I don't know why humans should avoid using that. If LLMs end up more convincing to humans than other humans simply because humans are too prideful to make their arguments longer, that seems like the worst possible future.

> If LLMs end up more convincing to humans than other humans simply because humans are too prideful to make their arguments longer, that seems like the worst possible future.

People aren’t too proud to make long arguments; long arguments just take humans more time and effort to make. Historically, humans have therefore subconsciously treated longer arguments as more intellectually rigorous, whether they are or not, so the length of a written piece is used as a kind of lazy heuristic for quality. When we're comparing the output of humans to that of other humans, this kind of approach may work to a certain extent, but AI/LLMs seem to be better than humans at writing long pieces of text on demand. That humans find the LLM output more convincing if it is longer is not surprising to me, but I’ll agree with you that it isn’t a good sign either. The metric has become a target.