| HN Mirror


	by metalcrow 402 days ago
	My guess for the reason behind this is that LLMs have more facts memorized, and thus can make more reasonable and well-researched sounding answers. If you ask an LLM vs a Human "Is a stack in computer science a) a data structure that is first in first out or b) a data structure that is first in last out" the LLM can say stuff resembling "Based on Dijkstra's algorithm proof given in 1943 and the nature of Turing complete languages being traditionally a top-down oriented system, a stack is ..." while a human is just going to say "It's B because that's what a stack is".

8 comments

CJefferson 402 days ago

Based on reading bad AI-generated student essays, it’s worse than that: LLMs are happy to “fill in the blanks” with whatever made-up fact would make their argument look best.

Most people can’t lie that smoothly, and most readers don’t check carefully, unless they are already an expert in the area.

Any kind of maths proof is particularly bad: they look convincing and clear until you read them very carefully and see all the holes.

smeej 402 days ago

It's funny you mention this, because my father operates exactly as you describe the LLMs, making facts up on the spot, lying smoothly and keeping track of the lies...

...and he's built his whole career in sales because of it.

He despises the existence of Google, because the last thing a pathological bullshitter wants is fact-checking in pockets!

It's taken me nearly 40 years to understand that anchoring statements in reality is just a completely meaningless endeavor for him. He does not care what is true. He cares only what is convincing.

I've been wondering for about a year now why I feel like I can tell LLM work from human work so much more easily than most people, when the only "tell" I can put my finger on is, "The hair stands up on the back of my neck," but this explains ALL of it.

Llamamoe 401 days ago

I feel like a good half of humanity operates this way, with it being far more prevalent in boomers than in younger generations. What is backed by evidence doesn't matter to them; instead they rely on anecdotes, persuasive quips, and factoids. Having a friend who claims to have experienced X and listing off several other anecdotes means more to them than any amount of evidence.

smeej 401 days ago

The truly scary thing to me is watching them start to believe the anecdotes they've stolen from people and presented as their own stories actually did happen to them, as they lose their marbles.

I've spent much of my life learning to tell when people are making things up, but telling when they genuinely believe something that's completely wrong is a very different skill.

It's especially frustrating when they change the narrative of a real story about something where there were multiple witnesses (e.g., my mom and my siblings), then come to believe the narrative, and then accuse us of conspiring to gaslight them.

On the one hand, I get why that would be disorienting and scary, to have a whole group of people telling you you're wrong about something you're sure you remember. On the other hand...karma?

MattGaiser 402 days ago

In his line of work, it doesn't matter what is true.

AlexCoventry 402 days ago

It depends on the AI. ChatGPT's higher models (o1-pro/o3/o4-mini-high) have some kind of limited capability to detect errors in the user's thinking, and have relatively few hallucinations.

energy123 401 days ago

o3 has twice the hallucinations of o1 according to their own hallucination benchmark

UltraSane 401 days ago

I've had fun debates about things like p-zombies with Gemini 2.5 Pro

hammock 402 days ago

Reminds me of the horrific state of student debate competitions today where the winning strategy is to incomprehensibly rattle off as many arguments as quickly as possible with strange breathy sounds in between

upghost 402 days ago

This is a consequence of the fact that any argument not responded to "flows across" the score sheet and is automatically a win for the team making the argument, no matter how silly. So a "natural" tendency would be to ignore ridiculous arguments like "not paying for school lunches will cause children to hyperventilate, and by the butterfly effect will lead to infinite hurricanes in developing nations causing a collapse of the global economy and intergalactic war and genocide". But if the opposite team fails to acknowledge the argument then that is the same as conceding it will happen.

beeflet 402 days ago

Which is pretty ridiculous. The purpose of a debate should be to change/consolidate the hearts and minds of the audience to your side. To this end, it's usually sufficient to pick apart a few of the key points of your opponent's argument. Nitpicking every aspect of your opponent usually comes off as uncharismatic.

Brevity is really important in a debate, especially in the modern day, where someone might turn you into a chad vs soyjak meme.

And if anything, what happens before the debate is more important than what happens during it. Our dear president showed us you can become the leader of the free world using playground insults and ad-libbed speeches if you choose the right demographics and look good in a suit.

johnisgood 401 days ago

Debates these days (especially political ones) are just unnecessary, totally unrelated ad hominems, and people yelling over each other.

Yup to your last sentence. It irritated me how off-topic his responses were.

amenhotep 401 days ago

He looks awful in a suit!

thih9 401 days ago

I guess winning like this cheapens the victory. Then again, this strategy continues to be used at all levels of disputes and politics. I wish there was a way to stop that, not just in student debates.

azemetre 402 days ago

Do you have a YouTube video demonstrating this? My only experience with debate is from the TV show Community.

justonceokay 402 days ago

This one is very short but conveys the idea well. Not all debate is like this but it is definitely a real phenomenon

https://youtu.be/LMO27PAHjrY

smeej 402 days ago

I'm accustomed to listening to regular speech at 2-3x speed, but apparently that's entirely different than listening to a human try to speak 2-3x faster than normal, because I could barely pick intelligible syllables out of that mess.

This is such an example of getting what you incentivize, not what matters.

sebastiennight 394 days ago

This is hilarious and reminds me of when I was exactly that age, and learning to spit out Busta Rhymes's "Break Your Neck" [0] at full speed.

When Busta makes more intelligible listening than the arguments of your debate team, you know debate is broken.

[0]: Start 2 minutes in, give it a try: https://youtu.be/W7FfCJb8JZQ?feature=shared&t=120

cwmoore 402 days ago

A small step for a man, a giantleapfrogmankind.

timeforcomputer 402 days ago

"Because we raise the trigger and only two carrying noodles, and only two can announce in this network but their excess cites their examine this places where the apparatus of military power torches the ground"

He makes an intriguing point.

OJFord 402 days ago

Hamdiddle-eedah-hamdiddle-ah (do do do do dodododo expi-ali-do-cious)

What is the point of that? They're incomprehensible. (For those who haven't watched it: the video just shows people talking very fast, it doesn't explain why, kind of implies it's somehow good or impressive.)

nimih 402 days ago

The point is to win debate tournaments. In particular, it is (or at least was, when I competed in policy debate in high school and college in the 00s) strategically advantageous to maximize the number of distinct arguments you make within the allocated time, each with its own set of supporting evidence (usually read verbatim from a prepared excerpt of a news article or authoritative reference or whatever). This incentivizes talking extremely quickly, which requires a fair bit of practice to become proficient at (and to understand).

OJFord 402 days ago

And the judges of these tournaments not only understand it too (I can understand an opponent understanding if they've practiced the same thing) but seriously value it in scoring?

Again/stepping back: what is the point of winning a debate tournament like this, or that values this 'debate'?

emseetech 402 days ago

https://en.wikipedia.org/wiki/Spreading_(debate)

1oooqooq 402 days ago

not even Idiocracy predicted that one.

AlexCoventry 402 days ago

These students are probably intellectually gifted, they're just playing a stupid game for the sake of an item on their resume.

namaria 401 days ago

I question the intellect of anyone engaging in silly games with the sole purpose of impressing other people.

jimbokun 402 days ago

What the fuck is wrong with the people running these debates that they reward these techniques?

xrhobo 401 days ago

It is quite strange. One would think a judge would easily throw this out.

I mean there is probably not a specific rule I could point to that a high school athlete couldn't ride a bike or a motorcycle in a 400m track run either.

There is probably not a specific rule that you can't shoot the shot put out of a cannon either.

I would just assume the judges have the slightest bit of common sense.

sebastiennight 394 days ago

> I mean there is probably not a specific rule I could point to that a high school athlete couldn't ride a bike or a motorcycle in a 400m track run either.

There most definitely is such a rule, and there most definitely are people who have tried to do that and been the cause of the original rule wording, and others who have still tried to do so by "creatively interpreting" said rule.

Have you met humans?

api 402 days ago

That’s just like the larger discourse. The Gish gallop is standard practice.

Are there no rules in debates? There should be. You’re not allowed to punch someone in basketball so why should you be allowed to DOS people with bullshit in a debate?

Der_Einzige 402 days ago

Btw - my first-author NeurIPS dataset and benchmarks paper takes basically all the evidence that this debate community (American high school and college-level policy and LD debate) has produced over its recent history and makes it easy for LLMs and people to consume.

They’ve been quietly open sourcing all of their arguments for like 20+ years.

This dataset is so large and good entirely because of speed reading and the current state of debate tournament competitive dynamics. Spreading might be objectively absurd to listeners but the effects of it are literally good for society.

https://arxiv.org/abs/2406.14657

https://huggingface.co/datasets/Yusuf5/OpenCaselist

koakuma-chan 402 days ago

I asked an LLM and it said "A stack is a data structure that follows the Last In, First Out (LIFO) principle. This means that the last element added to the stack is the first element to be removed."

abtinf 402 days ago

It’s subtle but I would regard this as an incorrect answer.

The structure of the LLM answer is:

A is B; B exhibits property C.

The correct answer is:

A exhibits property C; B is the class of things with property C; therefore A is B.

There is a crucial difference between these two.

Matthyze 401 days ago

I think you've read too much early Wittgenstein. That is simply not how people communicate.

literalAardvark 402 days ago

This doesn't apply to all prompts, and the prompt was not provided. Natural language is a fickle thing.

moffkalast 402 days ago

This kind of pointless hair splitting is why people would rather talk to an LLM.

Benjammer 402 days ago

This kind of “hair splitting” is the foundation of current prompt engineering, though…

hansmayer 402 days ago

Yikes:( I am so worried about the damage that will be caused by the misuse of these tools. Already a lot of young folks will just mindlessly trust whatever the magic oracle spits out at them. We need to go back to testing people with pen and paper I suppose.

Karrot_Kream 402 days ago

I read this and I see a common thinking fallacy: when someone is inclined to believe something a priori, they fit the evidence to their a priori beliefs.

hansmayer 402 days ago

No, it's fairly simple - I misread

jstanley 402 days ago

Why is that a bad answer?

hansmayer 402 days ago

Sorry - I misread the LLM answer - actually the LLM produced a correct answer here

lovasoa 402 days ago

No it is not: https://en.wikipedia.org/wiki/FIFO_%28computing_and_electron...

louthy 402 days ago

> No it is not…

That’s a queue, not a stack. The LLM response was correct.

danielbln 402 days ago

But a stack is commonly LIFO, not FIFO?!

koakuma-chan 402 days ago

I mean, is it wrong? It seems correct. Unless I'm missing something.

hansmayer 402 days ago

Oops, my bad. I seem to have misread. Sorry.

thinkcritical 402 days ago

No, a stack is LIFO like it said. A queue is FIFO or in other words LILO “Last In Last Out”.
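
To make the distinction concrete, here is a minimal Python sketch. The variable names are illustrative and not from the thread; it only assumes the standard library's list and collections.deque.

```python
from collections import deque

items = ["a", "b", "c"]

# Stack: push and pop happen at the same end, so the last item in is the first out (LIFO).
stack = []
for item in items:
    stack.append(item)          # push
stack_order = [stack.pop() for _ in items]

# Queue: items enter at the back and leave from the front, so the first in is the first out (FIFO).
queue = deque()
for item in items:
    queue.append(item)          # enqueue
queue_order = [queue.popleft() for _ in items]

print(stack_order)  # ['c', 'b', 'a']
print(queue_order)  # ['a', 'b', 'c']
```

The queue also shows why FIFO and "Last In Last Out" describe the same ordering: "c" went in last and comes out last.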

koakuma-chan 402 days ago

My last job was at the office. I had my work queue implemented as a stack of files. I would sit at my desk and, in an infinite loop, pop files from my stack and process them. Occasionally, my supervisor would come and push a new file onto my stack. A naive worker would think that, once I was done with my stack, I could finally get some sleep, but no. Our office implemented something called "work stealing," where, once I was done with my own work, I had to visit a random co-worker and pop files from their stack.
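
A rough Python sketch of that office, under a few assumptions of mine: the names run_worker, stacks, "alice" and "bob" are hypothetical, the loop stops once every stack is empty instead of running forever, and the worker only visits co-workers who still have files. Real work-stealing schedulers usually take from the opposite end of the victim's deque; this follows the description above and pops from the top.

```python
import random

def run_worker(me, stacks, rng):
    """Process files for worker `me`, stealing from a co-worker when idle."""
    processed = []
    while True:
        if stacks[me]:
            # Own work first: pop the most recently pushed file (LIFO).
            processed.append(stacks[me].pop())
            continue
        # Own stack is empty: look for co-workers who still have files.
        victims = [name for name, files in stacks.items() if name != me and files]
        if not victims:
            break  # nothing left anywhere, so the worker can finally sleep
        victim = rng.choice(victims)  # visit a random co-worker
        processed.append(stacks[victim].pop())
    return processed

stacks = {"alice": ["a1", "a2"], "bob": ["b1", "b2", "b3"]}
done = run_worker("alice", stacks, random.Random(0))
print(done)  # ['a2', 'a1', 'b3', 'b2', 'b1']
```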

lovasoa 402 days ago

No. The LLM's answer is correct.

rstuart4133 401 days ago

> My guess for the reason behind this is that LLMs have more facts memorized,

From https://ai.meta.com/research/cicero/ :

    When playing 40 games against human players, CICERO achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game

There are not a lot of facts to know when playing Diplomacy. It's all about manipulating the other guy with words.

SoftTalker 402 days ago

I learned a stack is like a stack of plates in a cafeteria. That seems a better answer than either of those.

nickpsecurity 402 days ago

They also have more persuasive conversations in their pretraining data. That includes tons of marketing material, cons, and bullying. They are also as bold as you want them to be about imitating such tactics. They have no remorse or legal accountability either.

Sharlin 402 days ago

The gap between LLM and human cases was greater in the deceptive case. This may, of course, simply reflect the fact that random humans are bad at lying.

armchairhacker 402 days ago

LLMs also never get tired of arguing. They'll respond to every point from a gish-gallop and provide quality-sounding replies to points that are obviously (to an informed person) flawed or seem (but aren't necessarily) mal-intentioned.

EDIT: LLMs also aren't egocentric; they'll respond in the other person's style (grammar, tone, and perhaps maintain their "subtext" like assumptions), and they're less likely to omit important information that would be implicit to them but not the other person.

sitkack 401 days ago

Any quality you ascribe to an LLM is part of its RLHF; ask it to get irritated or lazy and it will simulate those qualities. They are high-dimensional text simulators. They can and do simulate anything.
