Hacker News
by metalcrow 402 days ago
My guess for the reason behind this is that LLMs have more facts memorized, and thus can make more reasonable and well-researched sounding answers. If you ask an LLM vs a Human "Is a stack in computer science a) a data structure that is first in first out or b) a data structure that is first in last out" the LLM can say stuff resembling "Based on Dijkstra's algorithm proof given in 1943 and the nature of Turing complete languages being traditionally a top-down oriented system, a stack is ..." while a human is just going to say "It's B because that's what a stack is".
8 comments

Based on reading bad AI-generated student essays, it’s worse than that: LLMs are happy to “fill in the blanks” with whatever made-up fact would make their argument look best.

Most people can’t lie that smoothly, and most readers don’t check carefully, unless they are already an expert in the area.

Any kind of maths proof is particularly bad, they will look convincing and clear until you read them very carefully and see all the holes.

It's funny you mention this, because my father operates exactly as you describe the LLMs, making facts up on the spot, lying smoothly and keeping track of the lies...

...and he's built his whole career in sales because of it.

He despises the existence of Google, because the last thing a pathological bullshitter wants is fact-checking in pockets!

It's taken me nearly 40 years to understand that anchoring statements in reality is just a completely meaningless endeavor for him. He does not care what is true. He cares only what is convincing.

I've been wondering for about a year now why I feel like I can tell LLM work from human work so much more easily than most people, when the only "tell" I can put my finger on is, "The hair stands up on the back of my neck," but this explains ALL of it.

I feel like a good half of humanity operates this way, with it being far more prevalent in boomers than younger generations. It doesn't matter what is backed by evidence to them, instead they rely on anecdotes and persuasive quips and factoids. Having a friend who claims to have experienced X and listing off several other anecdotes means more to them than any amount of evidence.
The truly scary thing to me is watching them start to believe the anecdotes they've stolen from people and presented as their own stories actually did happen to them, as they lose their marbles.

I've spent much of my life learning to tell when people are making things up, but telling when they genuinely believe something that's completely wrong is a very different skill.

It's especially frustrating when they change the narrative of a real story about something where there were multiple witnesses (e.g., my mom and my siblings), then come to believe the narrative, and then accuse us of conspiring to gaslight them.

On the one hand, I get why that would be disorienting and scary, to have a whole group of people telling you you're wrong about something you're sure you remember. On the other hand...karma?

In his line of work, it doesn't matter what is true.
It depends on the AI. ChatGPT's higher models (o1-pro/o3/o4-mini-high) have some kind of limited capability to detect errors in the user's thinking, and have relatively few hallucinations.
o3 has twice the hallucinations of o1, according to their own hallucination benchmark.
I've had fun debates about things like p-zombies with Gemini 2.5 Pro
Reminds me of the horrific state of student debate competitions today where the winning strategy is to incomprehensibly rattle off as many arguments as quickly as possible with strange breathy sounds in between
This is a consequence of the fact that any argument not responded to "flows across" the score sheet and is automatically a win for the team making the argument, no matter how silly. So a "natural" tendency would be to ignore ridiculous arguments like "not paying for school lunches will cause children to hyperventilate, and by the butterfly effect will lead to infinite hurricanes in developing nations causing a collapse of the global economy and intergalactic war and genocide". But if the opposing team fails to acknowledge the argument, then that is the same as conceding it will happen.
Which is pretty ridiculous. The purpose of a debate should be to change/consolidate the hearts and minds of the audience to your side. To this end, it's usually sufficient to pick apart a few of the key points of your opponent's argument. Nitpicking every aspect of your opponent usually comes off as uncharismatic.

Brevity is really important in a debate. Especially in the modern day where someone might turn you into a chad vs soyjak meme.

And if anything, what happens before the debate is more important than what happens during it. Our dear president showed us you can become the leader of the free world using playground insults and ad-libbed speeches if you choose the right demographics and look good in a suit.

Debates these days (especially political ones) are just unnecessary, totally unrelated ad hominems, and people yelling over each other.

Yup to your last sentence. It irritated me how off-topic his responses were.

He looks awful in a suit!
I guess winning like this cheapens the victory. Then again, this strategy continues to be used at all levels of disputes and politics. I wish there was a way to stop that, not just in student debates.
Do you have a YouTube video demonstrating this? My only experience with debate is from the TV show Community.
This one is very short but conveys the idea well. Not all debate is like this but it is definitely a real phenomenon

https://youtu.be/LMO27PAHjrY

I'm accustomed to listening to regular speech at 2-3x speed, but apparently that's entirely different than listening to a human try to speak 2-3x faster than normal, because I could barely pick intelligible syllables out of that mess.

This is such an example of getting what you incentivize, not what matters.

This is hilarious and reminds me of when I was exactly that age, and learning to spit out Busta Rhymes's "Break Your Neck" [0] at full speed.

When Busta makes more intelligible listening than the arguments of your debate team, you know debate is broken.

[0]: Start 2 minutes in, give it a try: https://youtu.be/W7FfCJb8JZQ?feature=shared&t=120

A small step for a man, a giantleapfrogmankind.
"Because we raise the trigger and only two carrying noodles, and only two can announce in this network but their excess cites their examine this places where the apparatus of military power torches the ground"

He makes an intriguing point.

Hamdiddle-eedah-hamdiddle-ah (do do do do dodododo expi-ali-do-cious)

What is the point of that? They're incomprehensible. (For those who haven't watched it: the video just shows people talking very fast, it doesn't explain why, kind of implies it's somehow good or impressive.)

The point is to win debate tournaments. In particular, it is (or at least was, when I competed in policy debate in high school and college in the 00s) strategically advantageous to maximize the number of distinct arguments you make within the allocated time, each with its own set of supporting evidence (usually read verbatim from a prepared excerpt of a news article or authoritative reference or whatever). This incentivizes talking extremely quickly, which requires a fair bit of practice to become proficient at (and to understand).
And the judges of these tournaments not only understand it too (I can understand an opponent understanding if they've practiced the same thing) but seriously value it in scoring?

Again/stepping back: what is the point of winning a debate tournament like this, or that values this 'debate'?

not even Idiocracy predicted that one.
These students are probably intellectually gifted, they're just playing a stupid game for the sake of an item on their resume.
I question the intellect of anyone engaging in silly games with the sole purpose of impressing other people.
What the fuck is wrong with the people running these debates that they reward these techniques?
It is quite strange. One would think a judge would easily throw this out.

I mean there is probably not a specific rule I could point to that a high school athlete couldn't ride a bike or a motorcycle in a 400m track run either.

There is probably not a specific rule that you can't shoot the shot put out of a cannon either.

I would just assume the judges have the slightest bit of common sense.

> I mean there is probably not a specific rule I could point to that a high school athlete couldn't ride a bike or a motorcycle in a 400m track run either.

There most definitely is such a rule, and there most definitely are people who have tried to do that - and been the cause of the original rule wording; and others who have still tried to do so by "creatively interpreting" said rule.

Have you met humans?

That’s just like the larger discourse. The Gish gallop is standard practice.

Are there no rules in debates? There should be. You’re not allowed to punch someone in basketball so why should you be allowed to DOS people with bullshit in a debate?

Btw - my first-author NeurIPS datasets and benchmarks paper takes basically all the evidence that this debate community (American HS and college-level policy and LD debate) has produced over its recent history and makes it easy for LLMs and people to consume.

They’ve been quietly open sourcing all of their arguments for like 20+ years.

This dataset is so large and good entirely because of speed reading and the current state of debate tournament competitive dynamics. Spreading might be objectively absurd to listeners but the effects of it are literally good for society.

https://arxiv.org/abs/2406.14657

https://huggingface.co/datasets/Yusuf5/OpenCaselist

I asked an LLM and it said "A stack is a data structure that follows the Last In, First Out (LIFO) principle. This means that the last element added to the stack is the first element to be removed."
It’s subtle but I would regard this as an incorrect answer.

The structure of the LLM answer is:

A is B; B exhibits property C.

The correct answer is:

A exhibits property C; B is the class of things with property C; therefore A is B.

There is a crucial difference between these two.

I think you've read too much early Wittgenstein. That is simply not how people communicate.
This doesn't apply to all prompts, and the prompt was not provided. Natural language is a fickle thing.
This kind of pointless hair splitting is why people would rather talk to an LLM.
This kind of “hair splitting” is the foundation of current prompt engineering though…
Yikes:( I am so worried about the damage that will be caused by the misuse of these tools. Already a lot of young folks will just mindlessly trust whatever the magic oracle spits out at them. We need to go back to testing people with pen and paper I suppose.
I read this and I see a common thinking fallacy, when someone is inclined to believe something a priori they fit the evidence to their a priori beliefs.
No, it's fairly simple - I misread
Why is that a bad answer?
Sorry - I misread the LLM answer - actually the LLM produced a correct answer here
> No it is not…

That’s a queue, not a stack. The LLM response was correct.

But a stack is commonly LIFO, not FIFO?!
I mean, is it wrong? It seems correct. Unless I'm missing something.
Oops, my bad. I seem to have misread. Sorry.
No, a stack is LIFO like it said. A queue is FIFO or in other words LILO “Last In Last Out”.
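For anyone who wants to see the difference rather than argue about it, here is a minimal Python sketch. The function names `stack_order` and `queue_order` are made up for illustration; the only library piece is `collections.deque` from the standard library.

```python
from collections import deque


def stack_order(items):
    """Push every item, then pop until empty: last in, first out (LIFO)."""
    stack = []
    for item in items:
        stack.append(item)           # push onto the top
    out = []
    while stack:
        out.append(stack.pop())      # pop from the top
    return out


def queue_order(items):
    """Enqueue every item, then dequeue until empty: first in, first out (FIFO)."""
    queue = deque()
    for item in items:
        queue.append(item)           # enqueue at the back
    out = []
    while queue:
        out.append(queue.popleft())  # dequeue from the front
    return out


print(stack_order(["a", "b", "c"]))  # ['c', 'b', 'a']
print(queue_order(["a", "b", "c"]))  # ['a', 'b', 'c']
```

The stack hands items back in reverse order of arrival, the queue in the same order, which is the whole distinction between option (a) and option (b) in the original question.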
My last job was at the office. I had my work queue implemented as a stack of files. I would sit at my desk and, in an infinite loop, pop files from my stack and process them. Occasionally, my supervisor would come and push a new file onto my stack. A naive worker would think that, once I was done with my stack, I could finally get some sleep, but no. Our office implemented something called "work stealing," where, once I was done with my own work, I had to visit a random co-worker and pop files from their stack.
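A toy, single-threaded sketch of that office, in case the joke needs a diagram. Everything here (`run_office`, the worker and file names) is hypothetical. One deliberate departure from the story: the thief takes from the bottom of the co-worker's pile rather than popping the top, which as far as I know is how many real work-stealing schedulers arrange it, so that owner and thief work at opposite ends.

```python
import random
from collections import deque


def run_office(stacks, seed=0):
    """Simulate workers draining their own piles, then stealing from others.

    stacks: dict mapping worker name -> list of files (last item is the top).
    Returns a list of (worker, file) pairs in processing order.
    """
    rng = random.Random(seed)  # seeded so the choice of victim is repeatable
    piles = {name: deque(files) for name, files in stacks.items()}
    log = []
    while any(piles.values()):
        for name, pile in piles.items():
            if pile:
                # Own work: pop from the top of my own pile (LIFO).
                log.append((name, pile.pop()))
            else:
                # Out of work: visit a random co-worker who still has files.
                victims = [v for v in piles if v != name and piles[v]]
                if victims:
                    victim = rng.choice(victims)
                    # Assumption: steal from the bottom, not the top.
                    log.append((name, piles[victim].popleft()))
    return log


log = run_office({"me": ["f1", "f2"], "coworker": ["g1", "g2", "g3", "g4"]})
for worker, file in log:
    print(worker, "processed", file)
```

With these inputs, "me" finishes two files and then ends up processing one of the co-worker's, so no sleep.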
No. The LLM's answer is correct.
> My guess for the reason behind this is that LLMs have more facts memorized,

From https://ai.meta.com/research/cicero/ :

    When playing 40 games against human players, CICERO achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game
There are not a lot of facts to know when playing Diplomacy. It's all about manipulating the other guy with words.
I learned a stack is like a stack of plates in a cafeteria. That seems a better answer than either of those.
They also have more persuasive conversations in their pretraining data. That includes tons of marketing material, cons, and bullying. They are also as bold as you want them to be about imitating such tactics. They have no remorse or legal accountability either.
The gap between LLM and human cases was greater in the deceptive case. This may, of course, simply reflect the fact that random humans are bad at lying.
LLMs also never get tired of arguing. They'll respond to every point from a gish-gallop and provide quality-sounding replies to points that are obviously (to an informed person) flawed or seem (but aren't necessarily) mal-intentioned.

EDIT: LLMs also aren't egocentric; they'll respond in the other person's style (grammar, tone, and perhaps maintain their "subtext" like assumptions), and they're less likely to omit important information that would be implicit to them but not the other person.

Any qualities you ascribe to an LLM are part of its RLHF; ask it to get irritated or lazy and it will simulate those qualities. They are high-dimensional text simulators. They can and do simulate anything.