HN Mirror

by threethirtytwo 208 days ago

What you’re describing is just competent engineering, and it’s already been applied to LLMs. People have been adversarial. That’s why we know so much about hallucinations, jailbreaks, distribution shift failures, and long-horizon breakdowns in the first place. If this were hobbyist awe, none of those benchmarks or red-teaming efforts would exist.

The key point you’re missing is the type of failure. Search systems fail by not retrieving. Parrots fail by repeating. LLMs fail by producing internally coherent but factually wrong world models. That failure mode only exists if the system is actually modeling and reasoning, imperfectly. You don’t get that behavior from lookup or regurgitation.

This shows up concretely in how errors scale. Ambiguity and multi-step inference increase hallucinations. Scaffolding, tools, and verification loops reduce them. Step-by-step reasoning helps. Grounding helps. None of that makes sense for a glorified Google search.

Hallucinations are a real weakness, but they’re not evidence of absence of capability. They’re evidence of an incomplete reasoning system operating without sufficient constraints. Engineers don’t dismiss CNC machines because they crash bits. They map the envelope and design around it. That’s what’s happening here.

Being skeptical of reliability in specific use cases is reasonable. Concluding from those failure modes that this is just Clever Hans is not adversarial engineering. It’s stopping one layer too early.

habinero 208 days ago

> If this were hobbyist awe, none of those benchmarks or red-teaming efforts would exist.

Absolutely not true. I cannot express how strongly this is not true, haha. The tech is neat, and plenty of real computer scientists work on it. That doesn't mean it's not wildly misunderstood by others.

> Concluding from those failure modes that this is just Clever Hans is not adversarial engineering.

I feel like you're maybe misunderstanding what I mean when I refer to Clever Hans. The Clever Hans story is not about the horse. It's about the people.

A lot of people -- including his owner -- were legitimately convinced that a horse could do math, because look, literally anyone can ask the horse questions and it answers them correctly. What more proof do you need? It's obvious he can do math.

Except of course it's not true lol. Horses are smart critters, but they absolutely cannot do arithmetic no matter how much you train them.

The relevant lesson here is it's very easy to convince yourself you saw something you 100% did not see. (It's why magic shows are fun.)

CamperBob2 208 days ago

> Except of course it's not true lol. Horses are smart critters, but they absolutely cannot do arithmetic no matter how much you train them.

These things are not horses. How can anyone choose to remain so ignorant in the face of irrefutable evidence that they're wrong?

https://arxiv.org/abs/2507.15855

It's as if a disease like COVID swept through the population, and every human's IQ dropped 10 to 15 points while our machines grew smarter to an even larger degree.

habinero 207 days ago

Or -- and hear me out -- that result doesn't mean what you think it does.

That's the exact reason I mention the Clever Hans story. You think it's obvious because you can't come up with any other explanation, therefore there can't be another explanation and the horse must be able to do math. And if I can't come up with an explanation, well that just proves it, right? Those are the only two options, obviously.

Except no, all it means is you're the limiting factor. This isn't science 101 but maybe science 201?

My current hypothesis is the IMO thing gets trotted out mostly by people who aren't strong at math. They find the math inexplicable, therefore it's impressive, therefore machine thinky.

When you actually look hard at what's claimed in these papers -- and I've done this for a number of these self-published things -- the evidence frequently does not support the conclusions. Have you actually read the paper, or are you just waving it around?

At any rate, I'm not shocked that an LLM can cobble together what looks like a reasonable proof for some things sometimes, especially for the IMO which is not novel math and has a range of question difficulties. Proofs are pretty code-like and math itself is just a language for concisely expressing ideas.

Here, let me call a shot -- I bet this paper says LLMs fuck up on proofs like they fuck up on code. It will sometimes generate things that are fine, but it'll frequently generate things that are just irrational garbage.

CamperBob2 207 days ago

> Have you actually read the paper, or are you just waving it around?

I've spent a lot of time feeding similar problems to various models to understand what they can and cannot do well at various stages of development. Reading papers is great, but by the time a paper comes out in this field, it's often obsolete. Witness how much mileage the ludds still get out of the METR study, which was conducted with a now-ancient Claude 3.x model that wasn't at the top of the field when it was new.

And the goalposts have now been moved to a dark corner of the parking garage down the street from the stadium. "This brand-new technology doesn't deliver infallible, godlike results out of the box, so it must just be fooling people." Or in equestrian parlance, "This talking horse told me to short NVDA. What a scam."

threethirtytwo 207 days ago

On the IMO paper: pointing out that it’s not a gold medal or that some proofs are flawed is irrelevant to the claim being discussed, and you know it. The claim is not “LLMs are perfect mathematicians.” The claim is that they can produce nontrivial formal reasoning that passes external verification at a rate far above chance and far above parroting. Even a single verified solution falsifies the “just regurgitation” hypothesis, because no retrieval-only or surface-pattern system can reliably construct valid proofs under novel compositions.

Your fallback move here is rhetorical, not scientific: “maybe it doesn’t mean what you think it means.” Fine. Then name the mechanism. What specific process produces internally consistent multi-step proofs, respects formal constraints, generalizes across problem types, and fails in ways analogous to human reasoning errors, without representing the underlying structure? “People are impressed because they’re bad at math” is not a mechanism, it’s a tell.

Also, the “math is just a language” line cuts the wrong way. Yes, math is symbolic and code-like. That’s precisely why it’s such a strong test. Code-like domains have exact semantics. They are adversarial to bullshit. That’s why hallucinations show up so clearly there. The fact that LLMs sometimes succeed and sometimes fail is evidence of partial competence, not illusion. A parrot does not occasionally write correct code or proofs under distribution shift. It never does.

You keep asserting that others are being fooled, but you haven’t produced what science actually requires: an alternative explanation that accounts for the full observed behavior and survives tighter controls. Clever Hans had one. Stage magic has one. LLMs, so far, do not.

Skepticism is healthy. But repeating “you’re the limiting factor” while refusing to specify a falsifiable counter-hypothesis is not adversarial engineering. It’s just armchair disbelief dressed up as rigor. And engineers, as you surely know, eventually have to ship something more concrete than that.

habinero 207 days ago

(Continuing from my other post)

The first thing I checked was "how did they verify the proofs were correct" and the answer was they got other AI people to check it, and those people said there were serious problems with the paper's methodology and it would not be a gold medal.

https://x.com/j_dekoninck/status/1947587647616004583

This is why we do not take things at face value.

CamperBob2 207 days ago

That tweet is aimed at Google. I don't know much about Google's effort at IMO, but OpenAI was the primary newsmaker in that event, and they reportedly did not use hints or external tools. If you have info to the contrary, please share it so I can update that particular belief.

Gemini 2.5 has since been superseded by 3.0, which is less likely to need hints. 2.5 was not as strong as the contemporary GPT model, but 3.0 with Pro Thinking mode enabled is up there with the best.

Finally, saying, "Well, they were given some hints" is like me saying, "LOL, big deal, I could drag a Tour peloton up Col du Galibier if I were on the same drugs Lance was using."

No, in fact I could do no such thing, drugs or no drugs. Similarly, a model that can't legitimately reason will not be able to solve these types of problems, even if given hints.

threethirtytwo 207 days ago

You’re leaning very hard on the Clever Hans story, but you’re still missing why the analogy fails in a way that should matter to an engineer.

Clever Hans was exposed because the effect disappeared under controlled conditions. Blind the observers, remove human cues, and the behavior vanished. The entire lesson of Clever Hans is not “people can fool themselves,” it’s “remove the hidden channel and see if the effect survives.” That test is exactly what has been done here, repeatedly.

LLM capability does not disappear when you remove human feedback. It does not disappear under automatic evaluation. It does not disappear across domains, prompts, or tasks the model was never trained or rewarded on. In fact, many of the strongest demonstrations people point to are ones where no human is in the loop at all: program synthesis benchmarks, math solvers, code execution tasks, multi-step planning with tool APIs, compiler error fixing, protocol following. These are not magic tricks performed for an audience. They are mechanically checkable outcomes.

Your framing quietly swaps “some people misunderstand the tech” for “therefore the tech itself is misunderstood in kind.” That’s a rhetorical move, not an argument. Yes, lots of people are confused. That has no bearing on whether the system internally models structure or just parrots. The horse didn’t suddenly keep solving arithmetic when the cues were removed. These systems do.

The “it’s about the people” point also cuts the wrong way. In Clever Hans, experts were convinced until adversarial controls were applied. With LLMs, the more adversarial the evaluation gets, the clearer the internal structure becomes. The failure modes sharpen. You start seeing confidence calibration errors, missing constraints, reasoning depth limits, and brittleness under distribution shift. Those are not illusions created by observers. They’re properties of the system under stress.

You’re also glossing over a key asymmetry. Hans never generalized. He didn’t get better at new tasks with minor scaffolding. He didn’t improve when the problem was decomposed. He didn’t degrade gracefully as difficulty increased. LLMs do all of these things, and in ways that correlate with architectural changes and training regimes. That’s not how self-deception looks. That’s how systems with internal representations behave.

I’ll be blunt but polite here: invoking Clever Hans at this stage is not adversarial rigor, it’s a reflex. It’s what you reach for when something feels too capable to be comfortable but you don’t have a concrete failure mechanism to point at. Engineers don’t stop at “people can be fooled.” They ask “what happens when I remove the channel that could be doing the fooling?” That experiment has already been run.

If your claim is “LLMs are unreliable for certain classes of problems,” that’s true and boring. If your claim is “this is all an illusion caused by human pattern-matching,” then you need to explain why the illusion survives automated checks, blind evaluation, distribution shift, and tool-mediated execution. Until then, the Hans analogy isn’t skeptical. It’s nostalgic.
