by cuttothechase 448 days ago
This is definitely a classic for story telling but it appears to be nothing more than hand wavy. Its a bit like there is the great and powerful man behind the curtain, lets trace the thought of this immaculate being you mere mortals. Anthropomorphing seems to be in an overdose mode with "thinking / thoughts", "mind" etc., scattered everywhere. Nothing with any of the LLMs outputs so far suggests that there is anything even close enough to a mind or a thought or anything really outside of vanity. Being wistful with good story telling does go a long way in the world of story telling but in actually understanding the science, I wouldn't hold my breath.

Thanks for the feedback! I'm one of the authors.

I just wanted to make sure you noticed that this is linking to an accessible blog post that's trying to communicate a research result to a non-technical audience?

The actual research result is covered in two papers which you can find here:

- Methods paper: https://transformer-circuits.pub/2025/attribution-graphs/met...

- Paper applying this method to case studies in Claude 3.5 Haiku: https://transformer-circuits.pub/2025/attribution-graphs/bio...

These papers are jointly 150 pages and are quite technically dense, so it's very understandable that most commenters here are focusing on the non-technical blog post. But I just wanted to make sure that you were aware of the papers, given your feedback.

The post to which you replied states:

  Anthropomorphing[sic] seems to be in an overdose mode with 
  "thinking / thoughts", "mind" etc., scattered everywhere. 
  Nothing with any of the LLMs outputs so far suggests that 
  there is anything even close enough to a mind or a thought 
  or anything really outside of vanity.
This is supported by reasonable interpretation of the cited article.

Considering the two following statements made in the reply:

  I'm one of the authors.
And

  These papers are jointly 150 pages and are quite 
  technically dense, so it's very understandable that most 
  commenters here are focusing on the non-technical blog post.
The onus of clarifying the article's assertions:

  Knowing how models like Claude *think* ...
And

  Claude sometimes thinks in a conceptual space that is 
  shared between languages, suggesting it has a kind of 
  universal “language of thought.”
As it pertains to anthropomorphizing an algorithm (a.k.a. stating it "thinks") is on the author(s).
Thinking and thought have no solid definition. We can't say Claude doesn't "think" because we don't even know what a human thinking actually is.

Given the lack of a solid definition for thinking, and of any test to measure it, I think using the terminology colloquially is totally fair play.

I view LLMs as valuable algorithms capable of generating relevant text based on queries given to them.

> Thinking and thought have no solid definition. We can't say Claude doesn't "think" because we don't even know what a human thinking actually is.

I did not assert:

  Claude doesn't "think" ...
What I did assert was that the onus is on the author(s) who write articles/posts such as the one cited to support their assertion that their systems qualify as "thinking" (for any reasonable definition of same).

Short of author(s) doing so, there is little difference between unsupported claims of "LLM's thinking" and 19th century snake oil[0] salesmen.

0 - https://en.wikipedia.org/wiki/Snake_oil

No one says that a thermostat is "thinking" of turning on the furnace, or that a nightlight is "thinking it is dark enough to turn the light on". You are just being obtuse.
Yes. A thermostat involves a change of state from A to B. A computer is the same: its state at t causes its state at t+1, which causes its state at t+2, and so on. Nothing else is going on. An LLM is no different: an LLM is simply a computer that is going through particular states.

Thought is not the same as a change of (brain) state. Thought is certainly associated with change of state, but can't be reduced to it. If thought could be reduced to change of state, then the validity/correctness/truth of a thought could be judged with reference to its associated brain state. Since this is impossible (you don't judge whether someone is right about a math problem or an empirical question by referring to the state of his neurology at a given point in time), it follows that an LLM can't think.

>Thought is certainly associated with change of state, but can't be reduced to it.

You can effectively reduce continuously dynamic systems to discrete steps. Sure, you can always say that the "magic" exists between the arbitrarily small steps, but from a practical POV there is no difference.

A transistor has a binary on or off. A neuron might have ~infinite~ levels of activation.

But in reality the ~infinite~ activation level can be perfectly modeled (for all intents and purposes), and computers have been doing this for decades now (maybe not with neurons, but equivalent systems). It might seem like an obvious answer, that there is special magic in analog systems that binary machines cannot access, but that is wholly untrue. Science and engineering have been extremely successful interfacing with the analog reality we live in, precisely because the digital/analog barrier isn't too big of a deal. Digital systems can do math, and math is capable of modeling analog systems, no problem.
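To make the "math can model analog systems" point concrete, here is a rough sketch, not a neuron model and not anything from the papers. It assumes a toy leaky unit obeying dv/dt = (drive - v) / tau and steps it forward with plain forward-Euler updates. The names `simulate_leaky_unit` and `exact_leaky_unit` and all parameter values are made up for illustration.

```python
import math


def simulate_leaky_unit(drive, tau, t_end, dt):
    """Approximate v(t_end) for dv/dt = (drive - v) / tau using discrete Euler steps."""
    steps = round(t_end / dt)
    v = 0.0  # assumed initial state
    for _ in range(steps):
        v += dt * (drive - v) / tau
    return v


def exact_leaky_unit(drive, tau, t_end):
    """Closed-form solution of the same equation, starting from v(0) = 0."""
    return drive * (1.0 - math.exp(-t_end / tau))


# Hypothetical parameters: unit drive, 20 ms time constant, 50 ms of simulated time.
exact = exact_leaky_unit(1.0, 0.02, 0.05)
errors = [
    abs(simulate_leaky_unit(1.0, 0.02, 0.05, dt) - exact)
    for dt in (1e-3, 1e-4, 1e-5)
]
```

Each tenfold reduction in the step size shrinks the gap to the continuous solution by roughly a factor of ten here, which is the practical sense in which the discrete steps can be made as fine as the application needs.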

Please, take the pencil and draw the line between thinking and non-thinking systems. Hell I'll even take a line drawn between thinking and non-thinking organisms if you have some kind of bias towards sodium channel logic over silicon trace logic. Good luck.
Even if you can't define the exact point that A becomes not-A, it doesn't follow that there is no distinction between the two. Nor does it follow that we can't know the difference. That's a pretty classic fallacy.

For example, you can't name the exact time that day becomes night, but it doesn't follow that there is no distinction.

A bunch of transistors being switched on and off, no matter how many there are, is no more an example of thinking than a single thermostat being switched on and off. OTOH, if we can't think, then this conversation and everything you're saying and "thinking" is meaningless.

So even without a complete definition of thought, we can see that there is a distinction.

Your assertion that sodium channel logic and silicon trace logic are 100% identical is the primary problem. It's like claiming that a hydraulic cylinder and a bicep are 100% equivalent because they both lift things - they are not the same in any way.
Or submarines swim ;)
think about it more
Honestly, arguing seems futile when it comes to opinions like GP. Those opinions resemble religious zealotry to me in that they take for granted that only humans can think. Any determinism of any kind in a non-human is seized upon as proof it's mere clockwork, yet they can’t explain how humans think in order to contrast it.
> Honestly, arguing seems futile when it comes to opinions like GP. Those opinions resemble religious zealotry to me in that they take for granted that only humans can think. Any determinism of any kind in a non-human is seized upon as proof it's mere clockwork, yet they can’t explain how humans think in order to contrast it.

Putting aside the ad hominems, projections, and judgements, here is a question for you:

If I made a program where a NPC[0] used the A-star[1] algorithm to navigate a game map, including avoiding obstacles and using the shortest available path to reach its goal, along with identifying secondary goal(s) should there be no route to the primary goal, does that qualify to you as the NPC "thinking"?

0 - https://en.wikipedia.org/wiki/Non-player_character

1 - https://en.wikipedia.org/wiki/A*_search_algorithm
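For concreteness, the kind of program described in the question could look roughly like the sketch below. It is only an illustration under stated assumptions: a 4-connected grid with unit step cost, `#` marking obstacles, and a Manhattan-distance heuristic. The names `a_star`, `navigate`, and `GRID` are hypothetical, not from any particular game.

```python
import heapq


def a_star(grid, start, goal):
    """Return a shortest 4-connected path from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance: admissible for unit-cost, 4-connected movement.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {start: None}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:  # walk parent links back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry; a cheaper route was already found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


def navigate(grid, start, goals):
    """Try goals in priority order; return (goal, path) for the first reachable one."""
    for goal in goals:
        path = a_star(grid, start, goal)
        if path is not None:
            return goal, path
    return None, None


# Hypothetical map: the primary goal at (0, 4) is walled off, so the NPC
# falls back to the secondary goal at (2, 3).
GRID = [
    "S..#P",
    ".#.##",
    "...B.",
]
chosen_goal, chosen_path = navigate(GRID, (0, 0), [(0, 4), (2, 3)])
```

On this map the primary goal is enclosed by obstacles, so `navigate` ends up returning the secondary goal together with a shortest path to it.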

Answer: I suppose no? But my point is only this:

1. People with the "AI isn't thinking" opinions move the goalposts, the borderline between "just following a deterministic algorithm" and "thinking" wherever needed in order to be right.

2. I argue that the brain itself must either be deterministic (just wildly complex) or, for lack of a better word, supernatural. If it's not deterministic, only God knows how our thinking process works. Every single person postulating about whether AI is "thinking" cannot fully explain why a human chooses a particular action, just as AI researchers can't explain why Claude does a certain thing in all scenarios. Therefore they are much more similar than they are different.

3. But really, the important thing is, unless you're approaching this from a religious POV (which is arguably much more interesting) the obsessive sorting of highly complex and not-even-remotely-fully-understood processes into "thinking" and "NOT thinking" groups is pointless and silly.

Really appreciate your team's enormous efforts in this direction, not only the cutting edge research (which I don't see OAI/DeepMind publishing any paper on) but also making the content more digestible for a non-research audience. Please keep up the great work!
I, uh, think, that "think" is a fine metaphor but "planning ahead" is a pretty confusing one. It doesn't have the capability to plan ahead because there is nowhere to put a plan and no memory after the token output, assuming the usual model architecture.

That's like saying a computer program has planned ahead if it's at the start of a function and there's more of the function left to execute.

I think that's a very unfair take. As a summary for non-experts I found it did a great job of explaining how by analyzing activated features in the model, you can get an idea of what it's doing to produce the answer. And also how by intervening to change these activations manually you can test hypotheses about causality.

It sounds like you don't like anthropomorphism. I can relate, but I don't get where "Its a bit like there is the great and powerful man behind the curtain, lets trace the thought of this immaculate being you mere mortals" is coming from. In most cases the anthropomorphisms are just the standard way to convey the idea briefly. Even then I liked how they sometimes used scare quotes, as in 'it began "thinking" of potential on-topic words'. There are some more debatable anthropomorphisms such as "in its head" where they use scare quotes systematically.

Also given that they took inspiration from neuroscience to develop a technique that appears successful in analyzing their model, I think they deserve some leeway on the anthropomorphism front. Or at least on the "biological metaphors" front which is maybe not really the same thing.

I used to think biological metaphors for LLMs were misleading, but I'm actually revising this opinion now. I mean I still think the past metaphors I've seen were misleading, but here, seeing the activation pathways they were able to identify, including the inhibitory circuits, and knowing a bit about similar structures in the brain I find the metaphor appropriate.

Yup... well, if the research is conducted (or sponsored) by the company that develops and sells the LLM, of course there will be a temptation to present their product in a better light and make it sound like more than it actually is. I mean, the anthropomorphization starts already with the company name and giving the company's LLM a human name...