Hacker News
by gavi 367 days ago
too much thinking

https://gist.github.com/gavi/b9985f730f5deefe49b6a28e5569d46...


My impression from running the first R1 release locally was that it also does too much thinking.
Magistral Small seems wayyy too heavy-handed with its RL to me:

\boxed{Hey! How can I help you today?}

They clearly rewarded the \boxed{...} formatting during their RL training, since it makes it easier to naively extract answers to math problems and thus verify them. But Magistral uses it for pretty much everything, even when it's inappropriate (in my own testing as well).
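To make the "naively extract answers" part concrete, here is a rough sketch of what such a verifier could look like. This is only an illustration, not Mistral's actual reward code: `extract_boxed` and `reward` are hypothetical names, and the exact-string comparison is an assumption.

```python
import re
from typing import Optional

# Naive pattern: matches \boxed{...} with no nested braces inside.
# (So it would miss something like \boxed{\frac{1}{2}}; a real verifier
# would need brace matching. This is deliberately the "naive" version.)
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")

def extract_boxed(completion: str) -> Optional[str]:
    # Return the contents of the last boxed span, or None if there is none.
    matches = BOXED.findall(completion)
    return matches[-1].strip() if matches else None

def reward(completion: str, reference: str) -> float:
    # Hypothetical reward: 1.0 only if a boxed answer exists and matches exactly.
    answer = extract_boxed(completion)
    if answer is not None and answer == reference.strip():
        return 1.0
    return 0.0
```

If the reward looks anything like this, a completion with no box scores zero no matter how good it is, which would explain why the model boxes greetings too.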

It also forgets to <think> unless you use their special system prompt reminding it to.

Honestly a little disappointing. It obviously benchmarks well, but it seems a little overcooked on non-benchmark usage.

It does not do any thinking. It is a statistical model, just like the rest of them.
"Thinking" is a term of art referring to the hidden/internal output of "reasoning" models where they output "chain of thought" before giving an answer[1]. This technique and name stem from the early observation that LLMs do better when explicitly told to "think step by step"[2]. Hope that helps clarify things for you for future constructive discussion.

[1] https://arxiv.org/html/2410.10630v1

[2] https://arxiv.org/pdf/2205.11916
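For anyone unsure what "hidden/internal output" means in practice, a minimal sketch, assuming the model wraps its chain of thought in the `<think>...</think>` tags mentioned upthread (the tag format varies by model, and `split_thinking` is a made-up helper name):

```python
import re
from typing import Tuple

# Assumed tag format; DOTALL lets the chain of thought span multiple lines.
THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(raw: str) -> Tuple[str, str]:
    # Split raw model output into (thinking, answer).
    # If there is no think block, thinking is the empty string.
    m = THINK.search(raw)
    if m is None:
        return "", raw.strip()
    thinking = m.group(1).strip()
    # Everything outside the first think block is treated as the answer.
    answer = (raw[:m.start()] + raw[m.end():]).strip()
    return thinking, answer
```

A chat UI would typically show only the second element and hide or collapse the first.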

We are aware of the term of art.

The point that was trying to be made, which I agree with, is that anthropomorphizing a statistical model isn’t actually helpful. It only serves to confuse laypersons into assuming these models are capable of a lot more than they really are.

That’s perfect if you’re a salesperson trying to dump your bad AI startup onto the public with an IPO, but unhelpful for pretty much any other reason, especially true understanding of what’s going on.

If that was their point, it would have been more constructive to actually make it.

To your point, it's only anthropomorphization if you make the anthrocentric assumption that "thinking" refers to something that only humans can do.[1]

And I don't think it confuses laypeople, when literally telling it to "think" achieves very similar results as in humans: it produces output that someone shown it out of context would easily identify as "thinking out loud", and it improves the accuracy of results like how... thinking does.

The best mental model of RLHF'd LLMs that I've seen is that they are statistical models "simulating"[2] how a human-like character would respond to a given natural-language input. To calculate the statistically "most likely" answer that an intelligent creature would give to a non-trivial question, with any sort of accuracy, you need emergent effects which look an awful lot like a (low fidelity) simulation of intelligence. This includes simulating "thought". (And the distinction between "simulating thinking" and "thinking" is a distinction without a difference given enough accuracy.)

I'm curious as to what "capabilities" you think the layperson is misled about, because if anything they tend to exceed layperson understanding IME. And I'm curious what mental model you have of LLMs that provides more "true understanding" of how a statistical model can generate answers that appear nowhere in its training.

[1] It also begs the question of whether there exists a clear and narrow definition of what "thinking" is that everyone can agree on. I suspect if you ask five philosophers you'll get six different answers, as the saying goes.

[2] https://www.astralcodexten.com/p/janus-simulators

> It also begs the question of whether there exists a clear and narrow definition of what "thinking" is that everyone can agree on. I suspect if you ask five philosophers you'll get six different answers, as the saying goes.

And yet we added a hand-wavy 7th to humanize a piece of technology.

I know this is the terminology, but I'd argue that the activations are the actual thinking. It's probably too late to change that, but I wish people would use "thinking" to refer to what Anthropic and DeepMind are studying with their mech interp work.
It's a misleading "term of art" which is more accurately described as a "term of marketing". Reasoning is precisely what LLMs don't do and it's precisely why they are unsuited to many tasks they are peddled for.
How are you defining "reasoning" such that you are confident that LLMs are definitely not doing it? What evidence do you have to that effect? (And are you certain that none of your reasoning applies to humans as well?)
They don't "think".

https://arxiv.org/abs/2503.09211

They don't "reason".

https://ml-site.cdn-apple.com/papers/the-illusion-of-thinkin...

They don’t even always output their internal state accurately.

https://arxiv.org/abs/2505.05410

These kinds of comments are the equivalent of going to dog owners' forums, analyzing word choices in every post and warning the dog owners about the dangers of anthropomorphizing their pets, an effort as accurate as it is boorish and ineffectual.
Dogs will not be quite as widely influencing decisions concerning other people.
What are we doing when we think?
Human neurons are not reducible to arithmetic artificial neurons in a statistical model. Do not conflate them.
Why not, actually?
Because we do not have a complete understanding of human neurons. How are we supposed to accurately model something we cannot directly observe?
We don't know yet. But we do know it's certainly not statistical token prediction.

(People can do statistical token prediction too, but that's called "bullshitting", not "thinking". Thinking is a much wider class of activity.)

Do we know that with certainty? Do we actually?

Because my understanding is that how "thinking" works is actually still a total mystery. How is it we know for certain that the analog, electric-potential-based computing done by neurons is not based on statistical prediction?

Do we have actual evidence of that, or are you just doing "statistical token prediction" yourself?

You’re reversing the burden of proof in a similar manner as religious people often do. Absence of evidence is not evidence of absence, and so on.