| HN Mirror


	by gavi 414 days ago
	too much thinking https://gist.github.com/gavi/b9985f730f5deefe49b6a28e5569d46...

1 comments

fzzzy 414 days ago

My impression from running the first R1 release locally was that it also does too much thinking.

reissbaker 413 days ago

Magistral Small seems wayyy too heavy-handed with its RL to me:

\boxed{Hey! How can I help you today?}

They clearly rewarded the \boxed{...} formatting during their RL training, since it makes it easier to naively extract answers to math problems and thus verify them. But Magistral uses it for pretty much everything, even when it's inappropriate (in my own testing as well).

It also forgets to <think> unless you use their special system prompt reminding it to.

Honestly a little disappointing. It obviously benchmarks well, but it seems a little overcooked on non-benchmark usage.
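For anyone wondering why \boxed{...} would get rewarded in the first place, here is a minimal sketch of the kind of naive answer extraction described above. The helper names (`extract_boxed`, `is_correct`) are hypothetical and this is not Mistral's actual reward code; it only illustrates why a boxed final answer is convenient to verify automatically.

```python
import re
from typing import Optional

# Matches \boxed{...} with no nested braces inside the payload.
# Deliberately naive: something like \boxed{\frac{1}{2}} will NOT match.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_boxed(response: str) -> Optional[str]:
    """Return the payload of the last \\boxed{...} in the response, or None."""
    matches = BOXED.findall(response)
    return matches[-1].strip() if matches else None


def is_correct(response: str, expected: str) -> bool:
    """Hypothetical verifier: exact string match on the extracted answer."""
    answer = extract_boxed(response)
    return answer is not None and answer == expected.strip()
```

A checker like this cannot tell a math answer from a greeting, so a model trained against it has every incentive to box everything.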

cluckindan 414 days ago

It does not do any thinking. It is a statistical model, just like the rest of them.

LordDragonfang 414 days ago

"Thinking" is a term of art referring to the hidden/internal output of "reasoning" models where they output "chain of thought" before giving an answer[1]. This technique and name stem from the early observation that LLMs do better when explicitly told to "think step by step"[2]. Hope that helps clarify things for you for future constructive discussion.

[1] https://arxiv.org/html/2410.10630v1

[2] https://arxiv.org/pdf/2205.11916
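As a rough illustration of what "hidden output before the answer" means in practice, here is a minimal sketch that separates the reasoning from the final answer. It assumes the model wraps its chain of thought in <think>...</think> tags, as mentioned elsewhere in this thread; the function name `split_reasoning` is hypothetical, and real chat frontends may handle this differently.

```python
import re
from typing import Tuple

# Assumed delimiter format: <think> ... </think>, possibly spanning lines.
THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> Tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    The reasoning is everything inside <think> blocks; the answer is
    whatever remains once those blocks are removed.
    """
    reasoning = "\n".join(block.strip() for block in THINK.findall(raw))
    answer = THINK.sub("", raw).strip()
    return reasoning, answer
```

A UI would typically show only the second element and hide or collapse the first.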

bobsomers 414 days ago

We are aware of the term of art.

The point that was trying to be made, which I agree with, is that anthropomorphizing a statistical model isn’t actually helpful. It only serves to confuse laypersons into assuming these models are capable of a lot more than they really are.

That’s perfect if you’re a salesperson trying to dump your bad AI startup onto the public with an IPO, but unhelpful for pretty much any other reason, especially true understanding of what’s going on.

LordDragonfang 414 days ago

If that was their point, it would have been more constructive to actually make it.

To your point, it's only anthropomorphization if you make the anthrocentric assumption that "thinking" refers to something that only humans can do.[1]

And I don't think it confuses laypeople, when literally telling it to "think" achieves very similar results as in humans - it produces output that someone shown it out of context would easily identify as "thinking out loud", and it improves the accuracy of results, like how... thinking does.

The best mental model of RLHF'd LLMs that I've seen is that they are statistical models "simulating"[2] how a human-like character would respond to a given natural-language input. To calculate the statistically "most likely" answer that an intelligent creature would give to a non-trivial question, with any sort of accuracy, you need emergent effects which look an awful lot like a (low fidelity) simulation of intelligence. This includes simulating "thought". (And the distinction between "simulating thinking" and "thinking" is a distinction without a difference given enough accuracy.)

I'm curious as to what "capabilities" you think the layperson is misled about, because if anything they tend to exceed layperson understanding IME. And I'm curious what mental model you have of LLMs that provides more "true understanding" of how a statistical model can generate answers that appear nowhere in its training.

[1] It also begs the question of whether there exists a clear and narrow definition of what "thinking" is that everyone can agree on. I suspect if you ask five philosophers you'll get six different answers, as the saying goes.

[2] https://www.astralcodexten.com/p/janus-simulators

zer00eyz 414 days ago

> It also begs the question of whether there exists a clear and narrow definition of what "thinking" is that everyone can agree on. I suspect if you ask five philosophers you'll get six different answers, as the saying goes.

And yet we added a hand-wavy 7th to humanize a piece of technology.

MindTheAbstract 413 days ago

I know this is the terminology, but I'd argue that the activations are the actual thinking. It's probably too late to change that, but I wish people would use "thinking" to refer to what Anthropic and DeepMind are studying with their mech interp work.

andrepd 413 days ago

It's a misleading "term of art" which is more accurately described as a "term of marketing". Reasoning is precisely what LLMs don't do and it's precisely why they are unsuited to many tasks they are peddled for.

LordDragonfang 413 days ago

How are you defining "reasoning" such that you are confident that LLMs are definitely not doing it? What evidence do you have to that effect? (And are you certain that none of your reasoning applies to humans as well?)

cluckindan 413 days ago

They don’t "think".

https://arxiv.org/abs/2503.09211

They don’t "reason".

https://ml-site.cdn-apple.com/papers/the-illusion-of-thinkin...

They don’t even always output their internal state accurately.

https://arxiv.org/abs/2505.05410

boredhedgehog 413 days ago

These kinds of comments are the equivalent of going to dog owners' forums, analyzing word choices in every post and warning the dog owners about the dangers of anthropomorphizing their pets, an effort as accurate as it is boorish and ineffectual.

cluckindan 413 days ago

Dogs will not be quite as widely influencing decisions concerning other people.

robmccoll 414 days ago

What are we doing when we think?

cluckindan 413 days ago

Human neurons are not reducible to arithmetic artificial neurons in a statistical model. Do not conflate them.

jeffhuys 413 days ago

Why not, actually?

cluckindan 413 days ago

Because we do not have a complete understanding of human neurons. How are we supposed to accurately model something we cannot directly observe?

otabdeveloper4 413 days ago

We don't know yet. But we do know it's certainly not statistical token prediction.

(People can do statistical token prediction too, but that's called "bullshitting", not "thinking". Thinking is a much wider class of activity.)

LordDragonfang 413 days ago

Do we know that with certainty? Do we actually?

Because my understanding is that how "thinking" works is actually still a total mystery. How is it we know for certain that the analog, electric-potential-based computing done by neurons is not based on statistical prediction?

Do we have actual evidence of that, or are you just doing "statistical token prediction" yourself?

cluckindan 412 days ago

You’re reversing the burden of proof in a similar manner as religious people often do. Absence of evidence is not evidence of absence, and so on.
