| HN Mirror

	by wppick 237 days ago
	> It has come as a shock to some AI researchers that a large neural net that predicts next words seems to produce a system with general intelligence

	When I write prompts, I've stopped thinking of LLMs as just predicting the next word, and instead think of them as a logical model built up by combining the logic of all the text they've seen. I think of the LLM as knowing that cats don't lay eggs: when I ask it to finish the sentence "cats lay ...", it won't generate the word eggs even though eggs probably comes after lay frequently.

2 comments

godelski 237 days ago

  > It won't generate the word eggs even though eggs probably comes after lay frequently

Even a simple N-gram model won't predict "eggs". You're misunderstanding by oversimplifying.

Next-token prediction is still context based. It does not depend on only the previous token, but on the previous (N-1) tokens. With "cats" in the context, you should get words like "down" instead of "eggs", even with a 3-gram (trigram) model.
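
A minimal sketch of that point, assuming a tiny made-up corpus (the sentences, and the helper names `train_ngram` and `predict`, are invented for illustration and come from no real dataset or library). It counts continuations for a bigram and a trigram model and compares what each predicts after "lay":

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
corpus = (
    "hens lay eggs . ducks lay eggs . turtles lay eggs . "
    "cats lay down . cats lay down quietly ."
).split()

def train_ngram(tokens, n):
    """Count which token follows each (n-1)-token context."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i : i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts

def predict(counts, context):
    """Return the most frequent continuation of the context, or None if unseen."""
    followers = counts.get(tuple(context))
    return followers.most_common(1)[0][0] if followers else None

bigram = train_ngram(corpus, 2)
trigram = train_ngram(corpus, 3)

# The bigram model sees only "lay"; the trigram model also sees "cats".
bigram_guess = predict(bigram, ["lay"])
trigram_guess = predict(trigram, ["cats", "lay"])
print(bigram_guess, trigram_guess)
```

In this toy corpus "eggs" follows "lay" more often overall, so the bigram model picks it, while the trigram model conditions on "cats" as well and picks "down". A context the trigram model has never seen returns `None` here, since the sketch does no smoothing or backoff.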

devmor 237 days ago

No, your original understanding was the more correct one. There is absolutely zero logic to be found inside an LLM, other than coincidentally.

What you are seeing is a semi-randomized prediction engine. It does not "know" things, it only shows you an approximation of what a completion of its system prompt and your prompt combined would look like, when extrapolated from its training corpus.

What you've mistaken for a "logical model" is simply a large amount of repeated information. To show the difference between this and logic, you need only look at something like the "seahorse emoji" case.

Philpax 237 days ago

No, their revised understanding is more accurate. The model has internal representations of concepts; the seahorse emoji fails because it uses those representations and stumbles: https://vgel.me/posts/seahorse/

aeternum 237 days ago

Word2vec can/could also do the seahorse thing. It at least seems like there's more to what humans consider a concept than a direction in a vector space model (but maybe not).

https://www.analyticsvidhya.com/blog/2021/07/word2vec-for-wo...
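
One way to read the "direction in a vector space" remark, as a toy sketch: the 2-D vectors below are hand-picked for illustration, not trained word2vec embeddings, and the axis meanings and vocabulary are assumptions. Composing a "seahorse" direction from "fish" and "horse" and then looking up the nearest vocabulary entry always returns some existing word, even though no entry for the composed concept exists:

```python
import math

# Hand-picked 2-D vectors, NOT trained embeddings.
# Assumed axes for illustration: (aquatic, horse-like).
vocab = {
    "fish": (1.0, 0.0),
    "horse": (0.0, 1.0),
    "dragon": (0.5, 0.6),
}

def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def nearest(query):
    """Return the vocabulary word whose vector is most similar to the query."""
    return max(vocab, key=lambda word: cosine(vocab[word], query))

# Compose a "seahorse" direction; the vocabulary has no entry for it.
seahorse = tuple(f + h for f, h in zip(vocab["fish"], vocab["horse"]))
closest = nearest(seahorse)
print(closest)
```

With these made-up vectors the lookup lands on "dragon", the entry that happens to sit between the two directions. The sketch says nothing about whether such a direction amounts to a concept; it only shows the nearest-neighbor mechanics.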

devmor 235 days ago

No, this is not a demonstration of logic or knowledge. It is a demonstration of a relational database.

A Markov chain presents the same representation in a lower-dimensional vector space.

nearbuy 237 days ago

If anything, the seahorse emoji case is exactly the type of thing you wouldn't expect to happen if LLMs just repeated information from their training corpus. It starts producing a weird dialogue that's completely unlike its training corpus, while trying to produce an emoji it's never seen during training. Why would it try to write an emoji that's not in its training data? This is totally different than its normal response when asked to produce a non-existent emoji. Normally, it just tells you the emoji doesn't exist.

So what is it repeating?

It's not enough to just point to an instance of LLMs producing weird or dumb output. You need to show how it fits with your theory that they are "just repeating information". This is like pointing out one of the millions of times a person has said something weird, dumb, or nonsensical and claiming it proves humans can't think and can only repeat information.

devmor 235 days ago

> It starts producing a weird dialogue that's completely unlike its training corpus

But it's not doing that. It's just replacing a relation in vector space with one that we would think is distant.

Of course you would view an LLM's behavior as mystifying and indicative of something deeper when you do not know what it is doing. You should seek to understand something before assigning mysterious capabilities to it.

nearbuy 235 days ago

You're not addressing the objection. What is it about your model of how you think LLMs work (that it's just "repeated information") that predicts they'd go haywire when asked about a seahorse emoji (and only the seahorse emoji)? Why does your model explain this better than the standard academic view of deep neural nets?

You just pointed out an example of LLMs screwing up and then skipped right to "therefore they're just repeating information" without showing this is what your explanation predicts.

devmor 234 days ago

If you want to have a conversation with me, please stop creating fake quotes and assigning them to me, and please stop lying.

nearbuy 234 days ago

"repeated information" was copied verbatim from your comment. Your full sentence was:

> What you've mistaken for a "logical model" is simply a large amount of repeated information.

krackers 237 days ago

> There is absolutely zero logic to be found inside an LLM

Surely trained neural networks could never develop circuits that implement actual logic via computational graphs...

https://transformer-circuits.pub/2025/attribution-graphs/met...

godelski 237 days ago

You're both using two different definitions of the word "logic". Both are correct usages, but have different contexts.

diamond559 237 days ago

These are brute-force engineering solutions that make it appear the computer is thinking, when we have no idea how we think ourselves. This will never generate true intelligence. It executes code, then it stops; it is a tool, nothing more.

typon 237 days ago

I often wonder whether neuroscience is harder on LLMs or on humans.
