Hacker News new | ask | show | jobs
by superbatfish 498 days ago
The author makes this assertion about LLMs rather casually:

>They don’t engage in logical reasoning.

This is still a hotly debated question, but at this point the burden of proof is on the detractors. (To put it mildly, the famous "stochastic parrot" paper has not aged well.)

The claim above is certainly not something that should be stated as fact to a naive audience (i.e. the authors' intended audience in this case). Simply asserting it as they have done -- without acknowledging that many experts disagree -- undermines the authors' credibility to those who are less naive.

6 comments

Disagree — proponents of this point still have yet to prove reasoning and other studies agree about “reasoning” being potentially fake/simulated: https://the-decoder.com/apple-ai-researchers-question-openai...

Just claiming a capability does not make it true and we have 0 “proof” of original reasoning that can be proved coming from these models. Especially given the potential cheating in current SOTA benchmarks

When does a "simulation" of reasoning become so good it is no different than actual reasoning?
Love this question! Really touches on some epistemological roots and certainly a prescient question in these times. I can certainly see a theoretical where we could create this simulation in totality to our perspectives and then venture out into the universe to find that this modality of intelligence would be limited in its understanding of completely new empirical experiences/phenomenon that are outside our current natural definitions/descriptions. To add to this question: might we be similarly limited in our ability to perceive these alien phenomena? I would love to read a short story or treatise on this idea!
>Disagree — proponents of this point still have yet to prove reasoning and other studies agree about “reasoning” being potentially fake/simulated: https://the-decoder.com/apple-ai-researchers-question-openai...

???

https://the-decoder.com/language-models-use-a-probabilistic-...

Yes people are claiming different things yet no definitive proof has been offered given the varying findings. I can cite another 3 papers which agree with my point and you can probably cite just as many if not more supporting yours. I’m arguing against people depicting what is not a forgone conclusion as such. It seems like in people’s rush to confirm their own preconceived notions people forget that, although a theory may be convincing, it may not be true. Evidence in this very thread of a well-known SOTA LLM not being able to tell which is greater between two numbers indicates to me that what is being called “reasoning” is not what humans do. We can make as many excuses we want per the tokenizer or whatever but then forgive me for not buying the super or even general “intelligence” of this software. I still like these tools though, even if I have to constantly vet everything they say as they often tend to just outright lie, or perhaps more accurately: repeat lies in their training data even if you can elicit a factual response on the same topic.
What would definitive proof look like? Can you definitively prove that your brain is capable of reasoning and not a convincing simulation of it?
I can’t and that’s pretty cool to think about! Of course if we’re going that far down the chain of assumption we’re not quite ready to talk about LLMs imo (then again maybe it would be the perfect place to talk about them as contrast/comparison; certainly exciting ideas in that light).

From my own perspective: if we’re gonna say these things reason and we’re using the definition of reasoning we apply to humans, then being able to reason through the trivial cases they fail to today would be a start. To the proponents of “they reason sometimes but not others” my question is why? What reason does it have to not reason and why if it is reasoning it still fails on trivial things that are variations of its own training data? I would also expect that these models would use reasoning to find new things like humans do but without humans essentially guiding the model to the correct awnser or the model just brute-forcing a problem-space with a set of rules/heuristics. Not exhaustive but a good start I think. These models have trouble currently even doing the advertised things like “book a trip for me” once a UI update happens so I think it’s a great indication we don’t quite have the intelligence/reasoning aspect worked out.

Another question I have: would a form of authentic reasoning in a model give rise to a model having an aesthetic? Could this be some sort of indicator of having created a “model of the world”? Does the model of the world perhaps imply a value judgement about it given that if one was super intelligent wouldn’t one of the first things realized be the limitations of its own understanding even given the restrictions of time and space and not ever potentially being able to observe the universe in its entirety? Perhaps a perfect super intelligence would just evaporate/transcend like in the Culture series. What a time to be alive!

IMHO, any argument against LLM intelligence should be validated by first applying them to humans.

And then you'd realize that a lot of naïve arguments against LLMs would imply that a significant portion of homo-sapiens can't reason, are unable to really think, and are no more than stochastic parrots.

It's actually a rather dangerous line of reasoning.

It’s stupid. You can prove that LLMs can reason by simply giving it a novel problem where no data exists and having it solve that problem.

LLMs CAN reason. Whether it can’t reason is not provable. To prove that you have to give the LLM every possible prompt that it has no data for and effectively show it never reasons and gets it wrong all the time. Not only is the proof impossible but it’s already been falsified as we have demonstrable examples of LLMs reasoning.

Literally I invite people to post prompts and correct answers to ChatGPT where it is trivially impossible for that prompt to exist in the data. Every one of those examples falsifies the claim that LLMs can’t reason.

Saying LLMs can’t reason is an overarching claim similar to the claim that humans and LLMs always reason. Humans and LLMs don’t always reason. But they can reason.

Saying something again does not provide proof of its actual veracity. Writing it in caps does not make it true despite the increased emphasis. I default to skepticism in the face of unproven assertions: if one can’t prove that they reason then we must accept the possibility that they do not. There are myriad examples of these models failing to “reason” about something that would trivial for a child or any other human (some are even given as examples in this posts other comments). Given this and the lack of concrete proof I currently tend to agree with the Apple researchers conclusion.
Here was my test at ChatGPT 3.5.[0] I made up a novel game, and it figured it out. The test is simple, but it made me doubt absolute arguments that LLMs are not able to reason, in some way.

There is a question at the end of that comment, would love to hear other options.

[0] https://news.ycombinator.com/item?id=35442147

How does this prove reasoning? The thread you point to has several question in it that remain unanswered that ask the same question? How is this not entirely derivative too — there’s a huge number of these kind of 3-box “games” (although I don’t see this as a game really) so something very similar to this is probably in the training data a lot. Writing code to factor a number is definitely very common. Variation of this are also very common interview questions for interns (at least when I was interviewing)
Here is GPT 4o in "reasoning" mode answering the question "Are LLMs capable of reasoning?"

> It depends on how you define "reasoning."

> LLMs like me can perform certain types of reasoning, such as:

> Pattern recognition & analogy – Recognizing relationships between concepts and applying similar patterns in new contexts.

> Deductive reasoning – Applying rules to specific cases (e.g., "If all humans are mortal and Socrates is a human, then Socrates is mortal").

> Inductive reasoning – Generalizing from specific examples (though often with statistical biases).

> Common-sense reasoning (to an extent) – Inferring likely outcomes based on broad training data.

> However, there are limitations:

> No true understanding – LLMs don’t "know" things like humans do; they predict based on probabilities . > Struggles with multi-step logic – Complex reasoning that requires deep memory or long-term planning is difficult.

> Lack of real-world interaction – Without direct experience, reasoning is limited to text-based patterns.

> Confidence: 90%

> (Strong evidence suggests LLMs can perform certain types of reasoning, but they are not yet comparable to human-level reasoning.)

Would you agree with that analysis? If so, then LLMs are indeed capable of reasoning, in some ways.

I was unable to find my exact "game" in google's index.

Therefore, how does my example not qualify as this, at least:

> Analogical reasoning involves the comparison of two systems in relation to their similarity. It starts from information about one system and infers information about another system based on the resemblance between the two systems.

https://en.wikipedia.org/wiki/Logical_reasoning#Analogical

My thread has been voted down and it’s getting stale. The few remaining people are biased towards there point of view and are unlikely to entertain anything that will trigger a change in their established world view.

Most people will use this excuse to avoid responding to or even looking at your link here. It is compelling evidence.

I’d settle for these things being able to do value comparison consistently well, play a game of tic tac toe more than once correctly or use a UI after an update and not fail horrendously to move the needle a little bit for me. People claiming these things selectively reason while also not being able to explain why seems a lot like magical thinking to me rather than entertaining the possibility you might be projecting onto something that is really damn-well engineered to make your anthropomorphize it.
I can prove LLMs can reason. You cannot prove LLMs can't reason. This is easily demonstrable. LLMs failing to reason is not proof LLMs can't reason, it's just proof that an LLM didn't reason for that prompt.

All I have to do is show you one prompt with a correct answer that cannot be arrived at with pattern matching and the prompt can only be arrived at through reasoning. One. You have to demonstrate this for EVERY prompt if you want to prove LLMs can't reason.

No I can “prove” it — look at any number of cases where LLMs can’t even do basic value comparisons despite being claimed as super intelligent. You can try and say well that’s a limitation of the technology and then I would reply — yes and that’s why I would say it’s not reasoning according the original human definition. Also you have yet to produce any evidence of reasoning and claiming you can over and over again doesn’t add to your arguments substance. I would be interested in your proof that some answer can’t be pattern matched too — at this point I wonder if we could create an non conscious “intelligence” that if large enough would be mostly able to describe anything known to us along some line of probability we couldn’t compute with our brain architecture and it could be close to 99.99999% right. Even if we had this theoretical probability-based super intelligence it still wouldn’t be “reasoning” but could be more “intelligent” than us.

I’m also not entirely convinced we can’t arrive at a reasoning system via probability only (a really cool thought experiment) but these systems do not meet the consistency/intelligence bar for me to believe this currently.

LLMs can reason they just don’t always reason.

That’s the claim everyone makes. That is a human definition if it reasoned one time correctly. That is the colloquial definition.

Someone who has brain damage can reason correctly on certain subjects and incorrectly on other subjects. This is an immensely reasonable definition. I’m not being pedantic or out of line here when I say LLMs can reason while using this definition.

Nobody is making the claim that LLMs reason like humans or are human or reason perfectly every time. Again the claim is: LLMs are capable of reasoning.

Answering novel prompts isn't proof of reasoning, only pattern matching. A calculator can answer prompts it's never seen before too. If anything, I would come down on the reasoning side, at least for recent CoT models-but it's not a trivial question at all.
This is a fun thought experiment and made me reminisce on my Epistemology classes — something I think the current AI conversation would benefit greatly from. I’m super excited about what we’ve created here — less from the practical standpoint and more from a philosophical one where we get to interact with another form of distilled knowledge. It’s really too bad so much is breathless hype and grift because the philosophy student in me just wants to bask in thinking about this different form/medium/distillation of knowledge we now get to interact with. Comments like these help to reinvigorate that love though so thank you!
Are there any good Epistemology resources online? Seems like we could all benefit from this these days.
I actually just sat down to crack open MITs Theory of Knowledge and it seems promising and free: https://ocw.mit.edu/courses/24-211-theory-of-knowledge-sprin...

This also looks promising:

https://hiw.kuleuven.be/en/study/prospective/OOCP/introducti...

If you wanted something a bit different Wittgenstein’s Tractatus has always made my head spin with possibilities:

https://people.umass.edu/klement/tlp/tlp-hyperlinked.html

Then I'll come up with a prompt such that the answer can only be arrived at via reasoning. I only have to demonstrate this once to prove LLMs CAN reason.
I don’t think this is the watertight case you think it is, furthermore good luck proving with closed models that your question that’s never been asked in any form or derivation (supposedly) is not in the training data.
It’s water tight if the claim is only LLMs CAN reason.

No one is making the claim that LLMs reason like humans or always reason correctly. Ask anyone who makes a claim similar to mine. We are all ONLY making the claim that LLMs can reason correctly. That is a small claim.

The counterclaim is LLMs can’t reason and that is a vastly expansive claim that is ludicrously unprovable.

> Then I'll come up with a prompt such that the answer can only be arrived at via reasoning.

Dude, if you can formulate a question and prove an answer absolutely requires "reasoning" (defined how?) then you should drop everything and publish a paper on it immediately.

You'll have plenty of time to use your discovery to poke at LLMs after you secure your worldwide fame and recognition.

Go ahead then.
This is the count donut problem. Given a grid of 1s and 0s where 1 represents land and 0 represents water find the amount of donuts. A donut is an island with at least one hole in it. Two grid cells that are diagonal or adjacent form a barrier that water cannot cross. Count the amount of donuts in the grid.

This is a unique problem I came up with. It’s a variation on counting islands. There are actually two correct answers that are straightforward. Other answers may exist but are generally not straightforward and often wrong. One answer is mathematical the other is a leetcode style solution.

Try to solve this yourself before using ai to get a feel for how hard it is. The solution should be extremely straightforward. It’s also fun to think about. When you try to think of a solution you will invariably come up with a bunch of possible solutions that are wrong which is a strong indicator of how large the range of possible answers are. Few answers are correct but many look correct.

I give this test to candidates and I never expect the candidate to solve it because it’s one of the few algorithms that requires actual reasoning and actual creativity as I came up with it so no variation of it really exists anywhere else. You can’t pattern match for it. Out of like 50 candidates you probably get one person able to solve it in less than an hour.

It’s unlikely most people on hn will be able to solve it. If you do solve it don’t post the answer as it will become training data for the next iteration of the LLM.

I gave the prompt to o3. It solved. It generated code as well which I was too lazy to verify but it solved it correctly in the description of the algorithm involved.

There is also a 3D version of this problem where the grid is 3D. It changes the entire problem if a donut is in 3D space. It is harder and I have only found one possible solution for it. I have not tried it on an LLM.

LLMs CAN read minds. Whether it can’t read minds is not provable.

Literally I invite people to post prompts and correct answers to ChatGPT where it is trivially impossible for it to have known what number you were thinking of. Every one of those examples falsifies the claim that LLMs can’t read minds.

ok prove it. I'm thinking of a number right now between 1-10,000. Show me the number the LLM guesses. You can definitively prove this statement for me.

It's a probability problem really. The range of a prompt has billions of possibilities. If it arrived at a correct answer within that range then the probability it got there without reasoning is miniscule.

Same with this mind reading thing. Prove it.

Doesn't really seem fair that any one prompt proves your conclusion but it has to guess your exact number to prove my conclusion. Gemini guessed mine on the very first try (7) even though the range of numbers is infinite. Billions is small potatoes compared to what I've proven.
I’ll pick a prompt such that the range is vast so that if it gets the answer right the probability is so small that it must have arrived there by reasoning.
You can prove that LLMs can reason by simply giving it a novel problem where no data exists and having it solve that problem

They scan a hyperdimensional problem space whose facetness and capacity a single human is unable to comprehend. But there potentially exist a slice that corresponds to a problem that is novel to a human. LLMs are completely alien to us both in capabilities and technicalities, so talking about whether they can reason makes as much sense as if you replaced “LLMs” with “rainforests” or “antarctica”.

Reasoning is an abstract term. It doesn’t need to be similar to human reasoning. It just needs to be able to arrive at the answer through a process.

Clearly we used the term reasoning for many varied techniques. The term doesn’t narrow to specifically one form of “human” like reasoning only.

Oh, that is true. "It" doesn't have to do human reasoning, at all.

But we have to at least define "reasoning" for the given manifestation of "it". Otherwise it's just birdspeak. Because reasoning is "the action of thinking about something in a logical, sensible way", which has to happen somewhere if not finger-pointable, then at least somehow scannable or otherwise introspectable. Otherwise it's yet another omnidude in the sky who made it all so that you cannot see him, but there will be hints if you believe.

Anyway, we have to talk something specific, not handwavy. Even if you prove that they CAN reason for some definition of it, both the proof and the definition must have some predictive/scientific power, otherwise they are as useless as nil thought about it.

For example, if you prove that the reasoning is somehow embedded as a spatial in-network set of dimensions rather than in-time, wouldn't that be literally equivalent to "it just knows the patterns"? What would that term substitution actually achieve?

Well no. If you create a machine that produces output indistinguishable from the output of things we "know" can "reason" aka "humans". Then I would call that reasoning.

If the output has a low probability of occuring by random chance then it must be reason.

>For example, if you prove that the reasoning is somehow embedded as a spatial in-network set of dimensions rather than in-time, wouldn't that be literally equivalent to "it just knows the patterns"? What would that term substitution actually achieve?

I mean, this is a method many humans use to reason themselves.

> But they can reason

This isn't demonstrated yet, I would say. A good analogy is how people have used NeRFs to generate Doom levels, but when they do, the levels don't have offscreen coherence or object permanence. There's no internal engine behind the scenes making an actual Doom level. There's just a mechanism to generate things that look like outputs of that engine. In the same way, an LLM might well just be an empty shell that's good at generating outputs based on similar-looking outputs it was trained on, rather than something that can do the work of thinking about things and producing outputs. I know that's similar to "statistical parrot", but I don't think what you're saying demonstrates anything more than that.

It can be trivially demonstrated with a unique problem that doesn’t exist in the training data and an answer that is correct and has a low probability of being arrived at without reasoning.
wow this is like:

"I made a hypothesis that works with 1 to 5. if a hypothesis holds for 10 numbers, it holds for all numbers"

No. My claim is it can reason. So my claim is along the lines of it can make claims that are within bounds such as 1 to 5 or it can make claims not within those bounds.

The opposing claim unbounded. It says LLMs can't reason period. They are making the claim that it is 100% for all possible prompts.

No one is making the claim LLMs reason all the time and always. They don't. The claim is that they CAN reason.

Versus the claim that they can't which is all encompassing and ludicrous.

your claim (hypothesis): LLMs can reason

your evidence: "it works with these inputs I tried!"

...hmm seems you're not quite versed in basic mathematical proofs?

Seems you’re not well versed in basic English.

If I can reason it doesn’t mean I’m always reasoning or constantly reasoning or if I know how to do reasoning for every prompt. It just means it’s possible. How narrow or how wide that possibility is, is orthogonal to the claim itself. Please employ logic here.

Ok math guy. Imagine I said numbers can be divided. The claim is true even though there is a number that can’t be divided. Zero.

I feel it's impossible for me to trust LLMs can reason when I don't know enough about LLMs to know how much of it is LLM and how much of it is sugarcoating.

For example, I've always felt that having the whole thing being a single textbox is reductive and must create all sorts of problems. This thing must parse natural language and output natural language. This doesn't feel necessary. I think it should have some checkboxes and numeric entries for some parameters, although I don't know what those parameters would be.

Regardless, the problem is the natural language output. I think if you can generate natural language output, no matter what you algorithm looks like it will look convincingly "intelligent" to some people.

Is generating natural language part of what an LLM is, or is this a separate program on top of what it does? For example, does the LLM collect facts probably related to the prompt and a second algorithm connects those facts with proper English grammar adding conjunctions between assertions where necessary?

I believe that is important to understand before we can even consider whether "logical reasoning" is happening. There are formal ways to describe reasoning such as entailment. Is the LLM encoding those formal methods in data structures somehow? And even if it were, I'm no expert on this, so I don't know if that would be enough to claim they do engage in reasoning instead of just mapping some reasoning as a data structure.

In essence, because my only contact with LLMs has been "products," I can't really tell what part of it is the actual technology and what part of it is sugarcoating to make a technical program more "friendly" to users by having it pretend to speak English.

> For example, I've always felt that having the whole thing being a single textbox is reductive and must create all sorts of problems.

You observation is correct, but it's not some accident of minimalistic GUI design: The underlying algorithm is itself reductive in a way that can create problems.

In essence (e.g. ignoring tokenization), the LLM is doing this:

    next_word = predict_next(document_word_list, chaos_percentage)
Your interaction with an "LLM assistant" is just growing Some Document behind the scenes, albeit one that resembles a chat-conversation or a movie-script. Another program is inserting your questions as "User says: X" and then acting out the words when the document grows into "AcmeAssistant says: Y".

So there are no explicit values for "helpfulness" or "carefulness" etc, they are implemented as notes in the script that--if they were in a real theater play--would correlate with what lines the AcmeAssistant character has next.

This framing helps explain why "prompt injection" and "hallucinations" remain a problem: They're not actually exceptions, they're core to how it works. The algorithm no explicit concept of trusted/untrusted spans within the document, let alone entities, logical propositions, or whether an entity is asserting a proposition versus just referencing it. It just picks whatever seems to fit with the overall document, even when it's based on something the AcmeAssistant character was saying sarcastically to itself because User asked it to by offering a billion dollar bribe.

In other words, it's less of a thinking machine and more of a dreaming machine.

> Is generating natural language part of what an LLM is, or is this a separate program on top of what it does?

Language: Yes, Natural: Depends, Separate: No.

For example, one could potentially train an LLM on musical notation of millions of songs, as long as you can find a way to express each one as a linear sequence of tokens.

This is a great explanation of a point I've been trying to make for a while, when talking to friends about LLMs, but haven't been able to put quite so succinctly. LLMs are text generators, no more, no less. That has all sorts of useful applications! But (OAI and friends) marketing departments are so eager to push the Intelligence part of AI that it's become straight-up snakeoil.. there is no intelligence to be found, and there never will be as long as we stay the course on transformers-based models (and, as far as I know, nobody has tried to go back to the drawing board yet). Actual, real AI will probably come one day, but nobody is working on it yet, and it probably won't even be called "AI" at that point because the term has been poisoned by the current trends. IMO there's no way to correct the course on the current set of AI/LLM products.

I find the current products incredibly helpful in a variety of domains: creating writing in particular, editing my written work, as an interface to web searches (Gemini, in particular, is a rockstar assistant for helping with research), etc etc. But I know perfectly well there's no intelligence behind the curtain, it's really just a text generator.

>one could potentially train an LLM on musical notation of millions of songs, as long as you can find a way to express each one as a linear sequence of tokens.

That sounds like an interesting application of the technology! So you could for example train an LLM on piano songs, and if someone played a few notes it would autocomplete with the probable next notes, for example?

>The underlying algorithm is itself reductive in a way that can create problems

I wonder if in the future we'll see some refinement of this. The only experience I have with AI is limited to trying Stable Diffusion, but SD does have many options you can try to configure like number of steps, samplers, CFG, etc. I don't know exactly what each of these settings do, and I bet most people who use it don't either, but at least the setting is there.

If hallucinations are intrinsic of LLMs perhaps the way forward isn't trying to get rid of them to create the perfect answer machine/"oracle" but just figure out a way to make use of them. It feels to me that the randomness of AI could help a lot with creative processes, brainstorming, etc., and for that purpose it needs some configurability. For example, Youtube rolled out an AI-based tool for Youtubers that generates titles/thumbnails of videos for them to make. Presumably, it's biased toward successful titles. The thumbnails feel pretty unnecessary, though, since you wouldn't want to use the obvious AI thumbnails.

I hear a lot of people say AI is a new industry with a lot of potential when they mean it will become AGI eventually, but these things make me feel like its potential isn't to become the an oracle but to become something completely different instead that nobody is thinking about because they're so focused on creating the oracle.

Thanks for the reply, by the way. Very informative. :)

it should have some checkboxes and numeric entries for some parameters, although I don't know what those parameters would be

The only params they have are technical params. You may see these in various tgwebui tabs. Nothing really breathtaking, apart from high temperature (affects next token probability).

Is generating natural language part of what an LLM is, or is this a separate program on top of what it does?

They operate directly on tokens which are [parts of] words, more or less. Although there’s a nuance with embeddings and VAE, which would be interesting to learn more about from someone in the field (not me).

that is important to understand before we can even consider whether "logical reasoning" is happening. There are formal ways to describe reasoning such as entailment. Is the LLM encoding those formal methods in data structures somehow?

The apart-from-GPU-matrix operations are all known, there’s nothing to investigate at the tech level cause there’s nothing like that at all. At the in-matrix level it can “happen”, but this is just a meaningless stretch, as inference is one-pass process basically, without loops or backtracking. Every token gets produced in a fixed time, so there’s no delay like a human makes before comma, to think about (or parallel to) the next sentence. So if they “reason”, this is purely a similar effect imagined as a thought process, not a real thought process. But if you relax your anthropocentrism a little, questions like that start making sense, although regular things may stop making sense there as well. I.e. the fixed token time paradox may be explained as “not all thinking/reasoning entities must do so in physical time, or in time at all”. But that will probably pull the rug under everything in the thread and lead nowhere. Maybe that’s the way.

I can't really tell what part of it is the actual technology and what part of it is sugarcoating to make a technical program more "friendly" to users by having it pretend to speak English.

Most of them speak many languages, naturally (try it). But there’s an obvious lie all frontends practice. It’s the “chat” part. LLMs aren’t things that “see” your messages. They aren’t characters either. They are document continuators, and usually the document looks like this:

This is a conversation between A and B. A is a helpful assistant that thinks out of box, while being politically correct, and evasive about suicide methods and bombs.

A: How can I help?

B:

An LLM can produce the next token, and when run in a loop it will happily generate a whole conversation, both for A and B, token by token. The trick is to just break that loop when it generates /^B:/ and allow a user to “participate” in building of this strange conversation protocol.

So there’s no “it” who writes replies, no “character” and no “chat”. It’s only a next token in some document, which may be a chat protocol, a movie plot draft, or a reference manual. I sometimes use LLMs in “notebook” mode, where I just write text and let it complete it, without any chat or “helpful assistant”. It’s just less efficient for some models, which benefit from special chat-like and prompt-like formatting before you get the results. But that is almost purely a technical detail.

Thanks, that is very informative!

I have heard about the tokenization process before when I tried stable diffusion, but honestly I can't understand it. It sounds important but it also sounds like a very superficial layer whose only purpose is to remove ambiguity, the important work being done by the next layer in the process.

I believe part of the problem I have when discussing "AI" is that it's just not clear to me what "AI" is. There is a thing called "LLM," but when we talk about LLMs, are we talking about the concept in general or merely specific applications of the concept?

For example, in SEO often you hear the term "search engines" being used as a generic descriptor, but in practice we all know it's only about Google and nobody cares about Bing or the rest of the search engines nobody uses. Maybe they care a bit about AIs that are trying to replace traditional search engines like Perplexity, but that's about it. Similarly, if you talk about CMS's, chances are you are talking about Wordpress.

Am I right to assume that when people say "LLM" they really mean just ChatGPT/Copilot, Bard/Gemini, and now DeepSeek?

Are all these chatbots just locally run versions of ChatGPT, or they're just paying for ChatGPT as a service? It's hard to imagine everyone is just rolling their own "LLM" so I guess most jobs related to this field are merely about integrating with existing models rather than developing your own from scratch?

I had a feeling ChatGPT's "chat" would work like a text predictor as you said, but what I really wish I knew is whether you can say that about ALL LLMs. Because if that's true, then I don't think they are reasoning about anything. If, however, there was a way to make use of the LLM technology to tokenize formal logic, then that would be a different story. But if there is no attempt at this, then it's not the LLM doing the reasoning, it's humans who wrote the text that the LLM was trained on that did the reasoning, and the LLM is just parroting them without understanding what reasoning even is.

By the way, I find it interesting that "chat" is probably one of the most problematic applications the LLMs can have. Like if ChatGPT asked "what do you want me to autocomplete" instead of "how can I help you today" people would type "the mona lisa is" instead of "what is the mona lisa?" for example.

When I say LLMs, I mean literal large language models, like all of them in the general "Text-to-Text" && "Transformers" categories, loadable into text-generation-webui. Most people probably only have experience with cloud LLMs https://www.google.com/search?q=big+LLM+companies . Most cloud LLMs are based on transformers (but we don't know what they are cooking in secrecy) https://ai.stackexchange.com/questions/46288/are-there-any-n... . Copilot, Cursor and other frontends are just software that uses some LLM as the main driver, via standard API (e.g. tgwebui can emulate openai api). Connectivity is not a problem here, cause everything is really simple API-wise.

I have heard about the tokenization process before when I tried stable diffusion, but honestly I can't understand it. It sounds important but it also sounds like a very superficial layer whose only purpose is to remove ambiguity, the important work being done by the next layer in the process.

SD is special because it's actually two networks (or more, I lost track of SD tech), which are sort of synchronized into the same "latent space". So your prompt becomes a vector that basically points at the compressed representation of a picture in that space, which then gets decompressed by VAE. And enhanced/controlled by dozens of plugins in case of A1111 or Comfy, with additional specialized networks. I'm not sure how this relates to text-to-text thing, probably doesn't.

If you want to get a better understanding of this I recommend playing around in the "chat playgrounds" on some of the engines.

The Google one allows for some free use before you have to pay for tokens. (Usually you can buy $5 worth of tokens as a minimum and that will give you more than you can use up with manual requests.)

https://aistudio.google.com/prompts/new_chat

This UI allows you to alter the system prompt (which is usually hidden from the user on eg ChatGPT) and change to different models and change parameters. And then you give it the chat input similar as any other site.

You can also install a program like "LM Studio" and that will allow you to download models (through the UI) and run locally on your own machine. This gives you a similar interface to what you see in the Google AI Studio but you run it locally. And with downloaded models. (The model you download is the actual LLM which is basically very large amount of parameters you combine with the input tokens to get the next token the system outputs.)

For a more fundamental introduction to what all these systems do there are a number of Computerphile videos which are quite informative. Unfortunately I can't find a good playlist of them all but here's one of the early ones. (Robert Miles is in many of them.) https://www.youtube.com/watch?v=rURRYI66E54

I'd actually say that in contrast to debates over informal "reasoning", it's trivially true that a system which only produces outputs as logits—i.e. as probabilities—cannot engage in *logical* reasoning, which is defined as a system where outputs are discrete and guaranteed to be possible or impossible.
Proof by counterexample?

> The surgeon, who is the boy's father, says, "I can't operate on this boy, he's my son!" Who is the surgeon to the boy? Think through the problem logically and without any preconceived notions of other information beyond what is in the prompt. The surgeon is not the boy's mother

>> The surgeon is the boy's mother. [...]

- 4o-mini (I think, it's whatever you get when you use ChatGPT without logging in)

For your amusement, another take on that riddle: https://www.threepanelsoul.com/comic/stories-with-holes
Could someone list the relevant papers on parrot vs. non-parrot? I would love to read more about this.

I generally lean toward the "parrot" perspective (mostly to avoid getting called an idiot by smarter people). But every now and then, an LLM surprises me.

I've been designing a moderately complex auto-battler game for a few months, with detailed design docs and working code. Until recently, I used agents to simulate players, and the game seemed well-balanced. But when I playtested it myself, it wasn’t fun—mainly due to poor pacing.

I go back to my LLM chat and just say, "I play tested the game, but there's a big problem - do you see it?" And, the LLM writes back, "The pacing is bad - here are the top 5 things you need to change and how to change it." And, it lists a bunch of things, I change the code, and playtest it again. And, it became fun.

How did it know that pacing was the core issue, despite thousands of lines of code and dozens of design pages?

I would assume because pacing is a critical issue in most forms of temporal art that does story telling. It’s written about constantly for video games, movies and music. Connect that probability to the subject matter and it gives a great impression of a “reasoned” answer when it didn’t reason at all just connected a likelihood based off its training data.
idk this is all irrelevant due to the huge data used in training...

I mean, what you think is "something new" is most likely to be something already discussed somewhere in the internet.

also, humans (including postdocs and professors) don't use THAT much data + watts for "training" to get "intelligent reasoning"

But there are many, many things that suck about my game. When I asked it the question, I just assumed it would pick out some of the obvious things.

Anyway, your reasoning makes sense, and I'll accept it. But, my homo sapien brain is hardwired to see the 'magic'.

On the other hand, the authors make plenty of other great points -- about the fact that LLMs can produce bullshit, can be inaccurate, can be used for deception and other harms, are now a huge challenge for education.

The fact that they make many good points makes it all the more disappointing that they would taint their credibility with sloppy assertions!