Hacker News new | ask | show | jobs
by tptacek 57 days ago
2022 called, wants this argument back. When you're "statistically generating text" to find zero-day vulnerabilities in hard targets, building Linux kernel modules, assembly-optimizing elliptic curve signature algorithms, and solving arbitrary undergraduate math problems instantaneously --- not to mention apparently solving Erdos problems --- the "statistical text" stuff has stopped being a useful description of what's happening, something closer to "it's made of atoms and obeys the laws of thermodynamics" than it is to "a real boundary condition of what it can accomplish".

I don't doubt that there are many very real and meaningful limitations of these systems that deserve to be called out. But "text generation" isn't doing that work.

2 comments

Consider that you don't want to hear "statistical generation" because it reminds you of the unchangeable nature of the underlying technology and its ultimate limitations that all the money and data centers in the world will never solve. Despite how amazing and useful they are, they are not intelligent agents. Even in this very thread, someone mentioned they thought the thing was capable of feeling an emotion. Was that comment by someone who really believes that? I don't know. But many people do and people in tech who actually know what these things are have a responsibility to not mislead the public (and ourselves) about what they really are and what they can be.
I responded to your point empirically, with problems not conventionally understood to be solvable with "text generation", and your response was in effect that I must be wrong because I'm afraid you might be right. Not an especially strong debate move.

Can you refute the argument I made, or do you just want to claim LLMs are drinking all our water?

Well, I don't believe the LLM solved those problems. I believe the user did. The LLM aggregated large amounts of information statistically, then the user read that and realized there was something to it and fixed it. Those accounts don't mention the 1000 other prompts that technical user did that yielded garbage results and the user was intelligent enough to disregard those.
No, that's false, in every example I gave. But I appreciate you making clearer that I correctly ascertained your original claim, that you believe they literally are just random text generators, and that people are simply cherry picking the rare meaningful text out of them.

That's what I thought you meant by "statistical text generator", and is why I was moved to comment.

1) I never said random 2) I never said cherry picking RARE meaningful text 3) It is not false in every example you gave just because you say that it is 4) If I didn't know better, I might think you're confused about what statistical means (hint: it's not random)
No, it's false in each example because I'm either a first or secondhand party to it happening (except for the Erdos thing) and I know it's false.

You managed to include in your blanket and conclusory rebuttal "solving undergrad math problems instantaneously". That was one of my examples because (1) it pertains to the subthread, (2) I was talking about it upthread, and (3) I have direct firsthand knowledge.

As I said elsewhere: I've fed thousands of math problems through ChatGPT (starting with 4o and now with 5.5). They've all been randomized. They do not appear in textbooks. They cover all the ground from late high school trig to university calc III. I do this habitually, every time I work an "interesting" problem, to get critiques on my own work. GPT has been flawless, routinely spotting errors or missed opportunities. If I have any complaint, it's that GPT tends to be too much better than I am at any given point, using concepts from later courses to solve simpler problems.

Square that with the claim you're making.

I can do the same thing with vulnerability research (I've been a vuln researcher since 1996 and I use LLMs to find vulnerabilities). But this thread is about math, and it's even easier to show you're wrong in the context of math.

But the systems that do that impressive work are no longer just LLMs. Look at the Claude Code leak - it’s a sprawling, redundant maze relying on tools and tests to approximate useful output. The actual LLM is a small portion of the total system. It’s a useful tool, but it’s obviously not truly intelligent - it was hacked together using the near-trillions of dollars AI labs have received for this explicit purpose.
What does this matter? You can build a working coding agent for yourself extremely quickly; it's remarkably straightforward to do (more people should). But look underneath all the "sprawling tools": the LLM itself is a sprawling maze of matrices. It's all sprawling, it's all crazy, and it's insane what they're capable of doing.

Again if you want to say they're limited in some way, I'm all ears, I'm sure they are. But none of that has anything to do with "statistical text generation". Apparently, a huge chunk of all knowledge work is "statistical text generation". I choose to draw from that the conclusion that the "text generation" part of this is not interesting.

Well, hang on a second - it sounds like you may actually disagree with the user who created this thread. That user claims that these systems exhibit “real intelligence”, and success on this Erdos problem is proof.

You seem to be making the claim that LLMs are statistical text generators, but statistical text generation is good enough to succeed in certain cases. Those are different arguments. What do you actually believe? Are we even in disagreement?

I don't have any opinion about "real intelligence" or not. I'm not a P(doom)er, I don't think we're on the bring of ascending as a species. But I'm also allergic to arguments like "they're just statistical text generators", because that truly does not capture what these things do or what their capabilities are.
(The clearer way for me to have said this is that I don't care whether they're According-to-Hoyle "intelligent", and that controversy isn't what motivated me to comment).
"But I'm also allergic to arguments like "they're just statistical text generators", because that truly does not capture what these things do or what their capabilities are."

Umm, why doesn't it capture it? Why can't a statistical text generator do amazing things without _actually_ being intelligent (I'm thinking agency here)? I think it's important to remind ourselves, these things do not reflect or understand what they're outputting. That is 100% evident with the continuing issues with them outputting nonsense along with their apparently insightful output. The article itself said the output was poor but the student noticed something about it that sparked an idea and he followed that lead.

I reject the premise. I read the outputs I generate carefully (too carefully, probably). They don't "continue to output nonsense". Their success rate exceeds that of humans in some places.

To clarify: the problem I have with "statistical text generator" isn't the word "statistical". It's "text generator". It's been two years now since that stopped being a reasonable way to completely encapsulate what these systems do. The models themselves are now run iteratively, with an initial human-defined prompt cascading into series of LLM-generated interim prompts and tool calls. That process is not purely, or even primarily, one of "text generation"; it's bidirectional, and involves deep implicit searches.

Just to clarify because I’m not sure I understand:

So you agree that LLMs are in fact statistical text generators but you don’t like people use that fact in arguments about the capabilities of the things?

It's like a genotype/phenotype distinction, the genotype may be statistical text generator but the phenotype is something much more.
Not parent but I think you're being rather dense. They are _obviously_ statistical text generators. There's plenty of source code out there, anyone can go and inspect it and see for themselves so disputing that is akin to disputing the details of basic arithmetic.

But it is no longer useful to bring that fact up when conversing about their capabilities. Saying "well it's a statistical text generator so ..." is approximately as useful as saying "well it's made of atoms so ...". There are probably some very niche circumstances under which statements of each of those forms is useful but by and large they are not and you can safely ignore anyone who utters them.

He does say that LLMs are just a part of the models used these days.
I think you're actually making a point but overall still disagree.

I do think LLM's are evolving towards this kind of embodied cognition type intelligence, in virtue of how well they interoperate with text. I mean, you don't need to "make the text intelligible" to the LLM, the LLM just understands all kinds of garbage you throw at it.

Now the question is: Is intelligence being able to interoperate?

In the traditional sense, no. Well, in a loose sense, yes, because people would've said that intelligence is the ability to do anything, but that's not a useful category (otherwise, traditional computer programs would be "intelligent"). But when I hear that, I think something like "The models can represent an objective reality well, it makes correct predictions more often than not, it's one of these fictional characters that gets everything and anything right". This is how it's framed in a lot of pop culture, and a lot of "rationalist" (lesswrong) style spaces.

But if LLM's can understand a ton of unstructured intent and interoperate with all of our software tools pretty damn well... I mean, I would not call that "a bunch of hacks". In some sense, this is an appeal to the embedded cognition program. Brain in a vat approach to intelligence fails.

But it clearly enables new capabilities that previously were only possible with human intelligence. In a very blatant negative form: The surveillance state is 100% now possible with AI. It doesn't take deep knowledge of Quantum Physics to implement, with a large amount of engineering effort, data pipelines and data lakes, and to have LLM's spread out throughout the system, monitoring victims.

So I'd call it intelligence, but with a qualifier to not slip between slippery slopes. It may even be valid to call the previous notion of intelligence a bad one, sure. But I think the issue you may be running into is that it feels like people are conflating all sorts of notions of intelligence.

Now, you can add an ad hoc hypothesis here: In order to interoperate, you have to reason over some kind of hidden latent space that no human was able to do before. Being able to interoperate is not orthogonal to general intelligence - it could be argued that intelligence is interoperation.

If you're arguing for embodied cognition, fine, we agree to some extent :)

The fear is that the AI clearly must be able to emulate, internally, a latent space that reflects some "objective notion of reality". If it did that, then shit, this just breaks all of the victories of empiricism, man. Tell me about a language model that can just sit in a vat, and objectively derive quantum mechanics by just thinking about it really hard, with only data from before the 1900s.

I don't think you need to be this caricature of intelligence to be intelligent, is what I'm saying, and interoperability is definitely a big aspect of intelligence.

Now this I can agree with. One thing that is extremely important to maintain with this technology is nuanced perspective. Otherwise, it will lead you astray quickly. It's also a difficult thing for us to maintain.