Hacker News
by perching_aix 5 days ago
This "they just predict the next statistically most likely token" is such a handwavey and willfully misleading explanation, it's unreal, and I'm so fucking tired of seeing it repeated so incessantly. It's beyond asinine.

You know perfectly damn well that a typical person's idea of statistics is not some insanely high-cardinality, stateful prediction, but "well, a coin toss is 50:50, and a lottery win is 1:100000000". You also know perfectly damn well that, as a result, people will just think that every sentence a chatbot ever produced for them was sitting somewhere in the massive training set, letter by letter. This insinuation is often even explicitly appealed to.

And that picture is outright false. It's a statistical process, yes, so saying that it does what it does by "just doing statistics" is a generally correct description, but it says nothing about how exactly it does it, nor is it the zinger you think it is. If you literally did that kind of naive counting, you'd just get milquetoast nonsense, like you can see in the countless Markov-chain primers. And while the models do have a lot of the training set lossily captured, they also absolutely generalize (that's how they can do that lossy compression), and you can quite literally find representations of those generalizations in them, and also see them activate.
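The naive "count what followed what, then sample from it" statistics people picture is roughly what a primer-style Markov-chain text generator does. Below is a minimal word-level bigram sketch in Python; the function names and toy corpus are hypothetical, and it illustrates the Markov-chain primer, not how an LLM computes its probabilities. Its only state is the previous word, which is why such output tends to be locally plausible and globally aimless.

```python
import random
from collections import defaultdict


def build_bigram_table(text: str) -> dict[str, list[str]]:
    """Record, for every word, each word that directly followed it in the corpus."""
    words = text.split()
    table: defaultdict[str, list[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return dict(table)


def generate(table: dict[str, list[str]], start: str, length: int,
             rng: random.Random) -> str:
    """Emit up to `length` words, each chosen only from the previous word's followers."""
    out = [start]
    for _ in range(length - 1):
        followers = table.get(out[-1])
        if not followers:
            break  # dead end: this word never had a successor in the corpus
        # Duplicates in the list make this a draw from the observed bigram
        # frequencies; the previous word is the chain's entire state.
        out.append(rng.choice(followers))
    return " ".join(out)


if __name__ == "__main__":
    corpus = "the cat sat on the mat the dog sat on the rug"
    table = build_bigram_table(corpus)
    print(generate(table, "the", 8, random.Random(0)))
```

Every adjacent word pair in the output occurred somewhere in the corpus, yet the chain can still stitch pairs into sentences that never appeared there as a whole, such as "the dog sat on the mat".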

It's like summarizing how any program works by saying "well, it just manipulates ones and zeroes". Not very informative, is it? Or like describing how programs get written as just programmers sitting in a cushy office, rhythmically pressing keys on a keyboard. Not a very fair or insightful description, as you'll know if you've done any programming of your own. The same goes for every other white-collar job.

It's also not even true in the most literal sense: models can and absolutely do choose a less than maximally likely next token; that's what the various decoding parameters are for. "Maximally likely next token" also conveniently skips over how that likelihood is established in the first place, i.e. the literal point of the question, going in a cute little circle.
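To make the decoding point concrete, here is a minimal sketch in Python, assuming a model has already produced one raw score (logit) per vocabulary token. Greedy decoding always takes the argmax, while temperature plus top-k sampling can and does pick lower-ranked tokens. The logit values, the function names, and the choice of these two particular samplers are illustrative assumptions, not any specific engine's API; real inference backends implement these and other samplers in their own ways. Note that how the logits themselves get computed, the part this comment calls the actual point of the question, all happens before this code runs.

```python
import math
import random


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Turn raw scores into probabilities; temperature < 1 sharpens, > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy() for argmax")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits: list[float]) -> int:
    """Always pick the single most likely token id (argmax)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_top_k(logits: list[float], k: int, temperature: float,
                 rng: random.Random) -> int:
    """Keep the k highest-scoring token ids, then sample among them."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top], temperature)
    # random.Random.choices draws proportionally to the weights, so a
    # lower-ranked token is picked whenever the draw lands in its share.
    return rng.choices(top, weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.5, 0.3, -1.0]  # made-up scores for a 4-token vocabulary
    rng = random.Random(0)
    samples = [sample_top_k(logits, k=3, temperature=1.0, rng=rng) for _ in range(1000)]
    print("greedy:", greedy(logits))
    print("sampled counts:", {t: samples.count(t) for t in range(len(logits))})
```

With these made-up logits, `greedy` always returns token 0, whereas top-3 sampling at temperature 1.0 gives token 0 only about 56% of the probability mass, so tokens 1 and 2 come up regularly and token 3, cut by top-k, never does. Pushing the temperature toward zero makes sampling approach greedy decoding; raising it flattens the distribution further.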

I'm so over this "stochastic parrot" bullshit.

I don't even try anymore. The people who still parrot the stochastic parrot bit this late in the game will simply never understand it.
LLMs predict the next token one at a time. (Stochastically.) Literally. It's what they do. That's how they literally work.

If you don't believe me, download llama.cpp and see for yourself.

P.S. I write inference backends in C++ every day. The gall of people like you who figured out how to prompt Claude and think they're hot shit now is simply unbelievable.

I help write optimized CUDA kernels for proprietary hardware. They may "literally" work this way, but that is quite beside the point.

If you don't see why, then you have demonstrated exactly my point about how practitioners like you simply lack the foundational understanding in philosophy, information theory, human consciousness, human cognition, and neuroscience necessary to bridge this conceptual gap.

(Rather, it is that we know so little of what consciousness or intelligence even is that we cannot possibly use first principles to preclude LLMs from possessing these qualities.)

You don't understand the argument, so you keep repeating first-order mechanistic observations that are irrelevant. If you don't want to understand the argument, don't be surprised when people refuse to engage with you, especially when it's evident to those more knowledgeable that the position you hold is the ignorant one.

So you work on inference engines, and don't see at all what would be hilariously disingenuous and reductive about describing how LLMs operate as "just parroting the most statistically likely next token"? It is literally* what they do, yes. But only literally, with a big asterisk of "non-colloquial meaning" after the word "statistically", much like how "significant" means something pretty different, albeit related, in academic writing vs everyday speech.

It's equivalent to professing that you just make apple pies from scratch, while your first step is always to reinvent the universe.

Are you further magically blind to this operational fact being weaponized as a trope for furthering anti-AI sentiment (i.e. that it's a political dogwhistle at this point), and thus to your own participation in that every time you repeat it?

* Ignoring the decoding caveat I already mentioned, along with the countless ways they're steered. There isn't jack that's likely about some of the responses they produce, and intentionally so. That includes the whole chat-partner act.

Look at his comments here.

Safe to say there's a cognitive block, and until he approaches this topic in good faith he'll simply never understand. Lol.

https://news.ycombinator.com/item?id=48429027

It's so beyond tiresome. It's a classic case of someone being technically correct and abusing the gap between that and what people actually take away from it for sentiment manipulation (willfully or otherwise). And at this point I have a pretty hard time believing it's the otherwise.

I really don't know what's so interesting about autocomplete or next-token prediction that it captures these people's attention so much. They're so blatantly not the salient quality of these products that matters to the common discourse; it's just baffling.