by libraryofbabel 113 days ago
I have come to think “predict the next token” is not a useful way to explain how LLMs work to people unfamiliar with LLM training and internals. It’s technically correct, but at this point saying that and not talking about things like RLVR training and mechanistic interpretability is about as useful as framing talking with a person as “engaging with a human brain generating tokens” and ignoring psychology.

At least AI-haters don’t seem to be talking about “stochastic parrots” quite so much now. Maybe they finally got the memo.

7 comments

>“predict the next token” is not a useful way

That is the exact thing to say because that is exactly what it does, despite how it does so.

It is not useful to say it if you are an AI-shill though. You brought up AI-haters, so I think I am entitled to bring up AI-shills.

My neurons are also just passing electric signals back and forward and exchanging water and salts with the rest of my body.
> just passing electric signals back and forward

Ok, feel free to call yourself a toaster, I don't mind!

What, reductionism only works when you do it?
I didn't
I mean, that's really just a comparison to how silicon circuits work though, isn't it?

"Thinking rocks" vs "thinking meat sacks" isn't much of a distinction really.

Conversely, if you approach conversations the same way an LLM does and just repeat what you've heard other people say a lot without actually knowing what it means, then you're also likely to be compared to a feathery chatterbox.

I think talking to people unfamiliar with LLM training using words like "RLVR training and mechanistic interpretability" is about as useful as a grave robber in a crematorium.
Obviously you don’t just say those words and leave it at that. Both those things can be explained in understandable terms. And even having a superficial sense of what they are gives people a better picture of what modern LLMs are all about than tired tropes from three years ago like “they’re just trained to predict the next token in the training data, therefore…”
Must one be an "AI-hater" to use the term "stochastic parrot"? The term is probably a response to all the emergent AGI claims and pointless discussions about LLMs being conscious.
"Sampling over a probability distribution" is not as catchy as "stochastic parrot", but I have personally stopped telling believers that their imagined event horizon of transistor scale is not going to deliver them to their wished-for automated utopia, because one cannot reason with people who did not reach their conclusions by reasoning.
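For anyone unsure what "sampling over a probability distribution" means in practice, here is a minimal sketch. The vocabulary, the scores, and the function names `softmax` and `sample_next_token` are all made up for illustration; a real model produces scores over tens of thousands of tokens, and nothing here reflects any particular model's implementation.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical three-word vocabulary and scores, for illustration only.
vocab = ["parrot", "autocomplete", "toaster"]
logits = [2.0, 1.0, 0.1]

print(softmax(logits))
print(sample_next_token(vocab, logits, rng=random.Random(0)))
```

Lowering `temperature` concentrates probability on the highest-scoring token, and raising it flattens the distribution. That knob is the "stochastic" part of the phrase.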
> stochastic parrots

I prefer to use the term "spicy autocomplete" myself.

Technical concepts can be broken down into ideas anyone can understand if they're interested. Token prediction is at the core of what these tools do, and is a good starting point for more complex topics.

On the other hand, calling these tools "intelligent", capable of "reasoning" and "thought", is not only more confusing and impossible to simplify, but also dishonest and borderline gaslighting.

“Stochastic parrots” only stopped because AI fanboys stopped screaming “AGI” and “it will replace everyone”. Maybe they finally got the memo?