| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Chabsff 221 days ago

> They do it by iteratively predicting the next token.

You don't know that. It's how the llm presents, not how it does things. That's what I mean by it being the interface.

There's ever only one word that comes out of your mouth at a time, but we don't conclude that humans only think one word at a time. Who's to say the machine doesn't plan out the full sentence and outputs just the next token?

I don't know either fwiw, and that's my main point. There's a lot to criticize about LLMs and, believe or not, I am a huge detractor of their use in most contexts. But this is a bad criticism of them. And it bugs me a lot because the really important problems with them are broadly ignored by this low-effort, ill-thought-out offhand dismissal.

1 comments

LeroyRaz 221 days ago

Have you read the literature? Do you have a background in machine learning or statistics?

Yes. We know that LLMs can be trained by predicting the next token. This is a fact. You can look up the research papers, and open source training code.

I can't work it out, are you advocating a conspiracy theory that these models are trained with some elusive secret and that the researchers are lying to you?

Being trained by predicting one token at a time is also not a criticism??! It is just a factually correct description...

Chabsff 220 days ago

> Have you read the literature? Do you have a background in machine learning or statistics?

Very much so. Decades.

> Being trained by predicting one token at a time is also not a criticism??! It is just a factually correct description...

Of course that's the case. The objection I've had from the very first post in this thread is that using this trivially obvious fact as evidence that LLMs are boring/uninteresting/not AI/whatever is missing the forest for the trees.

"We understand [the I/Os and components of] LLMs, and what they are is nothing special" is the topic at hand. This is reductionist naivete. There is a gulf of complexity, in the formal mathematical sense and reductionism's arch-enemy, that is being handwaved away.

People responding to that with "but they ARE predicting one token at a time" are either falling into the very mistake I'm talking about, or are talking about something else entirety.