Hacker News new | ask | show | jobs
by LeroyRaz 223 days ago
Have you read the literature? Do you have a background in machine learning or statistics?

Yes. We know that LLMs can be trained by predicting the next token. This is a fact. You can look up the research papers, and open source training code.

I can't work it out, are you advocating a conspiracy theory that these models are trained with some elusive secret and that the researchers are lying to you?

Being trained by predicting one token at a time is also not a criticism??! It is just a factually correct description...

1 comments

> Have you read the literature? Do you have a background in machine learning or statistics?

Very much so. Decades.

> Being trained by predicting one token at a time is also not a criticism??! It is just a factually correct description...

Of course that's the case. The objection I've had from the very first post in this thread is that using this trivially obvious fact as evidence that LLMs are boring/uninteresting/not AI/whatever is missing the forest for the trees.

"We understand [the I/Os and components of] LLMs, and what they are is nothing special" is the topic at hand. This is reductionist naivete. There is a gulf of complexity, in the formal mathematical sense and reductionism's arch-enemy, that is being handwaved away.

People responding to that with "but they ARE predicting one token at a time" are either falling into the very mistake I'm talking about, or are talking about something else entirety.