by corimaith
292 days ago
The definition of a language model is literally the probability distribution of the most likely next token given a preceding text. When OP says "memorizing patterns and repeating stuff", it's a strawman of a basic n-gram model. Obviously modern language models are more advanced, because we have techniques like vector tokenization, but at its core it's still just probability that's limited to the corpus it was trained on. Put another way, if you give it a question it has never seen, it will give you the most likely reply you might get. But that doesn't mean there is an internal world-model or anything. Ultimately it comes down to whether you think language is sufficient to model reality, which I think it probably isn't. The output would obviously be very convincing, but not necessarily correct.
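For reference, here is roughly what the "basic n-gram model" end of that spectrum looks like: a minimal bigram sketch in Python. The function names and the toy corpus are made up for illustration, and it is not meant to describe how any real LLM works. It just counts which token follows which, so it has nothing to say about contexts it never saw.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_distribution(counts, prev):
    """Return P(next | prev) as a dict; empty if prev never appeared with a successor."""
    following = counts.get(prev)
    if not following:
        return {}
    total = sum(following.values())
    return {tok: n / total for tok, n in following.items()}

# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)

print(next_token_distribution(model, "the"))  # "cat" gets 2/3, "mat" gets 1/3
print(next_token_distribution(model, "dog"))  # {} because "dog" is not in the corpus
```

The empty result for "dog" is the "limited to the corpus" behaviour in its purest form. Whether a large neural model shares that limitation is exactly what is being argued about here.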
> techniques like vector tokenization
(I assume you're talking about the input embedding.) This is really not an important part of what gives LLMs their power. The core is that you have a large-scale artificial neural net. This is very different from an n-gram model and is probably capable of figuring out anything a human can figure out, given sufficient scale and the right weights. We don't have that yet in practice, but it's not due to a theoretical limitation of ANNs.
> probability distribution of the most likely next token given a preceding text.
What you're talking about is an autoregressive model. That's more of an implementation detail. There are other kinds of LLMs.
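To make "autoregressive" concrete, here is a minimal sketch of the generation loop in Python. The loop only assumes some function that maps the tokens so far to a distribution over the next token. `toy_model` is a hypothetical stand-in (a hard-coded lookup table), not a real network, and `generate` is an illustrative name rather than any library's API.

```python
import random

def generate(next_token_probs, prompt, max_new_tokens, seed=0):
    """Autoregressive sampling: repeatedly sample a token and append it to the context."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)  # distribution conditioned on everything so far
        if not probs:  # the model has no continuation, so stop early
            break
        choices, weights = zip(*probs.items())
        tokens.append(rng.choices(choices, weights=weights, k=1)[0])
    return tokens

def toy_model(context):
    """Hypothetical stand-in for a model: looks only at the last token."""
    table = {"a": {"b": 1.0}, "b": {"c": 1.0}, "c": {}}
    return table.get(context[-1], {})

print(generate(toy_model, ["a"], 5))  # ['a', 'b', 'c']
```

The loop is the same whether `next_token_probs` is a lookup table or a large neural net. It says nothing about how the distribution is computed, which is why it reads as an implementation detail.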
I think saying it's "just predicting the next token" is misleading. It implies the model isn't reasoning, isn't world-modeling, or is somehow limited. Reasoning is predicting, and predicting well requires world-modeling.