by edmara
774 days ago
>Why it's maximal is not in the model at all, nor the data

>It replays the data to us and we suppose the LLM must have the property that generates this data originally.

So to clarify, what you're saying is that under the hood, an LLM is essentially just searching its training data for similar strings and regurgitating the most common one? Because that is demonstrably not what's happening. If this were 2019 and we were talking about GPT-2, it would be more understandable, but SoTA LLMs can learn in context and translate entire languages that aren't in their training data.

Also, regarding inference time: transformers perform better when you give them more compute per token.
https://openreview.net/forum?id=ph04CRkPdC