by nerdponx
794 days ago
You're tilting at windmills here. Where in this thread do you see anyone talking about the LLM as anything other than a next-token prediction model? Literally all of the pushback you're getting is because you're trivializing the choice of model architecture, claiming that it's all so obvious and simple and that it's all the same thing in the end. Yes, of course, these models have to be well-suited to run on our computers, in this case GPUs. And sure, it's an interesting perspective that maybe they work well because they are well-suited to GPUs and not because they have some deep fundamental meaning. But you can't act like everyone who doesn't agree with your perspective is just an AI hypebeast con artist.
My claim about architecture follows purely formally: you can take any statistical model trained via gradient descent and rephrase it as a kNN (more precisely, as a similarity-weighted combination of the training examples). The only difference is how hard it is to produce such a model by fitting it to data directly, rather than by rephrasing.
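To make the rephrasing argument concrete in the simplest case: for a model that is linear in its parameters and trained by plain gradient descent starting from zero weights, every update adds a multiple of some training input to the weights, so the trained model can be rewritten exactly as a weighted sum of similarities to the training examples. The Python sketch below tracks both forms side by side; the function names and the synthetic data are hypothetical, chosen only for illustration. Note that this is a kernel-style, instance-based model rather than a literal hard-vote kNN, and for nonlinear models the corresponding rewriting (via a kernel built from the model's gradients along the training path) is, to my understanding, only approximate.

```python
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def train_primal_and_dual(xs, ys, lr=0.1, steps=200):
    """Gradient descent on mean squared error for a linear model w.x,
    starting from w = 0, while also tracking per-example coefficients
    alpha_i such that w == sum_i alpha_i * x_i holds after every step."""
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    alpha = [0.0] * n
    for _ in range(steps):
        # Residuals r_i = w.x_i - y_i under the current weights.
        residuals = [dot(w, x) - y for x, y in zip(xs, ys)]
        # Primal update: w <- w - lr * (1/n) * sum_i r_i * x_i
        grad = [sum(r * x[j] for r, x in zip(residuals, xs)) / n for j in range(d)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
        # Dual update: the same step, expressed as a change to each
        # training example's coefficient instead of to w.
        alpha = [a - lr * r / n for a, r in zip(alpha, residuals)]
    return w, alpha


def predict_primal(w, x):
    # The "parametric" view: a learned weight vector.
    return dot(w, x)


def predict_dual(alpha, xs, x):
    # The "instance-based" view: a weighted sum over training examples,
    # using the linear kernel k(x_i, x) = x_i . x as the similarity.
    return sum(a * dot(xi, x) for a, xi in zip(alpha, xs))


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
    ys = [2 * x[0] - x[1] + 0.5 * x[2] for x in xs]
    w, alpha = train_primal_and_dual(xs, ys)
    query = [0.3, -0.2, 0.7]
    # The two predictions agree up to floating-point rounding.
    print(predict_primal(w, query), predict_dual(alpha, xs, query))
```

The two views are the same function; they differ only in whether the training data is stored implicitly in `w` or explicitly via `alpha` and `xs`.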
The idea that there's something special about architecture is, really, a hardware illusion. Any empirical function-approximation algorithm designed to find the same conditional probability structure will, in the limit t → ∞, approximate that same structure (i.e., the true conditional distribution of the data).
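A toy illustration of the limit argument, under assumptions chosen only for the demo: binary labels with an assumed true P(y = 1 | x) = x on [0, 1], and two structurally different estimators, a histogram (binning) estimator and a k-nearest-neighbour average, each with its resolution tied to the sample size. Both are standard consistent estimators, so as n grows they should approach the same conditional probability despite sharing no "architecture". This illustrates the point on one problem rather than proving it for every algorithm; the function names and the schedules (bin count ~ n^(1/3), k ~ n^0.7) are illustrative choices, not anything from the thread.

```python
import heapq
import random


def true_conditional(x):
    # Assumed ground truth for the demo: P(y = 1 | x) = x on [0, 1].
    return x


def sample(n, rng):
    xs = [rng.random() for _ in range(n)]
    ys = [1 if rng.random() < true_conditional(x) else 0 for x in xs]
    return xs, ys


def histogram_estimate(xs, ys, x0):
    """Mean label in x0's bin; the number of bins grows like n^(1/3),
    so bins shrink while each bin still collects more and more points."""
    bins = max(1, int(len(xs) ** (1 / 3)))
    target = min(int(x0 * bins), bins - 1)
    labels = [y for x, y in zip(xs, ys) if min(int(x * bins), bins - 1) == target]
    return sum(labels) / len(labels) if labels else float("nan")


def knn_estimate(xs, ys, x0):
    """Mean label of the k nearest neighbours of x0, with k ~ n^0.7
    (k -> infinity and k/n -> 0, the usual consistency conditions)."""
    k = max(1, int(len(xs) ** 0.7))
    nearest = heapq.nsmallest(k, zip(xs, ys), key=lambda p: abs(p[0] - x0))
    return sum(y for _, y in nearest) / k


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = 0.4
    for n in (100, 1_000, 10_000, 100_000):
        xs, ys = sample(n, rng)
        print(n, round(histogram_estimate(xs, ys, x0), 3),
              round(knn_estimate(xs, ys, x0), 3), true_conditional(x0))
```

At small n the two estimates can disagree noticeably; as n grows, both columns should settle near the assumed true value of 0.4.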