|
|
|
|
|
by superkuh
611 days ago
|
|
> when an LLM tells us that the sun rises in the east, it's not because it truly understands astronomy or the solar system. Rather, it has seen the phrase "sun rises in the east" repeated so many times in its training data that this becomes the highest probability answer. It's less about understanding and more about pattern recognition. This is not too different from most human person's experience with basic education (https://www.lesswrong.com/posts/NMoLJuDJEms7Ku9XS/guessing-t...). Though it is obvious that humans can do more than that too. And while "voting" that may be a close-enough metaphor for the multi-layer perceptron layers, it is not what is happening in the attention layers. I suggest checking out 3blue1brown's videos on gradient descent https://www.youtube.com/watch?v=IHZwWFHWa-w and "How might LLMs store facts" https://www.youtube.com/watch?v=9-Jl0dxWQs8 versus how attention works https://youtu.be/eMlx5fFNoYc |
|