by meken
1276 days ago
My understanding is that the transformer layer in an LLM is basically doing something akin to message passing; it's like a mini computer. To predict the next word, it has to understand a lot about a lot of different kinds of topics. My understanding is kinda fuzzy because I haven't coded it up myself, but this was the takeaway I got from this explanation (starts at 36:21): https://youtu.be/cdiD-9MMpb0
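For anyone who wants to see the message-passing reading concretely, here is a rough sketch of a single causal attention head in plain Python. It is a toy under stated assumptions, not how any particular model is implemented: the function names are made up for illustration, and the queries, keys, and values are passed in directly, whereas a real transformer would compute them from token embeddings with learned projections.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_as_message_passing(queries, keys, values):
    """Toy single-head causal attention (hypothetical helper, for illustration).

    Each token i acts as a node that collects "messages" (value vectors)
    from tokens 0..i, weighted by how well its query matches their keys.
    """
    dim = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        scale = math.sqrt(len(q))
        # Edge weights: scaled query-key similarity to every earlier token
        # (and itself), normalized with softmax so they sum to 1.
        scores = [dot(q, keys[j]) / scale for j in range(i + 1)]
        weights = softmax(scores)
        # Aggregate: weighted sum of the incoming value vectors.
        out = [sum(w * values[j][d] for j, w in enumerate(weights))
               for d in range(dim)]
        outputs.append(out)
    return outputs

# Assumption: q, k, v are given directly instead of being learned projections.
toy = [[1.0, 0.0], [0.0, 1.0]]
result = attention_as_message_passing(toy, toy, toy)
```

The first token can only "hear" itself, so its output is just its own value vector; the second token mixes both values, leaning toward the one whose key best matches its query. Stacking many such layers (plus the per-token feed-forward part, left out here) is what gives the "mini computer" feel.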