Hacker News new | ask | show | jobs
by meken 1276 days ago
My understanding is the transformer layer in the LLM is basically doing something akin to message passing, it’s like a mini computer. In predicting the next word, it has to understand a lot about a lot of different kinds of topics

My understanding is kinda fuzzy because I haven’t coded it up myself, but this was the takeaway I got from this explanation (starts at 36:21)

https://youtu.be/cdiD-9MMpb0