| HN Mirror



	by causal 762 days ago
	So most of my understanding comes from this series, particularly the last two videos: https://www.3blue1brown.com/topics/neural-networks Essentially, each token of a text occupies a point in a high-dimensional vector space that represents its meaning, and LLMs predict the next token by updating the last token's vector with context from all the tokens before it. Attention heads are basically a way of choosing which prior tokens are most relevant and adjusting the last token's point in vector space accordingly.
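
	To make that concrete, here is a rough sketch of one attention head updating the last token's vector. Everything in it is an assumption for illustration: the embeddings and the matrices `w_q`, `w_k`, `w_v` are random toy values rather than trained weights, the function name `attend_last_token` is made up, and real models add things like multiple heads, an output projection, and MLP layers that are left out here.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vec):
    return [dot(row, vec) for row in matrix]


def attend_last_token(embeddings, w_q, w_k, w_v):
    """Toy single-head attention for the last token only.

    Returns (weights, updated_vector): how relevant each token is to the
    last one, and the last token's vector after being nudged by the others.
    """
    query = matvec(w_q, embeddings[-1])           # what the last token is looking for
    keys = [matvec(w_k, e) for e in embeddings]   # what each token offers to match on
    values = [matvec(w_v, e) for e in embeddings] # what each token contributes if matched

    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])

    # Weighted sum of the value vectors, one component at a time.
    dim = len(values[0])
    update = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

    # Residual step: move the last token's point by the computed update.
    updated = [x + u for x, u in zip(embeddings[-1], update)]
    return weights, updated


def random_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
dim = 4                                # toy size; real models use far more dimensions
embeddings = random_matrix(3, dim)     # three tokens, one vector each
w_q, w_k, w_v = (random_matrix(dim, dim) for _ in range(3))

weights, updated = attend_last_token(embeddings, w_q, w_k, w_v)
print("relevance of each token to the last one:", weights)
print("last token's vector after the update:", updated)
```

	Note that the last token also scores itself here, since in the usual causal setup a token can attend to its own position as well as earlier ones. The weights are the "which prior tokens are most relevant" part, and the final addition is the "adjusting the point in vector space" part.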