| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by psyklic 649 days ago
	For the LLM itself, length matters. For example, the final logits are computed as the un-normalized dot product, making them a function of both direction and magnitude. This means that if you embed then immediately un-embed (using the same embeddings for both), a different token might be obtained. In models such as GPT2, the embedding vector magnitude is loosely correlated with token frequency.