by psyklic
649 days ago
For the LLM itself, vector length matters. The final logits are computed as un-normalized dot products between the final hidden state and each token's output embedding, so they depend on both direction and magnitude. This means that if you embed a token and then immediately un-embed it (using the same embedding matrix for both), you might get a different token back: a token whose vector is longer and points in a similar direction can score higher than the original. In models such as GPT2, the embedding vector magnitude is loosely correlated with token frequency.
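A toy sketch of the embed-then-un-embed point, assuming a made-up 3-token, 2-dimensional embedding matrix `E` (the values are purely illustrative and not taken from any real model, and `roundtrip` is a hypothetical helper):

```python
import numpy as np

# Hypothetical tied embedding matrix: one row per token (illustrative values only).
E = np.array([
    [1.0, 0.0],  # token 0: short vector
    [3.0, 1.0],  # token 1: longer vector in a similar direction to token 0
    [0.0, 2.0],  # token 2
])

def roundtrip(token_id, normalize=False):
    """Embed a token, then un-embed it with the same matrix via argmax over logits."""
    h = E[token_id]
    W = E
    if normalize:
        # Cosine similarity: divide out the magnitudes before the dot product.
        W = E / np.linalg.norm(E, axis=1, keepdims=True)
        h = h / np.linalg.norm(h)
    logits = W @ h  # un-normalized dot products unless normalize=True
    return int(np.argmax(logits))

dot_roundtrip = [roundtrip(t) for t in range(len(E))]
cos_roundtrip = [roundtrip(t, normalize=True) for t in range(len(E))]

print(dot_roundtrip)  # token 0 comes back as token 1
print(cos_roundtrip)  # every token comes back as itself
```

With raw dot products, token 0 scores 1 against itself but 3 against the longer token 1, so the round trip lands on token 1. Normalizing both sides removes the magnitude effect and the round trip becomes the identity in this toy case.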