by cubacaban
644 days ago
Rather superficial and obfuscating. The article keeps raising the question "why ignore the magnitude?" and never answers it. It says: "The important part of an embedding is its direction, not its length. If two embeddings are pointing in the same direction, then according to the model they represent the same "meaning"." This can't be quite right. Any transformer LLM uses the embedding of the token sequence without normalizing it, i.e. including its magnitude, when deciding on the next token. Why would you throw away that information, which is equivalent to throwing away one embedding dimension? If I had to guess why cosine similarity is the standard for comparing embeddings, I suspect it's simply because the score is bounded in [-1, 1], which you may find more interpretable than the unbounded scores you get from the unnormalized dot product or Euclidean distance. In my experience, the choice of similarity metric doesn't affect embedding performance much; simply use the one the embedding model was trained with.
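To make the bounded-vs-unbounded point concrete, here is a minimal sketch in plain Python. The vectors are toy values chosen by hand, not output from any real embedding model, and the helper names (`dot`, `norm`, `cosine_similarity`, `euclidean_distance`) are made up for illustration. It only shows that cosine similarity discards magnitude while the other two scores do not.

```python
import math

def dot(a, b):
    # Unnormalized dot product: grows with the length of either vector.
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    # Euclidean length (magnitude) of a vector.
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    # Dot product of the two vectors after scaling each to unit length,
    # so the result always lies in [-1, 1].
    return dot(a, b) / (norm(a) * norm(b))

def euclidean_distance(a, b):
    # Straight-line distance between the two points; unbounded above.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy vectors (assumed values, purely illustrative).
u = [1.0, 2.0, 2.0]
v = [10.0, 20.0, 20.0]   # same direction as u, ten times the magnitude
w = [2.0, -1.0, 0.0]     # orthogonal to u

print(cosine_similarity(u, v))    # 1.0: identical direction, magnitude ignored
print(dot(u, v))                  # 90.0: scales with magnitude
print(euclidean_distance(u, v))   # 27.0: scales with magnitude
print(cosine_similarity(u, w))    # 0.0: orthogonal vectors
```

Here `u` and `v` get a perfect cosine score even though they differ in length by a factor of ten, which is exactly the information the dot product and Euclidean distance retain.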