by cproctor
643 days ago
One thing I've wondered for a while: Is there a principled reason (e.g. explainable in terms of how embeddings are trained) why a vector's magnitude can be ignored within a pretrained embedding, such that cosine similarity is a good measure of semantic similarity? Or is it just a computationally inexpensive trick that works well in practice? For example, if I have a set of words and I want to consider their relative location on an axis between two anchor words (e.g. "good" and "evil"), it makes sense to me to project all the words onto the vector from "good" to "evil." Would comparing each word's cosine similarity to "good" with its cosine similarity to "evil" be equivalent, or even preferable? (I know there are questions about the interpretability of this kind of geometry.)
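To make the two options concrete, here is a minimal sketch in plain Python. It uses random toy vectors, not a real pretrained embedding, and every name in it (`axis_projection`, `cosine_difference`, `ranking`) is hypothetical. It only illustrates the geometry: the cosine-based score ignores each word's magnitude while the projection does not, and if every vector is first scaled to unit length the two scores differ only by a positive scale and a constant shift, so they order the words identically. Whether discarding magnitude is justified for a given embedding is a separate question that this sketch does not answer.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def unit(a):
    # Scale a vector to length 1.
    n = norm(a)
    return [x / n for x in a]

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def axis_projection(w, good, evil):
    # Signed length of (w - good) along the direction from good to evil.
    axis = [e - g for e, g in zip(evil, good)]
    offset = [x - g for x, g in zip(w, good)]
    return dot(offset, axis) / norm(axis)

def cosine_difference(w, good, evil):
    # Positive when w points more toward evil than toward good.
    return cosine(w, evil) - cosine(w, good)

def ranking(scores):
    # Indices of the words ordered from lowest to highest score.
    return sorted(range(len(scores)), key=lambda i: scores[i])

# Toy data (an assumption for illustration): random directions with
# deliberately varied magnitudes.
rng = random.Random(0)
dim = 8
good = [rng.gauss(0, 1) for _ in range(dim)]
evil = [rng.gauss(0, 1) for _ in range(dim)]
words = []
for _ in range(6):
    scale = rng.uniform(0.5, 3.0)
    words.append([scale * rng.gauss(0, 1) for _ in range(dim)])

# Raw vectors: magnitude affects the projection but not the cosines,
# so the two orderings are not guaranteed to match.
raw_proj_rank = ranking([axis_projection(w, good, evil) for w in words])
raw_cos_rank = ranking([cosine_difference(w, good, evil) for w in words])

# Unit-normalized vectors: both scores are increasing functions of
# dot(w, evil - good), so the orderings agree.
unit_good, unit_evil = unit(good), unit(evil)
unit_words = [unit(w) for w in words]
unit_proj_rank = ranking(
    [axis_projection(w, unit_good, unit_evil) for w in unit_words]
)
unit_cos_rank = ranking(
    [cosine_difference(w, unit_good, unit_evil) for w in unit_words]
)

print("raw:       ", raw_proj_rank, raw_cos_rank)
print("normalized:", unit_proj_rank, unit_cos_rank)
```

So the two approaches are interchangeable for ranking only after normalizing to unit length. On raw vectors, the projection mixes direction with magnitude, which may or may not be what you want.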