by necroforest 827 days ago
cosine similarity is (isomorphic to) "distances to nearby objects". and not all embeddings are word embeddings.

It's isomorphic when the vectors are normalized; otherwise it measures angular distance, not positional distance.
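To make the normalization point concrete, here is a minimal sketch in plain Python. The vectors `u` and `v` are made-up examples, not output from any real embedding model. For unit vectors, squared Euclidean distance equals 2 - 2 * cosine similarity, so ranking by one is the same as ranking by the other. Without normalization, rescaling a vector changes its distance but not its cosine.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine_similarity(u, v):
    return dot(u, v) / (norm(u) * norm(v))

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def normalize(u):
    n = norm(u)
    return [a / n for a in u]

# Hypothetical example vectors (assumption: chosen only for illustration).
u = [3.0, 4.0, 0.0]
v = [1.0, 2.0, 2.0]

u_hat, v_hat = normalize(u), normalize(v)
cos_uv = cosine_similarity(u, v)

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b).
sq_dist_unit = euclidean_distance(u_hat, v_hat) ** 2
identity_gap = abs(sq_dist_unit - (2.0 - 2.0 * cos_uv))

# Rescaling v leaves the cosine unchanged but moves the Euclidean distance.
scaled_v = [10.0 * x for x in v]
cos_scaled = cosine_similarity(u, scaled_v)
dist_before = euclidean_distance(u, v)
dist_after = euclidean_distance(u, scaled_v)
```

Here `identity_gap` should be at floating-point noise level, while `dist_after` is much larger than `dist_before` even though `cos_scaled` matches `cos_uv`.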
It’s a mistake to think of vectors as coordinates of objects in space, though. You can visualize them like that, but that’s not what they are. The vectors are the objects.

A vector is just a list of n numbers. Embedded into an n-dimensional space, a vector is a distance in a direction. It isn’t ‘the point you get to by going that distance in that direction from the origin of that space’. You don’t need a space to have an origin for the embedding to make sense - for ‘cosine similarity’ to make sense.

Cosine similarity is just ‘how similar is the direction these vectors point in’.

The geometric intuition of ‘angle between’ actually does a disservice here when we are talking about high dimensional vectors. We’re talking about things that are much more similar to functions than spatial vectors, and while you can readily talk about the ‘normalized dot product’ of two functions it’s much less reasonable to talk about the ‘cosine similarity’ between them - it just turns out that mathematically those are equivalent.
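A rough sketch of the 'normalized dot product of two functions' idea, assuming we approximate each function by sampling it on a uniform grid (the helper names `sample` and `normalized_dot` are made up for this example). The computation is exactly the cosine similarity formula, applied to lists of samples instead of spatial vectors.

```python
import math

def sample(fn, n=1000, lo=0.0, hi=2.0 * math.pi):
    # Evaluate fn at n evenly spaced points in [lo, hi).
    step = (hi - lo) / n
    return [fn(lo + i * step) for i in range(n)]

def normalized_dot(u, v):
    # Dot product divided by the product of the two norms.
    d = sum(a * b for a, b in zip(u, v))
    return d / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

sin_samples = sample(math.sin)
cos_samples = sample(math.cos)
scaled_sin = sample(lambda x: 2.0 * math.sin(x))

# Same shape, different amplitude: normalized dot product close to 1.
same_shape = normalized_dot(sin_samples, scaled_sin)

# sin and cos over a full period are orthogonal: close to 0.
orthogonal = normalized_dot(sin_samples, cos_samples)
```

Nothing here needs an 'angle' to interpret: `same_shape` says the two functions differ only by scale, and `orthogonal` says they share no common component over the period.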

Fair enough.

I think people skip over that the vectors are the result of the minimization of the objective.

That objective is roughly the same since word2vec. GloVe is mathematically equivalent. LLMs are also equivalent.

For an LM, the objective function is still roughly the same: maximizing the probability of the next token conditional on the previous tokens.

This means the embedding vector of a token minimizes distance to tokens that often come before it, and maximizes distance to those that don't.
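A toy sketch of that objective, under loud assumptions: a three-token vocabulary, hand-picked 2-d embeddings, and a softmax over dot products standing in for a real model. 'Distance' is read loosely here as a low dot-product score. The function name `next_token_loss` is made up for this example.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(context_vec, output_embeddings, target_index):
    # Score each candidate token by its dot product with the context vector,
    # then return the negative log-probability of the true next token.
    scores = [sum(c * e for c, e in zip(context_vec, emb))
              for emb in output_embeddings]
    probs = softmax(scores)
    return -math.log(probs[target_index])

# Hypothetical embeddings for a three-token vocabulary (illustration only).
embeddings = [
    [1.0, 0.0],
    [0.0, 1.0],
    [-1.0, 0.0],
]

aligned_context = [2.0, 0.0]      # points the same way as token 0
misaligned_context = [-2.0, 0.0]  # points the opposite way

loss_aligned = next_token_loss(aligned_context, embeddings, 0)
loss_misaligned = next_token_loss(misaligned_context, embeddings, 0)
```

In this sketch `loss_aligned` is smaller than `loss_misaligned`, so minimizing the loss pulls a context toward the embeddings of tokens that actually follow it and pushes it away from those that don't.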