Hacker News new | ask | show | jobs
by naijaboiler 649 days ago
Imagine 2 points in 3 dimensional space with a vector being the line from the origin to the point. So you have 2 vectors pointing going to the 2 points from the origin.

If those points are really close together, then angle between the two vector lines is very small. Loosely speaking cosine is a way to quantize how close two lines with a shared origin is. If both lines are the same, the angle between them is 0, and the cosine of 0 is 1. If two lines are 90 degrees apart, their cosine is 0. If two lines are 180 degrees apart, their cosine is -1. So cosine is a way to quantify the closeness of two lines which share to same origin

To go back with 2 points in space that we started with, we can measure how close those 2 points are by taking the cosine of the lines going from origin to the two points. If they are close, the angle between them is small. If they are the exact same point, the angle between the lines is 0. That line is called a vector

Cosine similarity measures how closes two vectors are in Euclidean space. That’s we end up using it a lot. It’s no the only way to measure closeness. There are many others

2 comments

Are all the points in question one unit distant from the origin?
I vaguely remember some paper where they didn’t even bother normalizing the vectors, because they expected zeros to be very close to zero, and anything else was considered a one.

I have no idea if this a common optimization or if it was something very niche. It was for a heuristic matrix reordering strategy, so I think they were willing to accept some mistakes.

In cosine similarity, yes.. iirc there is a recent paper arguing that this causes cosine similarity to perform poorly in high dimensional (aka ML) vector spaces.
...How recent? Bellman coined curse of dimensionality in 1957.
I think he is talking about the 2024 arxiv paper by some netflix (?) researchers that say that it's best not to normalize the dot products (so instead of cosine similarity you just have a dot product).

For most commercial embeddings (openai etc) this is not a problem as the embeddings are already normalized

yes, cosine similarity involves normalizing the points (by the L2 norm) and then dot product. In other words the points lie on the unit (hyper)sphere.
They don’t have to be. The computation of cosine distance normalizes the distance.

In the intuitive explanation I gave, the distance from the origin doesn’t matter at all, unless you are trying to force “cosine disbtance” to mean “metric distance”

There are many many ways to quantify closeness. Metric distance i.e taking a ruler and measuring the distance between the 2 points, is one way.

Measuring the angles between the 2 lines that join the points to the origin is another way to measure closeness

The squared of the measured metric distance is another way

The absolute value of the ruler distance is another way

The question you asked is only relevant if you’re trying to force cosine distance into ruler distance. You don’t need to. Cosine distance is a sufficient way by itself to measure closeness.

But yeah generally speaking, costume distance correlates better with metric distance even the points are relatively more equidistant to the origin

Good explanation, can you also explain how a sentence ends up as a point next to another point where the sentences has similar meaning. What does it mean for two sentences to be similar?