| This comment is similar to the comment I wanted to make, because I also thought it was pretty nifty. Joking aside, this is pretty cool.
One thing about the whole embeddings/cosine similarity thing, for people who are struggling to understand it: computers are good at doing lots of sums. Embeddings turn a problem that seems to be about something else[1] into a problem involving lots of sums, by turning that something else into numbers.
So when we turn some text into embeddings (numbers), what do those numbers mean? You could imagine a space with a lot of dimensions (the author is using OpenAI embeddings, so it's like a thousand dimensions or something), where every point in that space is some embedding, which is a numerical representation of the meaning of some text.[2] So things with similar meaning have embeddings which are close to one another in this space.
How do you decide what "close" is? Well, one easy way is cosine similarity. Since these are vectors, imagine two arrows coming from the origin. To make things simpler, imagine it in the two-dimensional plane rather than in 1000 dimensions, which would make your brain leak out of your ears. So you have two arrows going from the origin to two points. One natural measure of closeness is the length of the line from the tip of one arrow to the tip of the other. For people who struggle to remember their trig, this distance c is given by the law of cosines: c^2 = a^2 + b^2 - 2ab cos theta, where a and b are the lengths of the two arrows and theta is the angle between them. So for arrows of a given length, how close the tips are depends only on cos theta: the bigger the cosine, the closer the tips. It just so happens that if you take the dot product of two vectors and divide by the product of their norms, you get the cosine of the angle between them (cos theta). That's why it's called cosine similarity even though you don't see a cosine in the formula.[3]
[1] Language, usually, in the case of LLMs, but embeddings aren't only about text.
[2] This is why searching embeddings is called semantic search.
[3] The term cosine distance (usually meaning 1 minus the cosine similarity) is often used loosely for this, although I believe it's technically not a distance, because it doesn't satisfy the triangle inequality, which something has to obey to be a true distance (a metric). |
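For anyone who wants to poke at the arithmetic, here is a minimal sketch of the above in plain Python. The two vectors are made-up 2D stand-ins for real embeddings, and the function names are just illustrative, not from any library. It computes the cosine as dot product over product of norms, then checks that the tip-to-tip distance matches what the law of cosines predicts.

```python
import math

def norm(u):
    """Length of the arrow from the origin to the point u."""
    return math.sqrt(sum(x * x for x in u))

def cosine_similarity(u, v):
    """Dot product divided by the product of the norms, i.e. cos(theta)."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (norm(u) * norm(v))

def tip_distance(u, v):
    """Straight-line distance between the tips of the two arrows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Two made-up 2D "embeddings" (assumption: real ones have far more dimensions).
u = [3.0, 4.0]
v = [4.0, 3.0]

cos_theta = cosine_similarity(u, v)  # dot = 24, norms = 5 and 5, so 24 / 25

# Law of cosines: c^2 = a^2 + b^2 - 2ab cos(theta)
a, b = norm(u), norm(v)
c_from_law = math.sqrt(a * a + b * b - 2 * a * b * cos_theta)

print("cos theta:", cos_theta)
print("tip distance, measured directly:", tip_distance(u, v))
print("tip distance, via law of cosines:", c_from_law)
```

The two distances should agree up to floating-point rounding. Note that no call to a cosine function appears anywhere: the cosine falls out of the dot product and the norms.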
I'm curious if there ARE alternative methods to cosine similarity. A lot of the things I've read mention that cosine similarity is "one of the ways to compute distance..." or "a simple way...", but I've not seen any real suggestions for alternatives. Guess everyone's thinking "if it ain't broke, don't fix it", as cosine similarity works pretty darn well.