by GuB-42 647 days ago
I think the use of the term "cosine" here is needlessly confusing. It is the dot product of normalized vectors. Sure, when you do the maths, it gives out a cosine, but since we are not doing geometry here, it isn't really helpful for a beginner to know that. Especially considering that these vectors have many dimensions and anything above 3D is super confusing when you think about it geometrically.

Instead just try to think about what it is: the sum of term-by-term products of normalized vectors. A product is the soft version of a logic AND, and it makes intuitive sense that vectors A and B are similar if there are a lot of traits that are present in both A AND B (represented by the sum) relative to the total number of traits that A and B have (that's the normalization process).
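
To make the "sum of term-by-term products of normalized vectors" description concrete, here is a minimal sketch in plain Python. The trait vectors `a`, `b`, and `c` are made-up illustrations (1 means a trait is present), not data from the article.

```python
import math

def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """Sum of term-by-term products of the two normalized vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

# Hypothetical trait vectors: 1 = trait present, 0 = trait absent.
a = [1, 1, 0, 1]
b = [1, 1, 1, 0]   # shares two traits with a
c = [0, 0, 1, 0]   # shares no traits with a

sim_ab = cosine_similarity(a, b)  # 2 shared traits / (sqrt(3) * sqrt(3))
sim_ac = cosine_similarity(a, c)  # no shared traits
```

Each product `x * y` is non-zero only where both vectors have the trait, which is the "soft AND" reading; the normalization divides by how many traits each vector has overall.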

Forget about angles and geometry unless you are comfortable with N-dimensional space with N>>3. Most people aren't.

8 comments

> we are not doing geometry here

We absolutely are doing geometry here, given that we're talking about metrics in a vector space – and this is trigonometry you learned by the first year of high school.

Where I live, where many people live, we enter high school aged 11. At that point we haven’t been introduced to geometry at school yet.

I suspect you’re using American terminology. When talking about school years it’s often useful to talk about the year or grade of school, like “9th grade” or “year 9” as it’s more universal.

I’m not and, unfortunately, those aren’t universal either, even within a country. The normal terminology where I grew up would be S1, which follows P7.

I would expect most people to know about trigonometric functions by age 12, yes. (I entered high school at 11 and the first topic tackled in maths classes was elementary trigonometry.)

You might like to think of vectors in their geometric interpretation but vectors are not inherently geometric - vectors are just lists of numbers, which we sometimes interpret geometrically because it helps us comprehend them. High dimensional vectors grow increasingly ungeometric as we have to wrestle with increasingly implausible numbers of orthogonal spatial dimensions in order to render them ‘geometric’.

In the end, vectors (long lists of numbers a_1, a_2, a_3, …, a_n) start looking more like discrete functions f(i) = a_i. And you can extend the same concept all the way to continuous functions - they’re like infinite dimensional vectors. For continuous functions over a finite interval the dot product (usually called the inner product in this domain) is just the integral of the product of two functions, and the ‘magnitude’ of a function is its RMS, and that means functions have a ‘cosine similarity’ which is not remotely geometric. There isn’t any geometric sense in which there is an ‘angle between’ cos(x) and sin(x) except it turns out that they have a cosine similarity of 0 so it implies the ‘angle between’ them is 90°, which actually makes a lot of sense. But in this same sense there’s an ‘angle between’ any two functions (over an interval).
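
As a rough numerical check of the function version of cosine similarity, the sketch below approximates the inner product with a midpoint-rule integral. The interval [0, 2π] is an assumption (the comment does not name one), and `inner` and `function_cosine` are illustrative helper names.

```python
import math

def inner(f, g, lo, hi, n=10_000):
    """Approximate the integral of f(x) * g(x) over [lo, hi] (midpoint rule)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += f(x) * g(x)
    return h * total

def function_cosine(f, g, lo, hi):
    """Inner product divided by the product of the two 'magnitudes'."""
    return inner(f, g, lo, hi) / math.sqrt(inner(f, f, lo, hi) * inner(g, g, lo, hi))

# Assumed interval: one full period, where sin and cos are orthogonal.
sim_sin_cos = function_cosine(math.sin, math.cos, 0.0, 2 * math.pi)

# Two functions that are clearly "similar" on [0, 1]: x and x^2.
sim_x_x2 = function_cosine(lambda x: x, lambda x: x * x, 0.0, 1.0)
```

On a full period the first value comes out at essentially zero. On an interval that is not a whole number of half-periods, sin and cos are generally not orthogonal, so the 90° reading depends on the interval chosen.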

But we are not doing geometry here.

> You might like to think of vectors in their geometric interpretation but vectors are not inherently geometric - vectors are just lists of numbers

No. They can be expressed as lists of numbers in a basis if the vector space is equipped with a scalar product but the vector itself is an object that transcends the specific numbers it is expressed in.

What you’re saying here is totally wrong and I recommend you check out the Wikipedia page on vector spaces. The geometrical object “a vector” is a more fundamental thing than the list of numbers.

Tuples of numbers are a special case of a vector space, which even comes with a canonical basis and inner product for free. And since the article is about word embeddings, which map words to tuples of numbers, there’s no need to mention other vector spaces in this context.
You think this comment could have been written by someone who doesn’t understand what a vector space is?

Vectors are not purely geometric objects. Geometry is a lens through which we can interpret vectors. So is linear algebra. The objects behave the same and both perspectives give us insights about them.

Insisting vectors are only geometric is like saying complex numbers are geometric because they can be thought of as points on the complex plane.

> increasingly implausible numbers of orthogonal spatial dimensions in order to render them ‘geometric’.

Implausible how? “geometric” doesn’t mean “embeds nicely in 3D space”.

What’s wrong with talking about the angle between two L^2 functions defined on an interval? Geometric reasoning still works? If you take a span of two functions, you have a plane. What’s the issue?

In this case can people just prepend "hyper-" as in hyperplane etc? Hyper-line, hyper-angle. (Speaking as someone who has heard 'hyperplane' a few times but not the others.)
No, that would be incorrect. A plane is 2D. If you have two functions, and take their span, you get a 2D plane. It is a regular, flat, 2D plane.

When people say “hyperplane” they are generally talking about something with more than two dimensions.

At least when the ambient vector space is more than 3-dimensional, yeah. Specifically, a hyperplane generally refers to something with codimension 1.

(So, when the ambient vector space is finite-dimensional, the dimension of a hyperplane is one less than the dimension of the ambient vector space.)

> that means functions have a ‘cosine similarity’ which is not remotely geometric.

It obeys the normal rules you learned in geometry. For example, pick three functions a,b,c. The functions form a triangle. The triangle obeys the triangle inequality—the distances satisfy d(a,b) ≤ d(a,c) + d(c,b). The angles of the triangle sum to 180°.

This sounds an awful lot like geometry to me.
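
A quick way to see this numerically, as a sketch only: the three 5-dimensional vectors below are arbitrary stand-ins for the functions a, b, c (think of them as sampled values), and the helper names are illustrative.

```python
import math

def sub(u, v):
    return [x - y for x, y in zip(u, v)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def angle(u, v):
    """Angle in radians between u and v, from the inner product."""
    c = dot(u, v) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding

def triangle_angles(a, b, c):
    """Interior angles at the vertices a, b, c."""
    return (
        angle(sub(b, a), sub(c, a)),
        angle(sub(a, b), sub(c, b)),
        angle(sub(a, c), sub(b, c)),
    )

# Arbitrary example points in 5 dimensions.
a = [1.0, 0.0, 2.0, -1.0, 3.0]
b = [0.0, 4.0, 1.0, 1.0, -2.0]
c = [2.0, 1.0, 0.0, 5.0, 1.0]

total_degrees = sum(math.degrees(t) for t in triangle_angles(a, b, c))
triangle_ok = norm(sub(a, b)) <= norm(sub(a, c)) + norm(sub(c, b))
```

The angle sum comes out at 180° up to floating-point rounding because the three points always lie in a flat 2D plane, whatever the dimension of the surrounding space.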

Interesting, I think it’s actually far more intuitive to think of it geometrically. I’m not sure what my brain is doing in order for this mental projection to help, but this is exactly how I made dot products “click” for me. I started to think of them in multidimensional space, almost physically (though in a very limited sense since my brain came from a monkey and generally fires on a couple cylinders).
I expect it’s like how learning to play by ear is more intuitive than sheet music. That’s great if you’re an amateur. If you’re dealing with tensors or somesuch trying to design a fusion reactor that’s probably a crutch.
This is a very odd statement, and it shows how differently human brains work. As a musician, I find playing from (or thinking about music in terms of) sheet music so much more intuitive than playing by ear. It feels like the very reason people notate music is that anything written down is easier to think about and play than anything only heard.
I have a feeling there's some overlap here.

I can intuit a lot of things about music and even visualize some of it, but eventually I hit limitations. What I learn through these intuitions still applies as my ability to mentally visualize or model the music begins to fail, though.

It's similar with vectors. Once you have the orchestral equivalent of vectors, there's no way I'm visualizing it and doing mental geometry. However, what I learned and the modelling I developed from the "casio keyboard playing jingles" equivalent of vectors is still useful and applicable.

I guess this is the point where playing by ear or mentally modelling things fails, and notation is far more helpful. Yet if a lot of us approach these complex works from the notation angle first, we might feel pretty lost and uncertain about what we're doing with it and why.

I can tell I'm not articulating this well, but I like the musical analogy and wanted to get that out.

I can sing along to songs I never liked that haven’t been on the radio for twenty years.

So I tend to sympathize with the by ear folks.

One cannot intuitively think about more than 3 dimensions. Even in 3D space, most people's intuition is often wrong. It's quite accurate for 1D and 2D.

Richard Hamming has a whole lecture devoted to making everyone realize precisely this [1]. It was an eye-opener for me.

[1] https://www.youtube.com/watch?v=uU_Q2a0S0zI

Ehh… you can intuitively think about it. Intuition is something you develop with time as you gain familiarity with a subject. You just can’t bring all of your intuitions about 3D space into higher-dimensional spaces.
I gather you didn't check the lecture out. Yeah, this is hacker news.
I have seen the lecture before. Or, parts of it.

I took many classes in school where we worked with higher dimensional spaces. You wouldn’t send a physics major a lecture on physics, say it was “eye opening”, and expect them to feel the same way about it. It is stuff they have already seen before. Maybe their eyes are already open.

To be honest it’s kind of rude.

> Forget about angles and geometry unless you are comfortable with N-dimensional space with N>>3. Most people aren't.

The whole point of measuring similarity this way is that any two vectors exist in a two-dimensional space, which is where you measure the angle between them. Why would you need to be comfortable with high-dimensional spaces?

By `two vectors exist in a two-dimensional space` are you talking about how two (linearly independent) n dimensional vectors will span a 2d space?
No, I'm talking about the fact that the space spanned by two vectors is sufficient to contain those vectors. All of the analysis you could ever theoretically want to do on them can be done within that space. If you only have two vectors, you never need to consider a space with higher dimensionality than 2. Each vector is a dimension of the space, and that's it.
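
A sketch of that claim, under the assumption that we only care about the angle between the two vectors: one Gram-Schmidt step gives each vector 2D coordinates in the plane they span, and the cosine computed from those 2D coordinates matches the one computed in the original space. The 300-dimensional random vectors and the function names are illustrative.

```python
import math
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))

def plane_coordinates(a, b):
    """2D coordinates of a and b in an orthonormal basis of their span.

    The first basis direction is a / |a|; the second is the part of b
    orthogonal to a, normalized (one Gram-Schmidt step).
    """
    na = norm(a)
    e1 = [x / na for x in a]
    along = dot(b, e1)                                # component of b along e1
    rest = [y - along * x for x, y in zip(e1, b)]     # part of b orthogonal to e1
    return (na, 0.0), (along, norm(rest))

rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(300)]
b = [rng.gauss(0, 1) for _ in range(300)]

a2, b2 = plane_coordinates(a, b)
cos_300d = cosine(a, b)
cos_2d = cosine(a2, b2)
```

The two cosines agree up to rounding, so for a single pair the measurement really is a 2D one. What the 2D picture does not show is how angles are distributed across many pairs in the larger space.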
That is the same thing as what is being said in the comment you are replying "No" to.
No, these are not at all the same claim:

(A) Look at this space. Every point within it can be reached by combining these two vectors.

(B) Look at this space. No point outside it can be reached by combining these two vectors.

Saying that two vectors span a space is claim (A). Saying that the space they span contains them is... much weaker than claim (B), but it's related to claim (B) and not to claim (A).

For one thing, if you're just thinking about it as fancy 2D, you will miss a lot of phenomena that occur in higher-dimensional spaces. For example, almost all pairs of random vectors are almost completely orthogonal, which isn't true at all in low-dimensional spaces.
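
The near-orthogonality effect is easy to observe empirically. This is a small sketch, assuming vectors with independent standard-normal components (one common way to pick a random direction); `mean_abs_cosine` is an illustrative name.

```python
import math
import random

def mean_abs_cosine(dim, pairs=200, seed=0):
    """Average |cosine similarity| over random pairs of Gaussian vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        u = [rng.gauss(0, 1) for _ in range(dim)]
        v = [rng.gauss(0, 1) for _ in range(dim)]
        dot = sum(x * y for x, y in zip(u, v))
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(y * y for y in v))
        total += abs(dot / (nu * nv))
    return total / pairs

if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(dim, round(mean_abs_cosine(dim), 3))
```

The average absolute cosine shrinks steadily as the dimension grows, roughly in proportion to 1/sqrt(dim), so typical random pairs sit closer and closer to 90° apart.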
Phrased like that it sounds like a qualitative difference between "low" and "high" dimensional spaces. But isn't it simply a consequence of the fact that the more dimensions you have, the less likely it is that randomly distributed, sparse non-zeros will end up in the same positions?

I.e. simply a quantitative difference.

Any extreme quantitative difference is going to be a qualitative difference.
> but since we are not doing geometry here, so it isn't really helpful for a beginner to know that.

This article isn't talking geometry, it's trigonometry. And half the article is visual anyway.

I bet it's whether your primary background is programming or mathematics. From the latter, the cosine is very natural (scalar projection etc.) and it's lots of steps to get to your thing. I'd say this was intuitive for us post high-school because of that pedagogical background.
Hmmm.. I heard at a conference that most well-understood engineering principles or theories have a neat geometric interpretation. Personally I find a theory with a geometric interpretation far easier to grasp. On the other hand, higher-dimensional geometry confuses me a lot: most random sparse vectors are orthogonal to each other, and most of a sphere's volume in high dimensions is concentrated in a small region.
It’s quite interesting that we end up using cosine similarity. Most networks are trained with a softmax layer at the end (e.g. next word prediction). Given the close relation between softmax and logistic regression, it might make more sense to use σ(u.v) as the similarity function.
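
One practical difference between the two candidate similarity functions can be sketched as follows: σ(u·v) depends on vector magnitudes, while cosine similarity does not. The vectors and the function name `sigmoid_similarity` are illustrative assumptions, not anything from the article.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cosine_similarity(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid_similarity(u, v):
    """The sigma(u.v) alternative suggested in the comment."""
    return sigmoid(dot(u, v))

u = [1.0, 2.0]
v = [2.0, 1.0]
u_scaled = [10.0 * x for x in u]  # same direction as u, 10x the length

cos_before, cos_after = cosine_similarity(u, v), cosine_similarity(u_scaled, v)
sig_before, sig_after = sigmoid_similarity(u, v), sigmoid_similarity(u_scaled, v)
```

Scaling `u` leaves the cosine unchanged but pushes σ(u·v) toward 1, so the sigmoid version mixes direction and magnitude into one score. Whether that is desirable depends on whether vector length carries meaning in a given embedding.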
I agree on the first point. But I find the dot product much more geometrical than the cosine. So in my mind your argument is in favour of geometry!