Hacker News
by kazinator 1758 days ago
The angle and cosine start to lose their geometric intuition when we go beyond 3D.

The concept of correlation has no issue with additional components. The concept of similarity of two 17-element vectors is clear. In fact correlation intuitively scales to "infinite component vectors": the dot product becomes multiplying two functions together and then taking an integral.

The Fourier transform of a periodic signal is based on the concept of how similar the signal is to a certain basis of sine/cosine quadratures spaced along a frequency spectrum. This is like a projection of a vector into a space; only the vector has an infinite number of components since it is an interval of a smooth function.
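To make the "dot product becomes an integral" idea concrete, here is a minimal sketch that is not from the comment itself. It assumes a square wave as the signal and uses a midpoint-rule sum in place of the integral; the names `inner_product`, `square_wave` and `sine_coefficient` are made up for illustration.

```python
import math

def inner_product(f, g, a, b, steps=20_000):
    """Approximate the integral of f(t) * g(t) over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(steps)) * h

def square_wave(t):
    """Illustrative signal with period 2*pi: +1 on (0, pi), -1 on (pi, 2*pi)."""
    return 1.0 if math.sin(t) >= 0 else -1.0

def sine_coefficient(f, k):
    """Project f onto sin(k*t): <f, sin_k> / <sin_k, sin_k> over one period."""
    basis = lambda t: math.sin(k * t)
    period = (0.0, 2 * math.pi)
    return inner_product(f, basis, *period) / inner_product(basis, basis, *period)

# "How similar is the signal to each basis function?"
coeffs = {k: sine_coefficient(square_wave, k) for k in range(1, 6)}
for k, c in coeffs.items():
    print(k, round(c, 4))
```

For this particular signal the odd harmonics should come out close to 4/(pi*k) and the even ones close to zero, which is the usual Fourier series of a square wave.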

4 comments

> The angle and cosine start to lose their geometric intuition when we go beyond 3D

... they do?

"Geometry" in general loses intuition beyond 3D, but apart from that, angles between two vectors are probably the one thing that still remains intuitive in higher dimensions (since the two vectors can always be reduced to their common plane).
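A rough sketch of the "reduce to the common plane" step, assuming two nonzero vectors that are not required to have any particular dimension. The helper `plane_coordinates` and the sample vectors are hypothetical; the idea is one Gram-Schmidt step, after which the angle is an ordinary 2D angle.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def plane_coordinates(u, v):
    """Return 2D coordinates of u and v in an orthonormal basis of their span.

    The first basis vector points along u; the second is the normalized part
    of v orthogonal to u (one Gram-Schmidt step). Assumes u is nonzero.
    """
    norm_u = math.sqrt(dot(u, u))
    e1 = [a / norm_u for a in u]
    along = dot(v, e1)                                  # component of v along u
    residual = [b - along * a for a, b in zip(e1, v)]   # part of v orthogonal to u
    across = math.sqrt(dot(residual, residual))
    return (norm_u, 0.0), (along, across)

# Two arbitrary 7-dimensional vectors, chosen only for illustration.
u = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 2.0]
v = [0.0, 1.0, 4.0, 2.0, -1.0, 1.5, 1.0]

u2, v2 = plane_coordinates(u, v)
angle_in_plane = math.atan2(v2[1], v2[0])   # plain 2D angle measured from u
angle_direct = math.acos(dot(u, v) / math.sqrt(dot(u, u) * dot(v, v)))
print(math.degrees(angle_in_plane), math.degrees(angle_direct))
```

Both numbers should agree up to rounding, whatever the dimension of the inputs.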

Even angles behave very counter-intuitively in high dimensions. E.g. in high-dimensional spaces, uniformly randomly chosen vectors always have the same inner product. Why? Sum x_i y_i is a sum of iid random variables, so the variance goes to zero by the central limit theorem.
I would say that this is intuitive. For any direction you pick, there are (n-1) orthogonal directions in nD space. It's only natural that the expected inner product drops to zero.
The variance goes to 0 only if you normalize, which is to say that two random high-dimensional vectors are very likely to be close to orthogonal (under mild assumptions on their distribution).
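A quick simulation of the normalized case, as a sketch under assumptions: components are drawn as independent standard Gaussians (which gives a uniformly random direction after normalizing), and the trial count and seed are arbitrary. `cosine_spread` is a made-up helper name.

```python
import math
import random

def cosine(u, v):
    """Inner product divided by both lengths, i.e. the cosine of the angle."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def cosine_spread(dim, trials=200, seed=0):
    """Mean and standard deviation of the cosine between random vector pairs."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        samples.append(cosine(u, v))
    mean = sum(samples) / trials
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / trials)
    return mean, std

spreads = {dim: cosine_spread(dim) for dim in (3, 30, 300, 1000)}
for dim, (mean, std) in spreads.items():
    print(dim, round(mean, 3), round(std, 3))
```

The spread of the cosine should shrink roughly like 1/sqrt(dim), so pairs of random directions cluster ever more tightly around 90 degrees as the dimension grows.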

I agree that that's one of those important but initially unintuitive facts about high dimensions. Just like almost all of the volume of a reasonably round convex body is near its surface. But it also doesn't really contradict the GP comment.

> Just like almost all of the volume of a reasonably round convex body is near its surface.

I’d say that’s pretty intuitive for anyone who can see a pattern in surface area to volume ratios.

1D ball: 2 / (2 * r) = 1/r

2D ball: (2 * pi * r) / (pi * r^2) = 2/r

3D ball: (4 * pi * r^2) / (4/3 * pi * r^3) = 3/r

nD ball: ... = n/r

Most people don't find arguing from formulas intuitive unless the formulas themselves are intuitive. If you truly believe they are, I'd be curious to know why.
There is an intuitive version of this. Volume in n dimensions is C*r^n (C is some constant that depends on n) and the surface area is its first derivative with respect to r, n*C*r^(n-1), leading to a ratio of n/r (the constant C cancels out). Hmm... Maybe not that intuitive.
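For anyone who wants to check the n/r pattern numerically, here is a small sketch. It assumes the standard closed forms for the n-ball volume and its boundary area in terms of the gamma function; `shell_fraction` is a hypothetical helper showing how much of the volume sits within a thin layer under the surface.

```python
import math

def ball_volume(n, r):
    """Volume of an n-dimensional ball: pi^(n/2) / Gamma(n/2 + 1) * r^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n

def sphere_surface(n, r):
    """Boundary area of the same ball: 2 * pi^(n/2) / Gamma(n/2) * r^(n-1)."""
    return 2 * math.pi ** (n / 2) / math.gamma(n / 2) * r ** (n - 1)

def shell_fraction(n, eps):
    """Fraction of the ball's volume lying within relative depth eps of the surface.

    Since volume scales as r^n, the inner ball of radius (1 - eps) * r holds
    (1 - eps)^n of the total, and the outer shell holds the rest.
    """
    return 1.0 - (1.0 - eps) ** n

r = 2.0
for n in (1, 2, 3, 10):
    print(n, sphere_surface(n, r) / ball_volume(n, r), n / r)

for n in (3, 30, 100):
    print(n, shell_fraction(n, 0.05))
```

The second loop is the "almost all of the volume is near the surface" statement: with a shell only 5% of the radius deep, the fraction climbs toward 1 as n grows.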
This idea generalises to the concept of https://en.wikipedia.org/wiki/Inner_product_space and the equivalent of a change of basis.
Word2vec with embedding sizes of 300 and more does refute your claim. I successfully trained a word2vec model with embedding sizes in that range and used inner product similarity to create word clusters, since it is available out of the box there. Then I made a clustering language model and got significantly lower perplexity compared to a word-based language model.
Not really, for example, in physics, lines in 4D are just as meaningful as they are in 3D, more even (they are called geodesics). So are the angles between them. The real problem is that we just don't have good intuitions of higher dimensions in general.
I mean, I get that if I have, say:

  [0 1 1 1 0 1 0 0]
  [1 0 0 0 1 0 1 1]
that these are perpendicular to each other, which I will easily call ninety degrees, and that two such collinear vectors are at zero degrees.

But I somehow wouldn't go from that intuition into specific cosines. Like "Oh, look, if I divide out the lengths from the dot product, I'm getting 0.5! Why that's the cosine of 60 degrees!"
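For what it's worth, the arithmetic is short enough to sketch. The first two vectors are the ones above; the third is a made-up example, picked so that it shares exactly two of its four ones with the first vector.

```python
import math

def cosine(u, v):
    """Dot product with the lengths divided out."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def angle_degrees(u, v):
    return math.degrees(math.acos(cosine(u, v)))

a = [0, 1, 1, 1, 0, 1, 0, 0]
b = [1, 0, 0, 0, 1, 0, 1, 1]   # no ones in common with a
c = [0, 1, 1, 0, 1, 0, 1, 0]   # hypothetical: two ones in common with a
doubled = [2 * x for x in a]   # collinear with a

print(cosine(a, b), angle_degrees(a, b))              # perpendicular case
print(cosine(a, doubled), angle_degrees(a, doubled))  # collinear case
print(cosine(a, c), angle_degrees(a, c))              # cosine 0.5, about 60 degrees
```

For 0/1 vectors with the same number of ones, the cosine is just the fraction of ones they share, which may be an easier handle than the angle itself.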