by acjohnson55
648 days ago
I might have missed this, but I think the post buries the lede: in a high-dimensional space, two randomly chosen vectors are very unlikely to have high cosine similarity. Another way to put it is that the cosine of the angle between two random vectors concentrates ever more tightly around zero as the dimensionality increases. Most similarity metrics will be very low if the vectors don't even point in roughly the same direction, so cosine similarity is a cheap way to filter out the vast majority of the data set. It's been a while since I've studied this stuff, so I might be off target.
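A quick way to check the concentration claim empirically is a small simulation like the one below (Python, standard library only). It assumes random vectors with i.i.d. standard Gaussian components, which gives uniformly random directions. Under that assumption the expected cosine is 0 in every dimension, while its standard deviation is 1/sqrt(d), so the spread shrinks as d grows. The helper names (`cosine`, `random_vector`, `cosine_stats`) are just illustrative.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def random_vector(dim, rng):
    # Assumption: i.i.d. Gaussian components, i.e. a uniformly random direction.
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def cosine_stats(dim, trials=1000, seed=0):
    """Sample mean and standard deviation of cosine similarity over random pairs."""
    rng = random.Random(seed)
    sims = [cosine(random_vector(dim, rng), random_vector(dim, rng))
            for _ in range(trials)]
    mean = sum(sims) / trials
    std = math.sqrt(sum((s - mean) ** 2 for s in sims) / trials)
    return mean, std


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        mean, std = cosine_stats(dim)
        # The mean hovers near 0 in every dimension; the std tracks 1/sqrt(dim).
        print(f"dim={dim:5d}  mean={mean:+.3f}  std={std:.3f}  "
              f"1/sqrt(dim)={1 / math.sqrt(dim):.3f}")
```

The point for filtering is the shrinking standard deviation: in 1000 dimensions a random pair almost never gets a cosine much above about 0.1, so anything that scores well above that is unlikely to be a chance alignment.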