Hacker News
by jxy 765 days ago
> We can see that in this case, where perhaps the X axis represents "more cat" and Y axis "more dog", using the euclidean distance (i.e. physical distance length), a pitbull is somehow more similar to a Siamese cat than a "dog", whereas intuitively we'd expect the opposite. The fact that a pitbull is "very dog" somehow makes it closer to a "very cat". Instead, if we take the angle distance between lines (i.e. cosine distance, or 1 minus angle), the world makes sense again.

Typically the vectors are normalized, unlike what's shown in this demonstration.

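When using normalized vectors, the euclidean distance measures the distance between the end points of the two vectors, while the cosine similarity is the length of one vector projected onto the other (the cosine distance is 1 minus that). The two are then tied together: the squared euclidean distance equals 2 times the cosine distance, so both rank neighbors in the same order.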
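
To make the relationship for unit-length vectors concrete, here is a minimal sketch in plain Python. The example vectors and the helper names (normalize, dot, euclidean) are made up for illustration, not taken from the article. It checks that the dot product of two unit vectors is the cosine of the angle between them (the projection length), and that the squared euclidean distance between their end points is 2 * (1 - cosine).

```python
import math

def normalize(v):
    # Scale v to unit length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    # Distance between the end points of a and b.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Arbitrary example vectors (hypothetical data).
a = normalize([3.0, 1.0])
b = normalize([1.0, 2.0])

cos_sim = dot(a, b)        # projection of one unit vector onto the other
cos_dist = 1.0 - cos_sim   # cosine distance
euc = euclidean(a, b)

# For unit vectors: |a - b|^2 = 2 - 2 * (a . b) = 2 * cosine distance
assert math.isclose(euc ** 2, 2.0 * cos_dist)
```

Since euc is a monotonic function of cos_dist here, nearest-neighbor rankings come out the same under either metric once everything is normalized.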

1 comment

The issue with normalization is that you lose a degree of freedom, which, when you're visualizing, effectively means losing a dimension. Normalized 2d vectors are really just 1d vectors: every one of them sits on the unit circle and is fully described by its angle. If you want to show a 2d relationship, you now have to use 3d vectors (so that you have 2 degrees of freedom again).
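
A quick way to see the lost degree of freedom, as a rough sketch in plain Python (the function names and the example vector are hypothetical): a normalized 2d vector can be collapsed to a single number, its angle, and rebuilt from that number with nothing lost.

```python
import math

def normalize(v):
    # Scale v to unit length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def to_angle(v):
    # One number is enough to describe a unit 2d vector.
    return math.atan2(v[1], v[0])

def from_angle(theta):
    # Rebuild the unit vector from its angle.
    return [math.cos(theta), math.sin(theta)]

v = normalize([3.0, 4.0])   # arbitrary example vector
theta = to_angle(v)
w = from_angle(theta)

# The round trip through a single scalar recovers both coordinates.
assert all(math.isclose(x, y) for x, y in zip(v, w))
```

The same argument in 3d gives two angles on the unit sphere, which is why a 2d relationship needs normalized 3d vectors.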