Which vector similarity metric should I use? (imaurer.com)
2 points by imaurer 1123 days ago
2 comments

Does this seem right?

| Task                           | Distance Measure            |
|--------------------------------|-----------------------------|
| Document classification        | Cosine Distance             |
| Semantic search                | Cosine Distance             |
| Recommendation systems         | Cosine Distance             |
| Image recognition              | Euclidean Distance (L2)     |
| Speech recognition             | Euclidean Distance (L2)     |
| Handwriting analysis           | Euclidean Distance (L2)     |
| Recommendation systems         | Inner Product (Dot Product) |
| Collaborative filtering        | Inner Product (Dot Product) |
| Matrix factorization           | Inner Product (Dot Product) |
| Image processing               | L2-Squared Distance         |
| Error detection and correction | Hamming Distance            |
| DNA sequence comparison        | Hamming Distance            |
| Taxicab geometry               | Manhattan Distance          |
| Chessboard distance            | Manhattan Distance          |
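
For anyone comparing the rows above, here is a minimal sketch of how each measure in the table is computed, in plain Python with no libraries. The function names are my own, not from any particular vector database. I also included Chebyshev distance as an assumption on my part: it is the metric usually called "chessboard distance" (the number of king moves), while the table lists Manhattan for that row.

```python
import math

def dot(a, b):
    # Inner (dot) product: larger means more similar, so it is a similarity, not a distance.
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cos(angle between a and b); ignores vector magnitudes.
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2_squared(a, b):
    # Squared Euclidean distance; same ranking as L2 but skips the square root.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    # Euclidean (L2) distance.
    return math.sqrt(l2_squared(a, b))

def manhattan(a, b):
    # Manhattan (L1, taxicab) distance: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Chebyshev (L-infinity) distance: largest absolute coordinate difference.
    return max(abs(x - y) for x, y in zip(a, b))

def hamming(a, b):
    # Hamming distance: number of positions where two equal-length sequences differ.
    return sum(x != y for x, y in zip(a, b))

a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]  # same direction as a, twice the length
print(cosine_distance(a, b), euclidean(a, b), manhattan(a, b), chebyshev(a, b))
print(hamming("GATTACA", "GACTATA"))
```

Note that `a` and `b` point in the same direction, so cosine distance treats them as identical while the L1 and L2 distances do not.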

Yes
Even ignoring vector magnitudes, wouldn't cosine distance as a measure of similarity only make sense if you're working with a convex set? That seems far from guaranteed when you're working in a high-dimensional space.
Yes, cosine distance works best in convex or normalized sets. Thinking about adding this caveat. Thanks for the question.
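
On the "normalized" part of this exchange, two things can be checked directly. Here is a small sketch (helper names are hypothetical, and it says nothing about the convexity question): cosine distance does not change when a vector is rescaled, and once vectors are normalized to unit length, squared Euclidean distance equals exactly twice the cosine distance, so both give the same nearest-neighbor ranking.

```python
import math

def normalize(v):
    # Scale v to unit length (assumes v is not the zero vector).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_distance(a, b):
    # 1 - cos(angle between a and b).
    d = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - d / (na * nb)

def l2_squared(a, b):
    # Squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))

a = [3.0, 4.0]
b = [1.0, 0.0]
scaled_a = [10 * x for x in a]  # same direction as a, ten times longer

# Rescaling a leaves cosine distance unchanged.
print(cosine_distance(a, b), cosine_distance(scaled_a, b))

# On unit vectors: ||u - v||^2 = 2 - 2 * cos(u, v) = 2 * cosine_distance(u, v).
u, v = normalize(a), normalize(b)
print(l2_squared(u, v), 2 * cosine_distance(u, v))
```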