Hacker News new | ask | show | jobs
by visarga 982 days ago
The similarity operation is just a dot product, practically a sum over position wise products of two vectors. A for loop with an addition and a multiply inside. That's all you need after embedding. You can use np.dot() to get exact similarity scores and better retrieval with very fast times for under 100K vectors.

You only want the approximate nearest neighbour method for millions of vectors and above. Even that is easy to do with off the shelve libraries for local index. It only gets complicated when you want fast insertion and distributed access.

1 comments

Ok got it now, thanks!