Hacker News new | ask | show | jobs
by FiberBundle 1228 days ago
Does anybody know how search engines apply semantic search with embeddings? To my knowledge no practical algorithms exist that find nearest neighbors in high dimensional space (such as that in which word/sentence/document vectors are embedded in), so those wouldn't give you any benefit compared to an iterative similarity search as applied here. Which obviously is totally impractical for real search engines. There are approximate nearest neighbor algorithms such as Locality-sensitive hashing, but even they seem impractical for real world usage on the scale of the indexes that search engines use. So how can Google e.g. make this work?
1 comments

There actually are practical algorithms for finding approximate nearest neighbors (ANN) at large scales. Some of them are open source like HNSW [1] and Faiss [2], and some are even offered inside managed services like Pinecone.[3]

[1] https://www.pinecone.io/learn/hnsw/

[2] https://www.pinecone.io/learn/faiss/

[3] https://www.pinecone.io/