|
|
|
|
|
by visarga
1323 days ago
|
|
Semantic similarity more concretely means to use neural nets to embed the text, then use cosine similarity or dot product to compute the score between two entities. embed1 = neural_net(txt1) embed2 = neural_net(txt2) sim_score = np.dot(embed1, embed2) If you're making a search engine you precompute the embeds for all the items in your database. When a user performs a search you just need to embed the query and do the dot products, which are pretty fast for small indexes. Assuming you want to index millions or billions of entities doing dot products is inefficient because it scales linearly in the size of the index. There is a trick (similar to binary search) that will find the top-k most similar results in O(log(N)) time, called approximate nearest neighbour (ANN). There are a few good libraries for that. |
|
Eg i'm interested in local serverless setups (on desktop, mobile, etc) that yield quality search results in the ~instant~ time frame, but that are also complete and accurate in results. Ie i threw out investigating ANN because i wanted complete results due to smaller datasets.