by stakecounter
3066 days ago
> During the inference time, we first represent user input as a vector using query encoder; then iterate over all available products and compute the metric between the query vector and each of them; finally, sort the results. Depending on the stock size, the metric computation part could take a while. Fortunately, this process can be easily parallelized.

An alternative is to precompute a search index over the item vectors if the dataset of items is very large and you're OK with running an approximate search to trade a bit of recall for performance, using algorithms provided by libraries like the following:

Nmslib: https://github.com/searchivarius/nmslib

Faiss (Facebook): https://github.com/facebookresearch/faiss

Annoy (Spotify): https://github.com/spotify/annoy
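To make the trade-off concrete, here is a rough sketch in plain Python. It compares the exhaustive scan from the quoted passage with a toy precomputed index. The index uses random-hyperplane hashing, which I picked only because it fits in a few lines; it is not what Nmslib, Faiss, or Annoy implement internally. The names `brute_force_search` and `HyperplaneIndex`, the parameter `n_planes`, and the random demo vectors are all made up for illustration.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_search(query, items, k):
    """Exact search: score every item against the query, then sort."""
    scored = [(cosine(query, v), i) for i, v in enumerate(items)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


class HyperplaneIndex:
    """Toy approximate index (hypothetical): bucket items by which side of
    each random hyperplane they fall on, then search only one bucket."""

    def __init__(self, items, n_planes=8, seed=0):
        rng = random.Random(seed)
        dim = len(items[0])
        self.items = items
        self.planes = [
            [rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)
        ]
        # Precompute the buckets once, ahead of query time.
        self.buckets = defaultdict(list)
        for i, v in enumerate(items):
            self.buckets[self._key(v)].append(i)

    def _key(self, v):
        # One boolean per hyperplane: sign of the dot product.
        return tuple(
            sum(p * x for p, x in zip(plane, v)) >= 0 for plane in self.planes
        )

    def search(self, query, k):
        # Only items sharing the query's bucket are scored, so true
        # neighbours that landed in other buckets are missed (lower recall).
        candidates = self.buckets.get(self._key(query), [])
        scored = sorted(
            ((cosine(query, self.items[i]), i) for i in candidates),
            reverse=True,
        )
        return [i for _, i in scored[:k]]


# Demo on random stand-in vectors (not real product embeddings).
demo_rng = random.Random(42)
items = [[demo_rng.gauss(0, 1) for _ in range(16)] for _ in range(1000)]
query = items[0]

exact = brute_force_search(query, items, 5)
index = HyperplaneIndex(items)
approx = index.search(query, 5)
```

The exact search touches all 1000 vectors per query, while the index touches only the vectors in one bucket. That is where the speedup comes from, and also where the lost recall comes from. The real libraries use much better structures and typically expose parameters for tuning that balance.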