Hacker News new | ask | show | jobs
by stakecounter 3066 days ago
> During the inference time, we first represent user input as a vector using query encoder; then iterate over all available products and compute the metric between the query vector and each of them; finally, sort the results. Depending on the stock size, the metric computation part could take a while. Fortunately, this process can be easily parallelized.

An alternative is to precompute a search index over the item vectors if the dataset of items is very large and you’re OK with running an approximate search to trade a bit of recall for performance, using algorithms provided by libraries like the following.

Nmslib: https://github.com/searchivarius/nmslib

Faiss (Facebook): https://github.com/facebookresearch/faiss

Annoy (Spotify): https://github.com/spotify/annoy