Hacker News
by inertiatic 2381 days ago
Weighted based on what? Do you keep IDF values of some sort?
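For context, IDF-weighted averaging of word vectors could look roughly like this minimal sketch. It assumes tokenized documents and a plain dict mapping tokens to vectors. The function names and the smoothed IDF formula are illustrative choices; the post does not say how its weights are computed.

```python
import math
from collections import Counter

def idf_table(docs):
    """Smoothed IDF per token, computed from a list of tokenized documents."""
    n = len(docs)
    # Document frequency: count each token at most once per document.
    df = Counter(tok for doc in docs for tok in set(doc))
    # log((1 + n) / (1 + df)) + 1 keeps every weight strictly positive.
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def weighted_query_vector(tokens, vectors, idf, default_idf=1.0):
    """IDF-weighted average of the vectors for the in-vocabulary query tokens."""
    dim = len(next(iter(vectors.values())))
    acc = [0.0] * dim
    total = 0.0
    for tok in tokens:
        if tok not in vectors:
            continue  # out-of-vocabulary tokens contribute nothing
        w = idf.get(tok, default_idf)
        acc = [a + w * x for a, x in zip(acc, vectors[tok])]
        total += w
    # Normalize by the total weight; an all-OOV query yields the zero vector.
    return [a / total for a in acc] if total else acc
```

With `docs = [["the", "cat"], ["the", "dog"]]`, "the" gets weight 1.0 while "cat" gets roughly 1.41, so a query like "the cat" is pulled toward the rarer, more informative word.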

Even then, it's hard to imagine this performing well if these are plain Word2vec vectors. Saying it's just the recall step is a bit hand-wavy, since this step is what selects the documents you will perform additional scoring on, and it may very well end up excluding a multitude of great results.
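A minimal sketch of the concern, assuming a toy cosine top-k recall stage followed by a hypothetical `rerank_score` function. Whatever the recall stage drops never reaches the reranker, however well it would have scored there.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recall(query_vec, doc_vecs, k):
    """Stage 1: keep only the k document ids closest to the query embedding."""
    return heapq.nlargest(k, doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]))

def search(query_vec, doc_vecs, rerank_score, k):
    """Stage 2: order only the recalled candidates with a costlier scorer."""
    candidates = recall(query_vec, doc_vecs, k)
    return sorted(candidates, key=rerank_score, reverse=True)
```

If document "c" is far from the query in embedding space but would get the highest rerank score, a `k` of 2 still excludes it, which is exactly the failure mode described above.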

In any case, once again these posts are very interesting to read, and as a search nerd I can't help but wonder about all the alternatives that were considered.


We do have our own custom word (piece) embeddings that we have trained on <good query, bad query>-pairs. There are a few more details about it in https://0x65.dev/blog/2019-12-06/building-a-search-engine-fr....
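The reply does not say which training objective was used. As one hypothetical illustration of learning from such pairs, a triplet-style hinge loss pulls a bad query's embedding toward its good rewrite and away from an unrelated query. The name `pair_margin_loss`, the use of a random negative, and the 0.2 margin are all assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def pair_margin_loss(bad_vec, good_vec, negative_vec, margin=0.2):
    """Hypothetical hinge loss: zero once the bad query is closer (in cosine
    similarity) to its good rewrite than to a negative by at least `margin`."""
    pos = cosine(bad_vec, good_vec)
    neg = cosine(bad_vec, negative_vec)
    return max(0.0, margin - pos + neg)
```

Minimizing this over many <bad query, good query> pairs, for example with gradient descent in an autodiff framework, would push misspelled or oddly phrased queries into the same region as their clean versions.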