by wballard 2988 days ago

Besides reduction to search (Solr / Elasticsearch / Lucene / Xapian), which is the most common approach I have used commercially, my actual favorite is precomputation.

At the moment that means a Keras embedding model, multiprocessing, and Annoy, emitting CSV (object id, other object id, score) as a batch process and loading it into my database. At query time, recommending is just a lookup against that data. This trades a prebuild step for near-instant runtime, with near nothing net new to break.
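A minimal sketch of that batch-then-lookup shape, using only the Python standard library so it runs anywhere. It is not the setup described above: the hard-coded vectors stand in for the output of an embedding model, brute-force cosine similarity stands in for Annoy's approximate nearest-neighbor index, and SQLite stands in for the database. The table name `similar` and all function names are made up for illustration.

```python
import csv
import io
import math
import sqlite3

# Toy vectors standing in for embeddings from a trained model (assumption).
EMBEDDINGS = {
    1: [1.0, 0.0],
    2: [0.9, 0.1],
    3: [0.0, 1.0],
    4: [0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def build_rows(embeddings, top_k=2):
    """Batch step: yield (object_id, other_object_id, score) rows.

    Brute force over all pairs; an approximate index such as Annoy
    would replace this loop at millions of items.
    """
    for obj_id, vec in embeddings.items():
        scored = [
            (other_id, cosine(vec, other_vec))
            for other_id, other_vec in embeddings.items()
            if other_id != obj_id
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        for other_id, score in scored[:top_k]:
            yield obj_id, other_id, round(score, 6)


def write_csv(rows, handle):
    """Emit the precomputed rows as CSV."""
    csv.writer(handle).writerows(rows)


def load_csv(conn, handle):
    """Load the CSV into a lookup table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS similar "
        "(object_id INTEGER, other_object_id INTEGER, score REAL)"
    )
    conn.executemany(
        "INSERT INTO similar VALUES (?, ?, ?)",
        ((int(a), int(b), float(s)) for a, b, s in csv.reader(handle)),
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_similar ON similar (object_id, score)"
    )
    conn.commit()


def recommend(conn, object_id, limit=5):
    """Query time: a single indexed SELECT, no model in the loop."""
    cur = conn.execute(
        "SELECT other_object_id FROM similar "
        "WHERE object_id = ? ORDER BY score DESC LIMIT ?",
        (object_id, limit),
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    buf = io.StringIO(newline="")
    write_csv(build_rows(EMBEDDINGS), buf)
    buf.seek(0)
    conn = sqlite3.connect(":memory:")
    load_csv(conn, buf)
    print(recommend(conn, 1))
```

The point of the split is that everything expensive (embedding, neighbor search) happens in `build_rows`, offline, while `recommend` only touches a plain table that the existing database already knows how to serve.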

I’m working at commercial scale (2-5 million items), not ‘internet scale’ with billions of items.

Hope that helps.