Hacker News
by bubersson 2445 days ago
To get higher precision and recall from this type of search, you would have to focus especially on the indexing part (e.g. improve NLU of the concepts in the pages), right? The ML models for that could be trained in a federated, private way across the nodes.
1 comments

Indexing is important for sure. The problem is that, to preserve privacy without falling back on heavyweight general-purpose multi-party computation, we have to give up a bit of the precision and recall of modern search engines. MinHash, or more specifically Locality Sensitive Hashing (LSH), is a good first approximation (better than term frequency, worse than ML-based search). Right now much of the web is unqueryable; my first goal was to allow the deep web and Tor services to be searched, even at just a rudimentary level.
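For anyone unfamiliar with the technique, here is a minimal sketch of MinHash signatures with LSH banding. It is not the project's actual code: the function names, the word 3-shingles, the 64-hash signature, the 16 bands, and the use of seeded BLAKE2b as the hash family are all assumptions picked for illustration.

```python
import hashlib
from collections import defaultdict

NUM_HASHES = 64  # signature length (assumed value)
BANDS = 16       # number of LSH bands (assumed); 64 / 16 = 4 rows per band


def shingles(text, k=3):
    """Return the set of word k-shingles of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, token):
    """One member of the hash family: a seeded 64-bit BLAKE2b digest."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=NUM_HASHES):
    """For each seeded hash function, keep the minimum hash over all tokens."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_buckets(signatures, bands=BANDS):
    """Index documents: map (band index, band contents) -> set of doc ids."""
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(doc_id)
    return buckets


def candidates(query_sig, buckets, bands=BANDS):
    """Return doc ids that share at least one full band with the query."""
    rows = len(query_sig) // bands
    found = set()
    for b in range(bands):
        found |= buckets.get((b, tuple(query_sig[b * rows:(b + 1) * rows])), set())
    return found


# Hypothetical pages, for illustration only.
docs = {
    "a": "the quick brown fox jumps over the lazy dog near the river bank",
    "b": "an entirely unrelated page about hidden services and onion routing",
}
signatures = {doc_id: minhash_signature(shingles(text)) for doc_id, text in docs.items()}
index = lsh_buckets(signatures)
query = minhash_signature(shingles(docs["a"]))
print(candidates(query, index))
```

Only the fixed-length signatures are compared at query time, never the page text. Whether publishing signatures leaks too much about a page is a separate question this sketch does not address.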