

	by visarga 3152 days ago
	Suppose you have gigabytes of text: Annoy will find matching articles faster and more precisely than grepping for keywords.

1 comment

rpedela 3152 days ago

Lucene is faster and better than grep too. Annoy may be better than Lucene's "more like this" query, which finds documents in an index that are similar to a given set of documents. But how would it help with keyword search, which is what is being asked about?


visarga 3152 days ago

I know inverted index search is fast; it is the basic search engine algorithm. But there is a difference in the quality of the top-ranked results. With word vectors you can ensure that the topic of the whole document is what you want. Many documents mix topics, and some keywords appear by mistake in the wrong place, for example because scraping web text is imperfect and might capture extra text.
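
To make that concrete, here is a rough sketch of the difference. Everything in it is made up for illustration: the two-dimensional word vectors, the two tiny documents, and the helper names `doc_vector`, `keyword_hits`, and `rank_by_topic`. It assumes a document vector is just the average of its word vectors, and it uses a brute-force cosine scan where a library like Annoy would do an approximate nearest-neighbor lookup instead.

```python
import math

# Toy 2-d word vectors (invented values): axis 0 ~ "cooking", axis 1 ~ "programming".
WORD_VECTORS = {
    "recipe": (1.0, 0.0),
    "oven": (0.9, 0.1),
    "bake": (1.0, 0.1),
    "python": (0.0, 1.0),
    "code": (0.1, 0.9),
    "compiler": (0.0, 1.0),
}

# Hypothetical corpus. The cooking page contains a stray "python",
# as if a scraper had picked up unrelated text.
DOCS = {
    "cooking_page": "recipe oven bake recipe python",
    "programming_page": "python code compiler code",
}


def doc_vector(text):
    """Average the vectors of all known words in the text."""
    vecs = [WORD_VECTORS[w] for w in text.split() if w in WORD_VECTORS]
    if not vecs:
        raise ValueError("no known words in text")
    dims = len(vecs[0])
    return tuple(sum(v[d] for v in vecs) / len(vecs) for d in range(dims))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_hits(keyword):
    """Plain keyword match: every document containing the word, unranked."""
    return sorted(name for name, text in DOCS.items() if keyword in text.split())


def rank_by_topic(query):
    """Rank documents by cosine similarity of their averaged word vectors."""
    q = doc_vector(query)
    return sorted(DOCS, key=lambda name: cosine(q, doc_vector(DOCS[name])), reverse=True)


if __name__ == "__main__":
    print(keyword_hits("python"))
    print(rank_by_topic("python code"))
```

The keyword match returns both pages because both contain "python", while the vector ranking puts the programming page first because the cooking page's average vector is dominated by cooking words. A real inverted index would also rank by term statistics, so this only illustrates the whole-document topic effect, not a fair benchmark.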
