
You can do all that and more. For example, to find lexical variations of a word, compute word vectors for the corpus, then search for the vectors most similar to the root word's vector, keeping only candidates that share the root's first 3 or 4 letters. It's almost perfect at finding not only legitimate variations, but also misspellings.
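
Roughly, the idea looks like the sketch below. It assumes you already have a word-to-vector mapping from whatever embedding model you trained on the corpus; the toy 3-dimensional vectors, the `find_variants` name, and the `prefix_len` / `top_k` parameters are made up for illustration and are not from any particular library.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def find_variants(root, vectors, prefix_len=3, top_k=5):
    """Rank words that share the root's prefix by similarity to the root."""
    prefix = root[:prefix_len]
    root_vec = vectors[root]
    scored = [
        (cosine(root_vec, vec), word)
        for word, vec in vectors.items()
        if word != root and word.startswith(prefix)
    ]
    scored.sort(reverse=True)  # most similar first
    return [word for _, word in scored[:top_k]]

# Hypothetical toy vectors; real ones would come from the trained model.
toy_vectors = {
    "receive":  [0.90, 0.10, 0.00],
    "received": [0.85, 0.15, 0.05],
    "recieve":  [0.88, 0.12, 0.02],  # misspelling seen in the corpus
    "receipt":  [0.60, 0.40, 0.10],
    "recipe":   [0.05, 0.20, 0.95],  # same prefix, unrelated meaning
    "accept":   [0.80, 0.20, 0.00],  # similar meaning, different prefix
}

variants = find_variants("receive", toy_vectors, top_k=3)
print(variants)
```

The prefix filter is what drops "accept" (close in vector space but not a spelling variant), and the similarity ranking is what pushes "recipe" to the bottom. A 3-letter prefix still catches "recieve"; a 4-letter one would miss it, so the prefix length is a trade-off.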

In general, if you want to search over millions of documents, use Annoy from Spotify. It can index millions of vectors (document vectors, in this application) and find approximate nearest neighbors in roughly logarithmic time, so you can search large tables by fuzzy meaning.

https://github.com/spotify/annoy