| HN Mirror

Dramatic post ;) It'd be interesting to see concrete benchmarks of the Lucene implementation, on some public dataset we could try outside of Lucene too.

Btw I didn't find the Schulz & Mihov paper that cryptic. You can check its implementation in Python [0], pretty straightforward IMO.

But I should note that in the end, we chose a simpler approach: the FastSS index. FastSS bypasses constructing / intersecting Levenshtein automata altogether, and is super fast [1].

[0] https://github.com/antoinewdg/pyffs

[1] Boytsov, Leonid. (2011). Indexing methods for approximate dictionary searching: Comparative analysis. http://boytsov.info/pubs/sisap2012.pdf