As an aside, I know one of the authors of the original paper personally. Quite interesting guy, and the remark about the paper being "nearly unintelligible" is fitting.
Dramatic post ;) It'd be interesting to see concrete benchmarks of the Lucene implementation, on some public dataset we could try outside of Lucene too.
Btw I didn't find the Schulz & Mihov paper that cryptic. You can check its implementation in Python [0], pretty straightforward IMO.
But I should note that in the end, we chose a simpler approach: the FastSS index. FastSS bypasses constructing / intersecting Levenshtein automata altogether, and is super fast [1].
Btw I didn't find the Schulz & Mihov paper that cryptic. You can check its implementation in Python [0], pretty straightforward IMO.
But I should note that in the end, we chose a simpler approach: the FastSS index. FastSS bypasses constructing / intersecting Levenshtein automata altogether, and is super fast [1].
[0] https://github.com/antoinewdg/pyffs
[1] Boytsov, Leonid. (2011). Indexing methods for approximate dictionary searching: Comparative analysis. http://boytsov.info/pubs/sisap2012.pdf