Hacker News new | ask | show | jobs
by fishmaster 1831 days ago
There's a quite amusing blog post about implementing the automaton for Lucene: http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is...

As an aside, I know one of the authors of the original paper personally. Quite interesting guy, and the remark about the paper being "nearly unintelligible" is fitting.

1 comments

Dramatic post ;) It'd be interesting to see concrete benchmarks of the Lucene implementation, on some public dataset we could try outside of Lucene too.

Btw I didn't find the Schulz & Mihov paper that cryptic. You can check its implementation in Python [0], pretty straightforward IMO.

But I should note that in the end, we chose a simpler approach: the FastSS index. FastSS bypasses constructing / intersecting Levenshtein automata altogether, and is super fast [1].

[0] https://github.com/antoinewdg/pyffs

[1] Boytsov, Leonid. (2011). Indexing methods for approximate dictionary searching: Comparative analysis. http://boytsov.info/pubs/sisap2012.pdf