by Radim
1836 days ago
Dramatic post ;) It'd be interesting to see concrete benchmarks of the Lucene implementation on a public dataset that we could also try outside of Lucene.

Btw, I didn't find the Schulz & Mihov paper that cryptic. You can check a Python implementation of it [0]; it's pretty straightforward IMO.

I should note, though, that in the end we chose a simpler approach: the FastSS index. FastSS bypasses constructing and intersecting Levenshtein automata altogether, and is super fast [1].

[0] https://github.com/antoinewdg/pyffs
[1] Boytsov, Leonid. (2011). Indexing methods for approximate dictionary searching: Comparative analysis. http://boytsov.info/pubs/sisap2012.pdf
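
For anyone who hasn't seen FastSS, here is a rough sketch of the basic idea in plain Python. It is only an illustration, not the code we actually use: the function names (`deletion_variants`, `build_index`, `query`) are made up for this example, and it implements the simplest variant, where every dictionary word is indexed under all strings reachable by deleting at most k characters. Two words within edit distance k always share at least one such deletion variant, so a lookup only needs hash probes plus a final edit-distance check on the candidates.

```python
from collections import defaultdict
from itertools import combinations


def deletion_variants(word, k):
    """Return every string obtained by deleting at most k characters from word."""
    variants = set()
    for d in range(min(k, len(word)) + 1):
        for positions in combinations(range(len(word)), d):
            skip = set(positions)
            variants.add("".join(c for i, c in enumerate(word) if i not in skip))
    return variants


def levenshtein(a, b):
    """Plain dynamic-programming edit distance, used only to verify candidates."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def build_index(words, k):
    """Map each deletion variant to the set of dictionary words that produce it."""
    index = defaultdict(set)
    for w in words:
        for v in deletion_variants(w, k):
            index[v].add(w)
    return index


def query(index, q, k):
    """Return dictionary words within edit distance k of q (index built with >= k)."""
    candidates = set()
    for v in deletion_variants(q, k):
        candidates |= index.get(v, set())
    # Shared deletion variants can produce false positives, so verify each one.
    return sorted(w for w in candidates if levenshtein(q, w) <= k)


if __name__ == "__main__":
    idx = build_index(["lucene", "lucent", "levenshtein", "automaton"], 1)
    print(query(idx, "lucena", 1))
```

The trade-off is memory: the index grows quickly with k and with word length, which is why this only makes sense for small k.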