|
|
|
|
|
by lorenzhs
4022 days ago
|
|
Unless you are the mythical 100x programmer, I doubt that you wrote a full implementation of general Levenshtein automata in an hour. I read the paper that introduced them ( http://link.springer.com/article/10.1007/s10032-002-0082-8 ) and they are quite the complex beast. Not to mention that the paper is very technical and you need to keep a dozen definitions in your head. That said, there seems to be a fairly readable implementation at https://github.com/universal-automata/liblevenshtein I'm currently working on implementing fast Levenshtein queries in C++ with a friend, and we intend to implement the paper I linked in my original post. So far, our dynamic programming Levenshtein already beats Lucene++ (C++ implementation of Lucene), which is a bit embarrassing [1]. If you're interested, more advanced stuff will hit https://github.com/xhochy/libfuzzymatch when we get around to implementing it. [1] Lucene++ spends more time converting strings between UTF-8 and UTF-32 than it does computing Levenshtein distances, says the profiler. |
|
1. I didn't follow that paper. Even trying to understand that paper would have taken way more time, so after 5 minutes of trying to understand it I gave up on that approach. See this comment for what I did do: https://news.ycombinator.com/item?id=9699870 That saved maybe 20x.
2. I used Python instead of C++ or Java. This saved 5x.
3. The code was throwaway quality code. This saved 2x.
Together that's 200x, but I'm at least a 2x worse programmer than them, so that gives you the 100x ;-)