Hacker News new | ask | show | jobs
by kelseyfrancis 4130 days ago
Yeah, anyone thinking of using this should probably instead just look for an existing implementation of something like the Damerau–Levenshtein distance.
3 comments

OP here. My code is meant to be used for client-side bursty usage (like filtering results for an autocomplete list), which demands the code to be quick and small. In this use case, it's not necessary to figure out how similar strings are, because it's completely irrelevant.

If you're building your own search engine, which you probably shouldn't, I'd look elsewhere.

> In this use case, it's not necessary to figure out how similar strings are, because it's completely irrelevant.

I don't follow, since that's pretty much the definition of "fuzzy search". Your algorithm considers the query to be similar to the searched text if the latter contains all the characters in query in the same order.

If the use case is just to allow users to type the first few letters of each word or something, that algorithm works great, so advising "anyone" to look elsewhere didn't make sense.

If users expect to be able to make typos other than leaving out characters, though, it doesn't work, and that's often what people think of when they see "fuzzy search". I just wanted to point out that you could use Levenshtein + a BK-Tree or something and get those results if desired, though, yeah, it'd definitely be more complicated, but I doubt it'd be too slow.

> If you're building your own search engine, which you probably shouldn't, I'd look elsewhere.

Advice not taken.

I don't know how computationally intensive OP's algo is, but I've found that in real world use cases Damerau–Levenshtein distance algo has slowed down a number of our jobs (and we aim to avoid it is possible).

Our jobs are real-time where the user is awaiting a response. So probably not limiting factor for jobs that can be run on your own time.

>anyone thinking of using this should probably instead just look for an existing implementation of something like the Damerau–Levenshtein distance.

"anyone" with over 400 bytes, you mean.