| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by kelseyfrancis 4130 days ago
	Yeah, anyone thinking of using this should probably instead just look for an existing implementation of something like the Damerau–Levenshtein distance.

3 comments

bevacqua 4130 days ago

OP here. My code is meant to be used for client-side bursty usage (like filtering results for an autocomplete list), which demands the code to be quick and small. In this use case, it's not necessary to figure out how similar strings are, because it's completely irrelevant.

If you're building your own search engine, which you probably shouldn't, I'd look elsewhere.

link

kelseyfrancis 4129 days ago

> In this use case, it's not necessary to figure out how similar strings are, because it's completely irrelevant.

I don't follow, since that's pretty much the definition of "fuzzy search". Your algorithm considers the query to be similar to the searched text if the latter contains all the characters in query in the same order.

If the use case is just to allow users to type the first few letters of each word or something, that algorithm works great, so advising "anyone" to look elsewhere didn't make sense.

If users expect to be able to make typos other than leaving out characters, though, it doesn't work, and that's often what people think of when they see "fuzzy search". I just wanted to point out that you could use Levenshtein + a BK-Tree or something and get those results if desired, though, yeah, it'd definitely be more complicated, but I doubt it'd be too slow.

link

magoon 4130 days ago

> If you're building your own search engine, which you probably shouldn't, I'd look elsewhere.

Advice not taken.

link

jgalt212 4130 days ago

I don't know how computationally intensive OP's algo is, but I've found that in real world use cases Damerau–Levenshtein distance algo has slowed down a number of our jobs (and we aim to avoid it is possible).

Our jobs are real-time where the user is awaiting a response. So probably not limiting factor for jobs that can be run on your own time.

link

logicallee 4130 days ago

>anyone thinking of using this should probably instead just look for an existing implementation of something like the Damerau–Levenshtein distance.

"anyone" with over 400 bytes, you mean.

link