Hacker News new | ask | show | jobs
by zef_hemel 4454 days ago
Thank you for your constructive comments. I can work with this :)
2 comments

On a related note, Komodo Edit/IDE has a great feature in their fuzzy matcher.

In ST3 for example, a space is basically treated like the regex /.*/

This means that if you type something you can only the filter results on the RHS of your input do far.

Consider: Models/player.js Controllers/player.js Views/player.js

If I type "play" then I need to hit home to filter further.

In Komodo, space is treated as a logical AND, which you can use to more effectively search the above.

I've not tried Zed yet, but if it can handle this case then that's a plus point from me.

Visual Assist does the Komodo thing in its file name and symbol selectors. It's excellent. I've never used any other search mechanism that's quite so good for very quickly narrowing down large lists to a handful of items.

(emacs users may be familiar with iswitchb, which can be operated in a similar manner. Entering space-separated strings in Visual Assist is equivalent to entering those same strings in iswitchb and then pressing C-SPC after each one to perform each search on the result of the previous one. This is marginally more fiddly, but the key part is being able to narrow the list down with one substring, and then add more substrings to further refine things.)

I've added a similar Visual Assist-/Komodo-style UI for finding things to a couple of programs I've worked on, basically copying exactly what Visual Assist does. The response is always positive.

fwiw the search is called fuzzy matching

also, gl on the project

Another interesting algorithm for fuzzy searching is this one: http://www.catalysoft.com/articles/StrikeAMatch.html

It essentially compares the number of matching two letter pairs in strings.

I used it in a C# app that did a fuzzy instant search as the user typed more letters in. On a dataset of about six thousand names it performed really well, but it was a pretty simple app. I also used a javascript version on the same list to see how it performed on an X120e netbook, and again, it was instantaneous.

No idea how the performance would compare to the methods for common substring. I like the pairs matching because I can screw up letter positioning and accidentally type a letter or two that doesn't exist in the string, but still get back strongly matching results and usually find what I want. I used it because we had a huge issue at work with people brutalizing names they entered into our employee database.

Here's an implementation I wrote in javascript: https://gist.github.com/doorhammer/3016ecaac5313a87804b

I never used it in production, and it's not optimized really at all, but it shows how straight forward a typical version can be.

I took a quick look. What he described is essentially a specific case of n-grams (n=2, bigrams).

http://en.wikipedia.org/wiki/N-gram

Thanks for the information. So it looks like it's a specific application of an algorithm to vectors of bigrams? The most relevant part of the wikipedia page (I think): http://en.wikipedia.org/wiki/N-gram#n-grams_for_approximate_...

It also appears that the algorithm I linked is actually the Sørensen–Dice index. They have the exact same formula on the wiki page: http://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coef...

Appreciate the heads up. Gave me much better terms to search for. Going to add them to the notes of my gist. I'm on vacation now, so I'll have to do more reading on it over the next few days

Also made a public gist for whoever is interested:

https://gist.github.com/doorhammer/9957864

And here's an explanation with some code examples: http://www.quora.com/Algorithms/How-is-the-fuzzy-search-algo...