| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by zef_hemel 4502 days ago
	Thank you for your constructive comments. I can work with this :)

2 comments

glenjamin 4502 days ago

On a related note, Komodo Edit/IDE has a great feature in their fuzzy matcher.

In ST3 for example, a space is basically treated like the regex /.*/

This means that if you type something you can only the filter results on the RHS of your input do far.

Consider: Models/player.js Controllers/player.js Views/player.js

If I type "play" then I need to hit home to filter further.

In Komodo, space is treated as a logical AND, which you can use to more effectively search the above.

I've not tried Zed yet, but if it can handle this case then that's a plus point from me.

link

to3m 4502 days ago

Visual Assist does the Komodo thing in its file name and symbol selectors. It's excellent. I've never used any other search mechanism that's quite so good for very quickly narrowing down large lists to a handful of items.

(emacs users may be familiar with iswitchb, which can be operated in a similar manner. Entering space-separated strings in Visual Assist is equivalent to entering those same strings in iswitchb and then pressing C-SPC after each one to perform each search on the result of the previous one. This is marginally more fiddly, but the key part is being able to narrow the list down with one substring, and then add more substrings to further refine things.)

I've added a similar Visual Assist-/Komodo-style UI for finding things to a couple of programs I've worked on, basically copying exactly what Visual Assist does. The response is always positive.

link

mc_hammer 4502 days ago

fwiw the search is called fuzzy matching

also, gl on the project

link

doorhammer 4502 days ago

Another interesting algorithm for fuzzy searching is this one: http://www.catalysoft.com/articles/StrikeAMatch.html

It essentially compares the number of matching two letter pairs in strings.

I used it in a C# app that did a fuzzy instant search as the user typed more letters in. On a dataset of about six thousand names it performed really well, but it was a pretty simple app. I also used a javascript version on the same list to see how it performed on an X120e netbook, and again, it was instantaneous.

No idea how the performance would compare to the methods for common substring. I like the pairs matching because I can screw up letter positioning and accidentally type a letter or two that doesn't exist in the string, but still get back strongly matching results and usually find what I want. I used it because we had a huge issue at work with people brutalizing names they entered into our employee database.

Here's an implementation I wrote in javascript: https://gist.github.com/doorhammer/3016ecaac5313a87804b

I never used it in production, and it's not optimized really at all, but it shows how straight forward a typical version can be.

link

hboon 4501 days ago

I took a quick look. What he described is essentially a specific case of n-grams (n=2, bigrams).

http://en.wikipedia.org/wiki/N-gram

link

doorhammer 4501 days ago

Thanks for the information. So it looks like it's a specific application of an algorithm to vectors of bigrams? The most relevant part of the wikipedia page (I think): http://en.wikipedia.org/wiki/N-gram#n-grams_for_approximate_...

It also appears that the algorithm I linked is actually the Sørensen–Dice index. They have the exact same formula on the wiki page: http://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coef...

Appreciate the heads up. Gave me much better terms to search for. Going to add them to the notes of my gist. I'm on vacation now, so I'll have to do more reading on it over the next few days

Also made a public gist for whoever is interested:

https://gist.github.com/doorhammer/9957864

link

crashandburn4 4502 days ago

And here's an explanation with some code examples: http://www.quora.com/Algorithms/How-is-the-fuzzy-search-algo...

link