Hacker News new | ask | show | jobs
by microtonal 4695 days ago
It depends on what the grandparent means with 'not exactly'. If it means 'within a certain edit distance', it is very doable. You can store your dictionary in an automaton and construct a Levenshtein[1] automaton in linear time. The intersection of the dictionary and Levenshtein automaton gives all words within the given edit distance. My library (Dictomaton) that I mention in another comment implements this. There is a different implementation in recent Lucene versions, that is used for finding documents with keywords that approximately match the query.

[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.6...

1 comments

The problem is that this is unlikely to give what you want, since, for example, if my inputs are "123", "2315", "12451", ..., then what I'm looking for is probably going to be something along the lines of "\d+", and not the set of words that are within a small edit distance of a subset of integers.
It depends on the application. For instance, in search engines approximate matching is very useful. E.g. if people want to search for Rotterdam, they make typos like Roterdam or are foreigner and don't know that Rotterdam has a double t.

You can rank candidates afterwards. However, it's a good solution for finding words that are closed to a mistyped word.

I have successfully used such techniques in a spellchecker and various search engines.