|
|
|
Ask HN: Most efficient way to test if a string contains a word?
|
|
7 points
by bladeaod
6215 days ago
|
|
For an app I wrote I need to check if a String contains a word, finding the longest word it contains.
Currently I check the whole string against a dictionary, then check every substring possible with decreasing length until I find a word. Is there a more efficient way to do this? |
|
Build a Trie out of your dictionary. Then plug your string into and record depth/chars where it deadends. Repeat for you string with the first char removed. At the end you should have the longest substring.
Minor optimization would be to stop plugging things into the Trie once the remaining characters in the string are less than the longest word you have already found.
This should run in linear time (relative to your search string) because the recursion is limited by the longest word in the dictionary (depth of the trie). I have tested this out just for fun.
First I grabbed the Wikipedia page for 'line of succession to the British throne' and stripped it down to text only (352,000 chars). Then I added 'monarch' to my dictionary. Then I took the first 10k chars of the text and put it into my trie, getting back monarch as my match and timing it. Then I figured the ratio of time taken to size of string (10k). I repeated this for 20k chars, 30k chars ... 350k chars, and the ratio remains constant, indicating time increases in proportion to string length... linear time.