Hacker News new | ask | show | jobs
by btilly 5422 days ago
Indeed. The only detail that surprised me was that you could only do exact lookups in the dictionary. That makes it O(n*n). If you had the dictionary stored in a trie it would be O(n) on long strings. (With a maximum constant whose size depends on the length of the longest word in the dictionary.)
1 comments

You could store the dictionary in a trie in O(m) time before you start. But I don't think it's true that it makes it O(n); being able to tell that the prefix you're considering is the prefix of some word doesn't help you, because you still don't know if the suffix of the string after that word is segmentable. (Or even whether the string contains the entire word.)
It does help you. You're following a dynamic programming solution. Suppose that the maximum dictionary word is of length k. Then for each position in the string you have a maximum of k previous positions that you're tracking for, "We could start a next word here."

Once you've scanned the string, only then do you discover whether the whole string is segmentable.

Put another way, while you're processing you don't immediately know whether or not the whole string is segmentable. But having a trie can let you discard possibilities early. Discarding work early means doing less work means being more efficient.

It's true that the trie allows you to prune the search tree, but I don't think that gets you to O(N). The maximum-dictionary-word-length check does get you to O(N), though, or rather O(kN) if you consider the dictionary as part of the input.
The trie check contains the maximum-dictionary-word-length check as an obvious special case and therefore its worst case is the same order of magnitude efficiency as the other check.

You do have the cost of descending a level in the trie. But that can be made constant (with a jump table) or else the log of the size of the alphabet (which is a constant for all intents and purposes).

I suppose that depends on what you store in your trie. If every node contains a field for the length of the longest path below it, then yes. But that's not a normal thing to store in a trie, and it wasn't at all obvious to me that that was what you meant.
Not true.

If you keep on following the trie, you'll fall out of the trie at the point that there are no words that have that sequence at the start. Which will happen by the time you exceed the longest word in the dictionary, but will probably happen substantially before that.