| HN Mirror


by kragen 5469 days ago

I just put a bottom-up reformulation of the dynamic-programming solution at http://canonical.org/~kragen/sw/inexorable-misc/wordseg.c, in the function "segment". The stages are the prefixes of s: the substrings s[0:0], s[0:1], s[0:2], ..., s[0:n], where n is the length of s. The state at stage s[0:i] is a finite map from each of its segmentable prefixes s[0:j], j∈[0,i), to a segmentation of that prefix (only one segmentation per prefix). The transition function either leaves the state unchanged or adds one pair to the map, mapping s[0:i] to some segmentation. It can do the latter if ∃j:∃w∈dict: (s[0:j] || w) = s[0:i], in which case the segmentation is (state[s[0:j]], w).

(In my program, the state[] table is represented as a vector of integers: seglen[i] is the length of the last word of state[s[0:i]], or 0 if s[0:i] is not present in the finite map. That is enough to reconstruct a segmentation efficiently, by stepping back from the end of the string one word at a time.)
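A minimal C sketch of the bottom-up pass described above, assuming a small hard-coded dictionary. The word list and the name segment_sketch are hypothetical, and this is not the code in wordseg.c. It keeps the state as a seglen vector, treats the empty prefix s[0:0] as always segmentable, and at each stage tries every dictionary word w as the final piece of s[0:i]:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy dictionary, for illustration only. */
static const char *dict[] = {"a", "an", "apple", "pen", "pineapple", "pine"};
static const size_t ndict = sizeof dict / sizeof dict[0];

/* Bottom-up pass over the stages s[0:1], ..., s[0:n].
   seglen must have room for n + 1 entries. Afterwards seglen[i] is the
   length of the last word of the segmentation recorded for s[0:i], or 0
   if s[0:i] is not in the map. The empty prefix s[0:0] is treated as
   segmentable (the empty segmentation) even though seglen[0] is 0.
   Returns nonzero if the whole string s[0:n] is segmentable. */
static int segment_sketch(const char *s, size_t n, size_t *seglen)
{
    seglen[0] = 0;
    for (size_t i = 1; i <= n; i++) {
        seglen[i] = 0;                        /* default: state unchanged */
        for (size_t k = 0; k < ndict && seglen[i] == 0; k++) {
            size_t wlen = strlen(dict[k]);
            assert(wlen > 0);                 /* 0 is reserved for "absent" */
            if (wlen > i)
                continue;
            size_t j = i - wlen;              /* candidate split s[0:j] || w */
            if ((j == 0 || seglen[j] != 0) &&
                memcmp(s + j, dict[k], wlen) == 0)
                seglen[i] = wlen;             /* add s[0:i] to the map */
        }
    }
    return n == 0 || seglen[n] != 0;
}
```

With this toy dictionary, "pineapplepen" ends up with seglen[12] == 3 ("pen"), seglen[9] == 5 ("apple") and seglen[4] == 4 ("pine"). The first matching word wins, so which of several valid segmentations gets recorded depends on dictionary order.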

This is completely different from your formulation where you're thinking about i+1, i+2, etc. It's not surprising that you think that your formulation isn't dynamic programming!

Now, this is a solution by forward induction or "bottom-up dynamic programming", and I wrote it that way because it's easier to see the mapping to dynamic programming. But if you solve the problem by backward induction or "top-down dynamic programming" instead, you may be able to solve the problem a lot more efficiently, because you can avoid computing most of the table entries. And that's what happens if you just write the recurrence out directly as a recursive function and then memoize it.
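Here's a hedged C sketch of that top-down version: the same recurrence over prefixes, written as a recursive function seg(i) and memoized, starting from the whole string s[0:n]. The dictionary, the fixed MAXN bound, and the evaluated counter are assumptions added for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy dictionary, for illustration only. */
static const char *dict[] = {"a", "an", "apple", "pen", "pineapple", "pine"};
#define NDICT (sizeof dict / sizeof dict[0])
#define MAXN 256

/* memo[i]: 0 = not computed yet, 1 = s[0:i] segmentable, -1 = not.
   seglen[i]: length of the last word chosen for s[0:i]. */
static signed char memo[MAXN + 1];
static size_t seglen[MAXN + 1];
static size_t evaluated;   /* how many table entries were actually computed */

/* Is the prefix s[0:i] segmentable? Same recurrence as the bottom-up
   version, but only prefixes reachable from the caller are evaluated. */
static int seg(const char *s, size_t i)
{
    if (i == 0)
        return 1;                     /* empty prefix: empty segmentation */
    if (memo[i] != 0)
        return memo[i] > 0;           /* already computed */
    evaluated++;
    memo[i] = -1;                     /* "no" until a split is found */
    for (size_t k = 0; k < NDICT; k++) {
        size_t wlen = strlen(dict[k]);
        /* Does s[0:i] end with word k, and is what precedes it segmentable? */
        if (wlen <= i && memcmp(s + i - wlen, dict[k], wlen) == 0 &&
            seg(s, i - wlen)) {
            memo[i] = 1;
            seglen[i] = wlen;
            break;
        }
    }
    return memo[i] > 0;
}

/* Resets the memo table and asks whether all of s[0:n] is segmentable. */
static int segment_topdown(const char *s, size_t n)
{
    assert(n <= MAXN);
    memset(memo, 0, sizeof memo);
    memset(seglen, 0, sizeof seglen);
    evaluated = 0;
    return seg(s, n);
}
```

For "pineapplepen" the recursion only computes the entries for s[0:12], s[0:9] and s[0:4], so evaluated ends at 3, whereas a bottom-up pass visits every stage from 1 to 12. The price is recursion that can go up to n calls deep.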


NY_Entrepreneur 5469 days ago

You may have a sufficiently tricky formulation to have a dynamic programming solution. But your states and stages are a bit strange!

Much of why we use dynamic programming is that the work at stage i needs only the work at stage i + 1 (for backward iterations), and in some problems this yields huge savings in computation. This framework also handles uncertainty well.

Yes, the usual way is to find the solution starting from the end and then use it starting at the beginning. In the code I posted on this thread, I found the solution starting at the beginning, then used it starting at the end, and printed out the words in the reverse of the order in which I found them.
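That code isn't quoted here. As a hypothetical illustration of the end-to-beginning walk, here's a C sketch that reads a segmentation back out of a seglen table like the one described earlier in the thread (seglen[i] is the length of the last word of s[0:i], or 0 if s[0:i] is absent). Instead of collecting the words in reverse and then flipping them, it fills the output buffer from right to left, so the words come out in reading order. The name reconstruct and its interface are made up:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes the segmentation of s[0:n] encoded in seglen into out as
   space-separated words. out must hold at least 2*n + 1 bytes.
   Returns the number of words, or -1 if s[0:n] is not segmentable. */
static int reconstruct(const char *s, size_t n, const size_t *seglen, char *out)
{
    assert(s != NULL && seglen != NULL && out != NULL);

    /* First pass: step back from the end one word at a time, counting. */
    int words = 0;
    for (size_t i = n; i > 0; i -= seglen[i]) {
        if (seglen[i] == 0 || seglen[i] > i)
            return -1;                 /* s[0:i] is not in the map */
        words++;
    }

    /* Second pass: same walk, copying each word into out right to left,
       so the last word found lands at the end of the buffer. */
    size_t pos = n + (words > 0 ? (size_t)words - 1 : 0);
    out[pos] = '\0';
    for (size_t i = n; i > 0; i -= seglen[i]) {
        size_t len = seglen[i];
        pos -= len;
        memcpy(out + pos, s + i - len, len);
        if (pos > 0)
            out[--pos] = ' ';          /* separator before this word */
    }
    return words;
}
```

Given seglen[4] = 4, seglen[9] = 5 and seglen[12] = 3 (all other entries 0) for "pineapplepen", it produces "pine apple pen". Counting first lets the words be placed directly in their final positions, so no separate reversal step is needed.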
