Hacker News new | ask | show | jobs
by amanda99 540 days ago
Is this not kind of trivial, at least the way they've defined it? It's kind of very obviously NP complete to me. Any of these text problems where you're tying to optimize over a whole corpus is kind of not hard to see to be NP-complete, similar to longest common subsequence.
2 comments

Longest common substring is linear time.
You may have misread the parent comment. Longest common substring is not the same type of problem as longest common subsequence.
For those who were wondering what this means:

    common substring: contiguous
    common subsequence: not necessarily contiguous but in order
The post you responded to is merely giving evidence that the GP's overall claim "Any of these text problems where you're tying to optimize over a whole corpus is kind of not hard to see to be NP-complete" is sometimes not true in surprising ways -- Knuth conjectured that there was no linear time algorithm for longest common substring (but then suffix trees came along).
Longest common subsequence can be solved in at most quadratic time via dynamic programming.
Longest common subsequence for an arbitrary input is NP-hard: https://en.wikipedia.org/wiki/Longest_common_subsequence

Some narrow versions of it could be optimally solved with DP, but when the constraints are lifted, you will probably have to pay for it with either exponential memory or time.

Same applies to the Knapsack problem. You can solve some variants of it with DP, but it won't generalize.

I wouldn't call the case with two sequence "some narrow version". It is by far the most common instance of the problem and honestly the first thing that pops into my mind when I hear LCS, that's why I was very surprised to hear it was NP-hard but it does make sense for an arbitrary number of sequences.