This line is super smart, and yet, despite I should know a lot about regex and NP-complete, my head feels dizzy as I try to make full sense of it. A sign I'm getting old or dumb, perhaps :(
Jokes apart: I'd love for you to elaborate a bit more on this. I'm pretty sure I would benefit a lot from a more expanded, "dumber" explanation.
>> That is quite literally a formal proof that "regex+backreferences" is NP-complete, since SAT is the index NP-complete problem.
> I'd love for you to elaborate a bit more on this.
I'm not the original poster, but I'll have a go.
SAT is NP-Hard. In other words, any literally any NP problem can be efficiently[0] converted to SAT, and any solution can then be efficiently[0] converted back to a solution to the original.
Example: Think of the problem of factoring integers. Someone gives you an integer to factor, with a little work you can create a SAT instance, solve that, and then read off the factorisation of the original integer. SAT is, in some real sense, at least has hard as INT.
So there is a proof that SAT is at least as hard as every NP problem. That's what we call "NP-Hard".
Now someone has shown that they can solve SAT problems by using regex_backtrack. That means that every NP problem can be converted to SAT, then converted to regex+backtrack, solved, and the solution to the original read out from the result.
Thus regex+backtrack is at least as hard as every NP problem.
Now in the case of SAT, it itself is NP. So the combination of being NP and being NP-Hard is called "NP-Complete", or NPC. So SAT is an example of a problem that's NPC.
What has not been shown (I think) is that regex+backtrack is in NP. Showing that a solution to regex+backtrack implies a solution to SAT shows that regex+backtrack is NP-Hard.
If the linked article also shows that regex+backtrack is NP, then it is therefore NPC. But we can see that regex+backtrack is in NP, because verifying an alleged match is a polynomial time operation.
Am I missing something? I read it the other way: all CNF instances can be rewritten as regexp + backreferences, meaning re + backreferences are at least as general that SAT, not at most as. Meaning, they could be higher in the polynomial hierarchy.
As always, with all these things, there's a non-zero chance that I've mis-spoken myself somewhere. I'm going to "think out loud" on this so people can follow the thought processes.
> I read it the other way:
OK ...
> all CNF instances can be rewritten as regexp + backreferences,
By CNF you are referring to instances of the SAT problem. So yes, if you have an instance of the SAT problem, it can be re-written as an instance of regex+backtrack.
> meaning re + backreferences are at least as general that SAT,
Yes, the regex+backtrack problem is at least as hard as the SAT problem.
> ... not at most as.
Where did I say that? Here's a stripped-down summary of my comment:
* SAT is NP-Hard.
* Now someone has shown that they can solve SAT problems by using regex+backtrack. (That's the linked article)
* Thus regex+backtrack is at least as hard as every NP problem.
* SAT is NP, so it's NPC
* regex+backtrack can be seen to be in NP.
* So regex-backtrack is NPC.
So rewording that:
* The linked article shows regex+backtrack >= SAT.
* Independently we observe that checking an alleged regex+backtrack solution is a polynomial task, therefore regex+backtrack is in NP.
* SAT is in NPC, therefore regex+backtrack <= SAT (because regex+backtrack is in NP).
* Thus regex+backtrack = SAT (for some definition of "=")
So, I think you must have misread something ... I think everything I've written is correct as stands.
Proving that a problem is NP-complete requires proving that it is NP-hard and in NP. Reducing SAT to regex matching with backreferences does the former. The latter requires proving that any solution can be checked in polynomial time.
The author is also incorrect in stating However the author incorrectly states that only 3SAT problems are solvable. Proof that 3SAT is NP-hard does not exclude broader SAT.
While not a formal proof, it is fairly obvious that verifying a match is in P. Just fill in the captured groups from the answer in the regex and see if it corresponds to the input text.
Every SAT problem can be converted into a 3-SAT problem so that's also not really an issue, they are both NPC
Jokes apart: I'd love for you to elaborate a bit more on this. I'm pretty sure I would benefit a lot from a more expanded, "dumber" explanation.