Hacker News new | ask | show | jobs
by simonebrunozzi 2184 days ago
This line is super smart, and yet, despite I should know a lot about regex and NP-complete, my head feels dizzy as I try to make full sense of it. A sign I'm getting old or dumb, perhaps :(

Jokes apart: I'd love for you to elaborate a bit more on this. I'm pretty sure I would benefit a lot from a more expanded, "dumber" explanation.

1 comments

>> That is quite literally a formal proof that "regex+backreferences" is NP-complete, since SAT is the index NP-complete problem.

> I'd love for you to elaborate a bit more on this.

I'm not the original poster, but I'll have a go.

SAT is NP-Hard. In other words, any literally any NP problem can be efficiently[0] converted to SAT, and any solution can then be efficiently[0] converted back to a solution to the original.

Example: Think of the problem of factoring integers. Someone gives you an integer to factor, with a little work you can create a SAT instance, solve that, and then read off the factorisation of the original integer. SAT is, in some real sense, at least has hard as INT.

So there is a proof that SAT is at least as hard as every NP problem. That's what we call "NP-Hard".

Now someone has shown that they can solve SAT problems by using regex_backtrack. That means that every NP problem can be converted to SAT, then converted to regex+backtrack, solved, and the solution to the original read out from the result.

Thus regex+backtrack is at least as hard as every NP problem.

Now in the case of SAT, it itself is NP. So the combination of being NP and being NP-Hard is called "NP-Complete", or NPC. So SAT is an example of a problem that's NPC.

What has not been shown (I think) is that regex+backtrack is in NP. Showing that a solution to regex+backtrack implies a solution to SAT shows that regex+backtrack is NP-Hard.

If the linked article also shows that regex+backtrack is NP, then it is therefore NPC. But we can see that regex+backtrack is in NP, because verifying an alleged match is a polynomial time operation.

So regex-backtrack is NPC.

             +--------------------+
             |                    |
  NP-Hard -> |                    |
             |   ,------------.   |
              \ /              \ /
               X  NP-Complete   X
              / \              / \
             /   `------------'   \
      NP -> |                      |
            |                      | 
            .    +------------+    ,
             \   | Polynomial |   /
              `--+------------+--'
[0] For a technical definition of "efficient"
Am I missing something? I read it the other way: all CNF instances can be rewritten as regexp + backreferences, meaning re + backreferences are at least as general that SAT, not at most as. Meaning, they could be higher in the polynomial hierarchy.
As always, with all these things, there's a non-zero chance that I've mis-spoken myself somewhere. I'm going to "think out loud" on this so people can follow the thought processes.

> I read it the other way:

OK ...

> all CNF instances can be rewritten as regexp + backreferences,

By CNF you are referring to instances of the SAT problem. So yes, if you have an instance of the SAT problem, it can be re-written as an instance of regex+backtrack.

> meaning re + backreferences are at least as general that SAT,

Yes, the regex+backtrack problem is at least as hard as the SAT problem.

> ... not at most as.

Where did I say that? Here's a stripped-down summary of my comment:

* SAT is NP-Hard.

* Now someone has shown that they can solve SAT problems by using regex+backtrack. (That's the linked article)

* Thus regex+backtrack is at least as hard as every NP problem.

* SAT is NP, so it's NPC

* regex+backtrack can be seen to be in NP.

* So regex-backtrack is NPC.

So rewording that:

* The linked article shows regex+backtrack >= SAT.

* Independently we observe that checking an alleged regex+backtrack solution is a polynomial task, therefore regex+backtrack is in NP.

* SAT is in NPC, therefore regex+backtrack <= SAT (because regex+backtrack is in NP).

* Thus regex+backtrack = SAT (for some definition of "=")

So, I think you must have misread something ... I think everything I've written is correct as stands.

> we observe that checking an alleged regex+backtrack solution is a polynomial task

That's the point I missed at first. That's good news, because I was pretty sure perl regex were accidentally Turing complete, I don't know why.

Cool.

This sort of thing can be really tough to follow because it's all deeply intertwungle. Glad I got it right.

Cheers!