| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by PaulHoule 2193 days ago
	That is quite literally a formal proof that "regex+backreferences" is NP-complete, since SAT is the index NP-complete problem.

3 comments

simonebrunozzi 2193 days ago

This line is super smart, and yet, despite I should know a lot about regex and NP-complete, my head feels dizzy as I try to make full sense of it. A sign I'm getting old or dumb, perhaps :(

Jokes apart: I'd love for you to elaborate a bit more on this. I'm pretty sure I would benefit a lot from a more expanded, "dumber" explanation.

link

ColinWright 2193 days ago

>> That is quite literally a formal proof that "regex+backreferences" is NP-complete, since SAT is the index NP-complete problem.

> I'd love for you to elaborate a bit more on this.

I'm not the original poster, but I'll have a go.

SAT is NP-Hard. In other words, any literally any NP problem can be efficiently[0] converted to SAT, and any solution can then be efficiently[0] converted back to a solution to the original.

Example: Think of the problem of factoring integers. Someone gives you an integer to factor, with a little work you can create a SAT instance, solve that, and then read off the factorisation of the original integer. SAT is, in some real sense, at least has hard as INT.

So there is a proof that SAT is at least as hard as every NP problem. That's what we call "NP-Hard".

Now someone has shown that they can solve SAT problems by using regex_backtrack. That means that every NP problem can be converted to SAT, then converted to regex+backtrack, solved, and the solution to the original read out from the result.

Thus regex+backtrack is at least as hard as every NP problem.

Now in the case of SAT, it itself is NP. So the combination of being NP and being NP-Hard is called "NP-Complete", or NPC. So SAT is an example of a problem that's NPC.

What has not been shown (I think) is that regex+backtrack is in NP. Showing that a solution to regex+backtrack implies a solution to SAT shows that regex+backtrack is NP-Hard.

If the linked article also shows that regex+backtrack is NP, then it is therefore NPC. But we can see that regex+backtrack is in NP, because verifying an alleged match is a polynomial time operation.

So regex-backtrack is NPC.

             +--------------------+
             |                    |
  NP-Hard -> |                    |
             |   ,------------.   |
              \ /              \ /
               X  NP-Complete   X
              / \              / \
             /   `------------'   \
      NP -> |                      |
            |                      | 
            .    +------------+    ,
             \   | Polynomial |   /
              `--+------------+--'

[0] For a technical definition of "efficient"

link

sacado2 2193 days ago

Am I missing something? I read it the other way: all CNF instances can be rewritten as regexp + backreferences, meaning re + backreferences are at least as general that SAT, not at most as. Meaning, they could be higher in the polynomial hierarchy.

link

ColinWright 2193 days ago

As always, with all these things, there's a non-zero chance that I've mis-spoken myself somewhere. I'm going to "think out loud" on this so people can follow the thought processes.

> I read it the other way:

OK ...

> all CNF instances can be rewritten as regexp + backreferences,

By CNF you are referring to instances of the SAT problem. So yes, if you have an instance of the SAT problem, it can be re-written as an instance of regex+backtrack.

> meaning re + backreferences are at least as general that SAT,

Yes, the regex+backtrack problem is at least as hard as the SAT problem.

> ... not at most as.

Where did I say that? Here's a stripped-down summary of my comment:

* SAT is NP-Hard.

* Now someone has shown that they can solve SAT problems by using regex+backtrack. (That's the linked article)

* Thus regex+backtrack is at least as hard as every NP problem.

* SAT is NP, so it's NPC

* regex+backtrack can be seen to be in NP.

* So regex-backtrack is NPC.

So rewording that:

* The linked article shows regex+backtrack >= SAT.

* Independently we observe that checking an alleged regex+backtrack solution is a polynomial task, therefore regex+backtrack is in NP.

* SAT is in NPC, therefore regex+backtrack <= SAT (because regex+backtrack is in NP).

* Thus regex+backtrack = SAT (for some definition of "=")

So, I think you must have misread something ... I think everything I've written is correct as stands.

link

sacado2 2193 days ago

> we observe that checking an alleged regex+backtrack solution is a polynomial task

That's the point I missed at first. That's good news, because I was pretty sure perl regex were accidentally Turing complete, I don't know why.

link

ColinWright 2193 days ago

Cool.

This sort of thing can be really tough to follow because it's all deeply intertwungle. Glad I got it right.

Cheers!

link

pradn 2193 days ago

I assume this result is already known in the literature?

link

klyrs 2193 days ago

This doesn't show a date but archive.org has snapshots dating back to 2001

https://perl.plover.com/NPC/

link

gbacon 2193 days ago

Proving that a problem is NP-complete requires proving that it is NP-hard and in NP. Reducing SAT to regex matching with backreferences does the former. The latter requires proving that any solution can be checked in polynomial time.

The author is also incorrect in stating However the author incorrectly states that only 3SAT problems are solvable. Proof that 3SAT is NP-hard does not exclude broader SAT.

link

dtech 2193 days ago

While not a formal proof, it is fairly obvious that verifying a match is in P. Just fill in the captured groups from the answer in the regex and see if it corresponds to the input text.

Every SAT problem can be converted into a 3-SAT problem so that's also not really an issue, they are both NPC

link