Hacker News new | ask | show | jobs
by eru 4687 days ago
An alternative solution is to provide a list of matches and non-matches, and look for the shortest or simplest regex that separates the two.

(Finding the actual minimal regex, instead of just a reasonable guess, might be a computationally tough problem. I guess it's in coNP and might be in NP, too. An algorithm in P would be nice to find.)

Update: Finding any separating regex would be in P. A separating finite automaton is easy to find, and then you just convert that into a regex. Now, how do you find the minimal regex?

2 comments

I really want one of these please message me!!! (tom dot larkworthy at gmail)

preferably python but a link to theory is useful too.

I wrote a brute-force minimal-regex finder once; wish I could remember where I put it.
For your entertainment, https://github.com/matthiasgoergens/Div7 has a regex finder for divisibility. E.g. https://raw.github.com/matthiasgoergens/Div7/master/regex7 checks decimal numbers for divisibility by 7.
Very cute. :-)

I found my code and it wasn't exactly it (it just enumerated all regexes in order of size), but here's a crude solution now: https://github.com/darius/sketchbook/blob/master/regex/find_...

(In defense of my memory, I had written superoptimizers for other things.)