Hacker News new | ask | show | jobs
by jepler 1014 days ago
This is neat!

I was surprised then not surprised that the union & intersection REs it comes up with are not particularly concise. For example the two expressions "y.+" and ".+z" have a very simple intersection: "y.*z" (equality verified by the page, assuming I haven't typo'd anything). But the tool gives

    yz([^z][^z]*z|z)*|y[^z](zz*[^z]|[^z])*zz*
instead. I think there are reasons it gives the answer it does, and giving a minimal (by RE length in characters or whatever) regular expression is probably a lot harder.
1 comments

I think one of the reasons is the ".+z" gets bigger and uglier after you convert it to a deterministic automaton.
They show the DFA for it on the site, it's 3 states. There's a starting state for the first . and then two states that transition back and forth between whether z was the last character or not.

I think what's actually happening here is that they're doing the intersection on the DFAs and then producing a regex from the resulting DFA. The construction of a regex from a DFA is where things get ugly and weird.