by hokkos 3269 days ago

I worked on a project where some XSD files defined fields with regex restrictions, and additional rules written in a format called Schematron added stricter regexes or negative regexes depending on context. I had to generate XML files conforming to those XSDs, so I used some tools built around the Z3 solver and Microsoft.Automata to generate strings that satisfy multiple regexes at once. They convert the regexes to finite automata, intersect them, and walk the resulting automaton from the start state to a final state over a charset.

Links:

https://www.microsoft.com/en-us/research/publication/symboli...

https://www.microsoft.com/en-us/download/details.aspx?id=523...

It now seems to be Open Source (MIT):

https://github.com/AutomataDotNet/Automata
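
For anyone who wants to see the intersect-and-walk idea in miniature, here is a rough Python sketch. It is not the Microsoft.Automata API: the DFAs are hand-built dictionaries rather than compiled from regexes, and `intersect_witness` is a made-up name. It explores the product of two DFAs breadth-first and returns a shortest string both accept.

```python
from collections import deque

# A DFA here is (start, accepting_states, delta) with delta[state][char] -> state.
# A missing transition means "reject". This representation is an assumption
# made for the sketch, not how Microsoft.Automata stores automata.

def intersect_witness(dfa_a, dfa_b, alphabet):
    """Return a shortest string accepted by both DFAs, or None if there is none."""
    (start_a, final_a, delta_a), (start_b, final_b, delta_b) = dfa_a, dfa_b
    start = (start_a, start_b)
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        (p, q), word = queue.popleft()
        if p in final_a and q in final_b:
            return word  # both automata are in a final state
        for ch in alphabet:
            # Follow a character only if both automata can read it.
            if ch in delta_a.get(p, {}) and ch in delta_b.get(q, {}):
                nxt = (delta_a[p][ch], delta_b[q][ch])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, word + ch))
    return None

# Hand-built DFA for [ab]*b (strings ending in b).
ends_in_b = (0, {1}, {0: {"a": 0, "b": 1}, 1: {"a": 0, "b": 1}})
# Hand-built DFA for a[ab]* (strings starting with a).
starts_with_a = (0, {1}, {0: {"a": 1}, 1: {"a": 1, "b": 1}})

print(intersect_witness(ends_in_b, starts_with_a, "ab"))  # prints: ab
```

A real tool works over symbolic character sets instead of enumerating an alphabet, which matters a lot for Unicode ranges.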

eru 3269 days ago

There's also redgrep (https://github.com/google/redgrep), which supports intersection and complement of regular expressions.

I am toying with the idea of writing a little game where player A thinks of a regular expression, and player B tries to guess it. If B guesses right, they win. If B guesses wrong, A has to provide a false positive and a false negative (if they exist), and B gets to guess again.

Can you think of ways to automate the roles of A and/or B?

long 3268 days ago

In computer science academia, this kind of game is called grammar induction (of which inferring regular expressions is a special case).

A classic algorithm for inferring regular expressions was given by Angluin: https://people.eecs.berkeley.edu/~dawnsong/teaching/s10/pape...

(This isn't quite the same setup as you're thinking of but there are a ton of variations on the basic idea)

eru 3268 days ago

Thanks. I had figured out a while ago that grammar induction was the right term to look for. (But it took me a bit to find it.) I know the paper you linked to, but yes, it's not quite the right setup.

long 3268 days ago

There's a conference on grammar induction called ICGI; might wanna browse through the proceedings to see if there's anything closer.

eru 3268 days ago

Thanks!

I'm basically interested in the equivalent of the "guessing game" for regular expressions. (See eg https://stackoverflow.com/questions/5440688/the-guess-the-nu... for a rational number solution.)

With a fixed guesser, that would encode all regular expressions / finite automata as sequences of binary digits. (But in an interestingly different way from just serializing the table for a DFA, or writing down the regular expression in ASCII characters.)

long 3267 days ago

So I do AI research on something pretty related to the guessing game -- I'll shoot you an email.

hokkos 3265 days ago

I think AutomataDotNet can do all that:

Automating regular expression generation seems easy: use RE fragments and aggregate them, or walk the type hierarchy of the RE AST and generate nodes randomly.
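
As a rough illustration of the "walk the AST and pick node types at random" idea, here is a small Python sketch. The function name `random_regex`, the four node kinds, and the two-letter alphabet are all assumptions chosen for the example; it emits Python `re` syntax rather than anything AutomataDotNet-specific.

```python
import random
import re

def random_regex(rng, depth=3, alphabet="ab"):
    """Build a random regex string by recursively choosing an AST node kind."""
    if depth == 0:
        return rng.choice(alphabet)  # leaves are always single literals
    kind = rng.choice(["lit", "concat", "alt", "star"])
    if kind == "lit":
        return rng.choice(alphabet)
    if kind == "star":
        inner = random_regex(rng, depth - 1, alphabet)
        # r** denotes the same language as r*, so avoid stacking stars.
        if inner.endswith("*"):
            return inner
        return "(?:" + inner + ")*"
    left = random_regex(rng, depth - 1, alphabet)
    right = random_regex(rng, depth - 1, alphabet)
    if kind == "concat":
        return left + right
    return "(?:" + left + "|" + right + ")"  # alternation, grouped

rng = random.Random(0)  # seeded so runs are repeatable
for _ in range(5):
    pattern = random_regex(rng)
    re.compile(pattern)  # sanity check: the generated pattern is valid
    print(pattern)
```

The depth limit keeps secrets small enough for a human guesser; a real game would probably also want to reject trivial languages.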

B needs to guess A's RE, so we need to generate example strings belonging to it to give hints: this is exactly the use case of AutomataDotNet.

Also, if B guesses an RE that is equivalent to A's RE, it seems unfair not to award a win, so we need to tell whether two REs belong to the same equivalence class. AutomataDotNet does have an AreEquivalent method.

You can automate the generation of a false positive and a false negative with the Minus method, which creates an automaton that accepts A-B or B-A, and then generate an example from it.
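
To make the A-B / B-A idea concrete, here is a minimal Python sketch of player A's side. It does not call AutomataDotNet; `difference_witness` and `hints` are hypothetical helpers over hand-built DFAs, and the difference automaton is explored on the fly instead of being constructed up front.

```python
from collections import deque

# DFA = (start, accepting_states, delta); a missing transition means "reject".
# None is used as an implicit dead state. All names here are illustrative.

def step(delta, state, ch):
    if state is None:
        return None
    return delta.get(state, {}).get(ch)

def difference_witness(dfa_x, dfa_y, alphabet):
    """Shortest string accepted by dfa_x but rejected by dfa_y, or None."""
    (start_x, final_x, delta_x), (start_y, final_y, delta_y) = dfa_x, dfa_y
    start = (start_x, start_y)
    queue, seen = deque([(start, "")]), {start}
    while queue:
        (p, q), word = queue.popleft()
        if p in final_x and q not in final_y:
            return word
        for ch in alphabet:
            nxt = (step(delta_x, p, ch), step(delta_y, q, ch))
            # Once dfa_x is dead, no extension can be accepted by it.
            if nxt[0] is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + ch))
    return None

def hints(secret, guess, alphabet):
    """What player A hands back after a guess; both None means B has won."""
    return {
        "false_positive": difference_witness(guess, secret, alphabet),
        "false_negative": difference_witness(secret, guess, alphabet),
    }

secret = (0, {1}, {0: {"a": 1}, 1: {"a": 1, "b": 1}})            # a[ab]*
guess = (0, {1}, {0: {"a": 0, "b": 1}, 1: {"a": 0, "b": 1}})     # [ab]*b

print(hints(secret, guess, "ab"))
# prints: {'false_positive': 'b', 'false_negative': 'a'}
```

Returning the shortest witness is a design choice; a harder game could hand back a random one instead.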

eru 3264 days ago

Thanks, I'll have a look.

You are right about the equivalence classes: for that you want to talk about the corresponding DFA (which have a unique normal form in the shape of the minimum DFA).
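
A small Python sketch of that normal-form idea, under my own assumptions: DFAs are plain dictionaries, minimization uses simple Moore-style partition refinement (not the faster Hopcroft algorithm), and `canonical_min_dfa` is a made-up name. Two DFAs over the same ordered alphabet accept the same language exactly when their canonical forms are equal.

```python
from collections import deque

def canonical_min_dfa(start, accepting, delta, alphabet):
    """Minimize a DFA and renumber its states in BFS order (start becomes 0)."""
    dead = object()  # explicit sink so every state has every transition

    def step(s, ch):
        return dead if s is dead else delta.get(s, {}).get(ch, dead)

    # 1. Collect reachable states.
    reach, seen, queue = [start], {start}, deque([start])
    while queue:
        s = queue.popleft()
        for ch in alphabet:
            t = step(s, ch)
            if t not in seen:
                seen.add(t)
                reach.append(t)
                queue.append(t)

    # 2. Refine the accepting / non-accepting split until it stops changing.
    block = {s: int(s in accepting) for s in reach}
    while True:
        ids = {}
        new_block = {}
        for s in reach:
            sig = (block[s],) + tuple(block[step(s, ch)] for ch in alphabet)
            new_block[s] = ids.setdefault(sig, len(ids))
        if len(ids) == len(set(block.values())):
            break  # no block was split, so the partition is stable
        block = new_block

    # 3. Renumber blocks by BFS from the start block for a canonical table.
    rep = {}
    for s in reach:
        rep.setdefault(block[s], s)
    order, queue, table = {block[start]: 0}, deque([block[start]]), []
    while queue:
        b = queue.popleft()
        row = []
        for ch in alphabet:
            nb = block[step(rep[b], ch)]
            if nb not in order:
                order[nb] = len(order)
                queue.append(nb)
            row.append(order[nb])
        table.append(tuple(row))
    final = frozenset(order[block[s]] for s in reach if s in accepting)
    return tuple(table), final

# Two different DFAs for "ends in b", plus one for "starts with a".
d1 = (0, {1}, {0: {"a": 0, "b": 1}, 1: {"a": 0, "b": 1}})
d2 = (0, {1}, {0: {"a": 2, "b": 1}, 1: {"a": 2, "b": 1}, 2: {"a": 2, "b": 1}})
d3 = (0, {1}, {0: {"a": 1}, 1: {"a": 1, "b": 1}})

print(canonical_min_dfa(*d1, "ab") == canonical_min_dfa(*d2, "ab"))  # prints: True
print(canonical_min_dfa(*d1, "ab") == canonical_min_dfa(*d3, "ab"))  # prints: False
```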

I am not sure about the rest of what you are saying: in general even just minimizing regular expressions is EXP-SPACE complete, if I remember right.

Yes, generation of false negative and false positive ain't so hard---theory agrees with you. But automating the guesser is, as far as I know.

hokkos 3264 days ago

I was talking about generating hints to give the RE guesser an opportunity to find the secret RE. You seem to be talking about automating the RE generation based on the pattern of the hints. Yeah, I didn't think about that; the other answer seems to talk about it.

jfries 3269 days ago

How did you solve backreferences with that approach?

eru 3269 days ago

You don't. Backreferences leave the space of mathematical regular expressions, which is what all this nice theory works for.

hokkos 3269 days ago

I was lucky that XSD regexes don't support backreferences.
