by manx 1435 days ago
One probably wants to provide a set of matching strings and a set of non-matching strings. The software would then output a regex, along with some edge-case strings that match and some that don't.

This could be built using set operations on deterministic finite automata (DFAs). Every regex is equivalent to a DFA. You can construct an automaton for every positive and every negative example input, then calculate the union of all positive examples and the union of all negative examples. Finally, calculate the difference between the two unions (positive minus negative) and convert the resulting automaton back to a regex.

https://scanftree.com/automata/dfa-union-property
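To make the idea concrete, here is a rough sketch of the union and difference steps using the standard product construction. Everything in it is an assumption for illustration: the `DFA` class, the helpers `literal_dfa`, `combine`, `union`, `difference` and `from_examples`, the two-letter alphabet, and the example strings are all made up. It stops before the last step (converting the automaton back to a regex, e.g. by state elimination), which is left out to keep it short.

```python
from functools import reduce


class DFA:
    """Complete DFA: delta maps (state, symbol) -> state for every symbol."""

    def __init__(self, alphabet, delta, start, accepting):
        self.alphabet = frozenset(alphabet)
        self.delta = delta
        self.start = start
        self.accepting = set(accepting)

    def accepts(self, text):
        state = self.start
        for ch in text:
            if ch not in self.alphabet:
                return False  # symbols outside the alphabet are rejected
            state = self.delta[(state, ch)]
        return state in self.accepting


def literal_dfa(word, alphabet):
    """DFA accepting exactly `word`; state i means 'matched i characters'."""
    dead = len(word) + 1  # trap state for any mismatch or extra input
    delta = {}
    for state in range(len(word) + 2):
        for ch in alphabet:
            matches = state < len(word) and word[state] == ch
            delta[(state, ch)] = state + 1 if matches else dead
    return DFA(alphabet, delta, 0, {len(word)})


def combine(a, b, keep):
    """Product construction over reachable state pairs.

    keep(in_a, in_b) decides whether a pair of states is accepting.
    """
    assert a.alphabet == b.alphabet, "both automata must share an alphabet"
    start = (a.start, b.start)
    delta, accepting = {}, set()
    seen, todo = {start}, [start]
    while todo:
        state = todo.pop()
        p, q = state
        if keep(p in a.accepting, q in b.accepting):
            accepting.add(state)
        for ch in a.alphabet:
            nxt = (a.delta[(p, ch)], b.delta[(q, ch)])
            delta[(state, ch)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return DFA(a.alphabet, delta, start, accepting)


def union(a, b):
    return combine(a, b, lambda in_a, in_b: in_a or in_b)


def difference(a, b):
    return combine(a, b, lambda in_a, in_b: in_a and not in_b)


def from_examples(examples, alphabet):
    """Union of the literal automata for a non-empty list of examples."""
    return reduce(union, (literal_dfa(w, alphabet) for w in examples))


# Hypothetical example data over the alphabet {a, b}.
alphabet = "ab"
positives = from_examples(["a", "ab", "abb"], alphabet)
negatives = from_examples(["abb", "b"], alphabet)
result = difference(positives, negatives)

for candidate in ["a", "ab", "abb", "b"]:
    print(candidate, result.accepts(candidate))
```

One thing the sketch makes visible: an automaton built from a literal string accepts only that string, so the difference is just the finite set of positives that were not also listed as negatives. Getting a regex that generalizes beyond the examples would need some extra step.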

1 comment

I was thinking of something that could categorize parts of these strings into a “language”, so there are no non-matching strings. It’s hard to specify formally, but by looking at these strings you can see that, e.g., […] is a static syntactic element, a number follows it, and a time precedes it. This would be nice to have for browsing logs (which these strings are obviously a part of): instead of scrolling through thousands of rows, you would see all of the patterns that occur among them at once, and then drill down into a pattern to inspect what happened and when, to improve the “health” of a complex system. Of course, if you know all of the patterns in advance, it’s easy to filter by each. But lots of software and APIs do not document their output in such detail.
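A very rough sketch of that kind of pattern browsing, under strong assumptions: the variable parts are masked with hand-written rules (a time and a plain number), and lines are then grouped by the resulting template. The log lines, the `MASKS` rules, and the function names `template_of` and `group_by_template` are all made up for illustration. Discovering the static and variable parts without such rules is the harder problem, and this does not attempt it.

```python
import re
from collections import defaultdict

# Hypothetical masking rules. Order matters: the time rule must run
# before the generic number rule, or the time would be split into numbers.
MASKS = [
    (re.compile(r"\b\d{2}:\d{2}:\d{2}\b"), "<TIME>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def template_of(line):
    """Replace the variable-looking parts of a line with placeholders."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line


def group_by_template(lines):
    """Map each template to the original lines that produced it."""
    groups = defaultdict(list)
    for line in lines:
        groups[template_of(line)].append(line)
    return dict(groups)


# Made-up log lines, not taken from any real system.
lines = [
    "12:00:01 [worker] processed 17 jobs",
    "12:00:05 [worker] processed 3 jobs",
    "12:01:10 [db] connection lost",
]

groups = group_by_template(lines)
for template, members in groups.items():
    print(len(members), template)
```

Each key of `groups` is one pattern, and its value holds the raw lines to drill into.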