by nmrm
4452 days ago
T0 | T1 | T2 | ... would match exactly the right thing if you only have positive examples, and (T0 | T1 | T2 | ...) & !(CE1 | CE2 | CE3 | ...) would match exactly the right thing with both positive and negative examples. But that's pretty stupid, because it doesn't generalize beyond your examples. What's your approach? <em>edit: removed random conjecture</em>
A simple heuristic is to prefer the smallest regex that is consistent with all the examples.
So in your example, given the training examples "aba", "abaa", "aaaaba" and the counter examples "abba", "ba", "ab", it's clear to a human that I probably want to match "a+ba+". That's much smaller than ("aba" | "abaa" | "aaaaba") & !("abba" | "ba" | "ab"), so it would be a "better" regex.
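To make the "smallest regex" heuristic concrete, here's a rough brute-force sketch in Python. It is not how a real synthesizer would work, just an illustration: it assumes "size" means character count, restricts itself to a tiny made-up token set (ALPHABET), and the function names are mine. It tries every string in order of length and returns the first one that compiles, fully matches all the training examples, and fully matches none of the counter examples.

```python
import itertools
import re

# Assumed token set for this sketch; real regex syntax is much larger.
ALPHABET = "ab+*|()"


def is_consistent(pattern, positives, negatives):
    """True if pattern fully matches every positive and no negative."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        return False  # not a valid regex, skip it
    return (all(compiled.fullmatch(s) for s in positives)
            and not any(compiled.fullmatch(s) for s in negatives))


def smallest_regex(positives, negatives, max_len=6):
    """Brute-force the shortest consistent pattern, or None if none is found."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(ALPHABET, repeat=length):
            pattern = "".join(chars)
            if is_consistent(pattern, positives, negatives):
                return pattern
    return None


positives = ["aba", "abaa", "aaaaba"]
negatives = ["abba", "ba", "ab"]
best = smallest_regex(positives, negatives)
print(best)  # a+ba+
```

The search is exponential in the pattern length, so this only works for toy cases; the point is just that ordering candidates by size makes the generalizing regex win over the enumerate-every-example one.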