by gamegoblin
4452 days ago
You need some sort of heuristic that determines what a "good" regex is, since there are undoubtedly multiple regexes that describe any given corpus. A simple heuristic is to prefer the smallest regex. So in your example, given the training examples:

  aba
  abaa
  aaaaba

and the counterexamples:

  abba
  ba
  ab

it's clear to a human that I probably want to match "a+ba+". That's much smaller than ("aba" | "abaa" | "aaaaba") & !("abba" | "ba" | "ab"), so it would be a "better" regex.
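
A minimal sketch of that heuristic in Python, assuming a hand-written list of candidate patterns rather than an actual regex synthesizer. The helper names (is_consistent, smallest_consistent) and the candidate list are made up for illustration, and "smallest" is taken to mean shortest pattern string, which is only one possible size measure. Python's re has no intersection or complement operators, so the enumerating candidate is written as a plain alternation.

```python
import re

positives = ["aba", "abaa", "aaaaba"]
negatives = ["abba", "ba", "ab"]

def is_consistent(pattern, positives, negatives):
    """True if pattern fully matches every positive and no negative."""
    rx = re.compile(pattern)
    return (all(rx.fullmatch(s) for s in positives)
            and not any(rx.fullmatch(s) for s in negatives))

def smallest_consistent(candidates, positives, negatives):
    """Pick the shortest candidate (by string length) that fits the examples."""
    ok = [p for p in candidates if is_consistent(p, positives, negatives)]
    return min(ok, key=len) if ok else None

# Hypothetical candidates; a real tool would have to generate these itself.
candidates = ["a+ba+", "aba|abaa|aaaaba", "a*ba*", "ab+a+"]
best = smallest_consistent(candidates, positives, negatives)
print(best)
```

Here "a*ba*" is rejected because it also matches "ba" and "ab", and "ab+a+" is rejected because it matches "abba". Both remaining candidates fit the examples, and the length tiebreak picks the generalizing one over the one that just lists the training set.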