by qu4z-2 3980 days ago
If the inputs are

    +ve: a | aaa | aaaaaaa | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    -ve: b | 1000 | cd
a+ would by most measures be a "smaller" regular expression than a|aaa|aaaaaaa|aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

No?
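
A quick way to check this claim, as a rough sketch rather than anything from the original article: the example sets are copied from above, and "size" is taken to mean pattern length in characters, which is only one possible measure. The helper name `separates` is made up for illustration.

```python
import re

# Example sets copied from the comment above.
positives = ["a", "aaa", "aaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"]
negatives = ["b", "1000", "cd"]


def separates(pattern, accept, reject):
    """True if pattern fully matches every accept string and no reject string."""
    rx = re.compile(pattern)
    return (all(rx.fullmatch(s) for s in accept)
            and not any(rx.fullmatch(s) for s in reject))


# The literal alternation of the positive examples: a|aaa|aaaaaaa|...
whitelist = "|".join(re.escape(s) for s in positives)
general = "a+"

# Assumption: pattern length in characters stands in for "size".
for pattern in (general, whitelist):
    print(len(pattern), separates(pattern, positives, negatives))
```

Both patterns separate the two sets, so on these examples alone the only thing that tells them apart is the size measure.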

EDIT: Formatting.

1 comments

If we imagine that aa should be accepted, then a+ is a good return value, and it's easy to find. That doesn't make it a good idea, though, since accepting aa may be a bug in our larger code.
Well, any heuristic like that can give false positives or false negatives when applied to inputs beyond the training data. A pure whitelist approach is definitely an option for determining the accepted inputs, but generally some heuristic that attempts to accept inputs "like" the examples will probably be better.
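
To make the trade-off concrete, here is a small sketch comparing the two approaches on inputs that were in neither example set. The unseen strings are made up for illustration, and a+ stands in for "some heuristic"; nothing here comes from the original article.

```python
import re

# Positive examples copied from the parent comment.
positives = ["a", "aaa", "aaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"]

# Pure whitelist: accept exactly the strings that were seen.
whitelist = re.compile("|".join(re.escape(s) for s in positives))
# Generalizing heuristic: accept anything "like" the examples.
heuristic = re.compile("a+")

# Hypothetical unseen inputs, not part of either example set.
unseen = ["aa", "aaaa", "ab"]

# Map each input to (whitelist verdict, heuristic verdict).
verdicts = {
    s: (bool(whitelist.fullmatch(s)), bool(heuristic.fullmatch(s)))
    for s in unseen
}

for s, (in_whitelist, in_heuristic) in verdicts.items():
    print(f"{s!r}: whitelist={in_whitelist} heuristic={in_heuristic}")
```

The two disagree on aa and aaaa. Which answer is the error depends on what the larger code actually wants: if aa is valid, the whitelist gives a false negative, and if it isn't, a+ gives a false positive.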