|
|
|
|
|
by ieviev
86 days ago
|
|
> lazy quantifiers for inclusion into the next release of POSIX That is surprising! We've found that in certain simpler scenarios it's possible to use complement to express lazy quantifiers, but in the general case it appears very fragile. a simple example: a.*?b can be rewritten to something like (a.*b&~(a.*b_+)) in RE# syntax, which effectively means "there is a match, but must not contain a shorter match" > 1998 TOPLAS I will read through it properly, but i can already see from page 8 that it requires a table of (DFA size x input length) which makes me very suspicious that it is more of a thought exercise than a real world solution. > It is true that you can’t stream the haystack in the general case, but to what precise extent you can is an interesting question with a known algorithmic answer[3]. Thank you for this, this is an interesting paper |
|
> That is surprising!
I mean, POSIX BREs (only!) also include (single-digit) backreferences. Surprisingly, (with such a restriction) this is actually polynomial[1,2] if impractical (h/t 'burntsushi for the reference[3]). But I still wouldn’t take POSIX as the arbiter of sanity in this case. Thus far I’m not even convinced their text actually amounts to a well-defined ordering on parse trees.
I don’t know if nongreedy quantifiers are all that interesting without match groups, though, so this isn’t a particularly burning question in my view.
[1] https://branchfree.org/2019/04/04/question-is-matching-fixed...
[2] https://github.com/travisdowns/polyregex
[3] https://news.ycombinator.com/item?id=40431198