by burntsushi
1205 days ago
Does there exist a regex engine I can try that uses derivatives and supports large Unicode classes and purports to be usable for others? :-) It has been a long time since I read the "Regular-expression derivatives re-examined" paper. Mostly the only thing I remember at this point is that I came away thinking that it would be difficult to adapt in practice for large Unicode classes. But I don't remember the details. It is honestly very difficult for me to translate your comment here into an actionable implementation strategy. But that's probably just my inexperience with derivatives talking.
I don't know of any besides ocaml-re, which Drup already linked, sorry :).
And sorry that my comment is hard to decipher. I think the core point is that the "character set" can be an abstract type from the point of view of the derivation algorithm. So it doesn't matter how they are represented, nor "how big" a character set is.
With Antimirov's derivative (which produces an NFA), there is no constraint on this type.
With Brzozowski's derivative, you need at least the ability to intersect two character sets. So the type should implement a trait with an intersection function (in Rust syntax, `trait Intersect { fn intersect(self, other: Self) -> Self; }`). That's necessary for any implementation generating a DFA anyway.
And if you also want to deal with complementation, then a second method `fn negate(self) -> Self` is necessary.
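To make that a bit more concrete, here is a rough sketch of what such an abstract character-set type could look like. Everything named here (`CharSet`, `Ranges`, `refine`, `example`) is made up for illustration and not taken from ocaml-re or any existing engine. I've also used `&self` instead of `self` and added an `is_empty` check, which the partitioning step below needs. The `refine` function shows the one place where intersection and negation are used: splitting the alphabet into blocks of characters that the given sets cannot tell apart, so a DFA construction only has to take one derivative per block rather than one per code point.

```rust
/// Hypothetical trait: the only operations the derivation algorithm
/// needs from a character set. The representation stays abstract.
trait CharSet: Clone {
    fn intersect(&self, other: &Self) -> Self;
    fn negate(&self) -> Self;
    fn is_empty(&self) -> bool;
}

/// One possible representation (an assumption, not the only choice):
/// sorted, non-overlapping, inclusive ranges of code points.
/// Surrogates are ignored here to keep the sketch short.
#[derive(Clone, Debug, PartialEq)]
struct Ranges(Vec<(u32, u32)>);

const MAX: u32 = 0x10FFFF;

impl CharSet for Ranges {
    fn intersect(&self, other: &Self) -> Self {
        let (mut i, mut j, mut out) = (0, 0, Vec::new());
        while i < self.0.len() && j < other.0.len() {
            let (a, b) = (self.0[i], other.0[j]);
            let (lo, hi) = (a.0.max(b.0), a.1.min(b.1));
            if lo <= hi {
                out.push((lo, hi));
            }
            // Advance whichever range ends first.
            if a.1 < b.1 { i += 1 } else { j += 1 }
        }
        Ranges(out)
    }

    fn negate(&self) -> Self {
        let (mut out, mut next) = (Vec::new(), 0u32);
        for &(lo, hi) in &self.0 {
            if lo > next {
                out.push((next, lo - 1));
            }
            next = hi + 1;
        }
        if next <= MAX {
            out.push((next, MAX));
        }
        Ranges(out)
    }

    fn is_empty(&self) -> bool {
        self.0.is_empty()
    }
}

/// Split `full` into non-empty blocks such that every set in `sets`
/// either contains a block entirely or is disjoint from it.
fn refine<S: CharSet>(full: S, sets: &[S]) -> Vec<S> {
    let mut blocks = vec![full];
    for s in sets {
        let not_s = s.negate();
        blocks = blocks
            .iter()
            .flat_map(|b| [b.intersect(s), b.intersect(&not_s)])
            .filter(|b| !b.is_empty())
            .collect();
    }
    blocks
}

/// Usage: partition the code points with respect to [a-z] and [0-9a-f].
fn example() -> Vec<Ranges> {
    let lower = Ranges(vec![(97, 122)]); // [a-z]
    let hex = Ranges(vec![(48, 57), (97, 102)]); // [0-9a-f]
    refine(Ranges(vec![(0, MAX)]), &[lower, hex])
}
```

Note that `refine` only goes through the trait, so the size of a class never matters to it: a huge Unicode class is still just a handful of ranges (or a bitset, or a BDD, whatever the implementation picks). The number of blocks can grow with the number of distinct sets in the regex, so a real implementation would likely want to refine only against the sets that are relevant to the current state.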