|
|
|
|
|
by Drup
1203 days ago
|
|
ocaml-re[1] uses a derivative-style construction to lazily build a DFA. The general idea is to use something similar to Owens et al's DFA construction, but doing it inline with some caching, to compile lazily (and building a thomson-like automaton, for group capture). In practice, it is fairly fast, although not as optimized as the Rust crate. :)
Derivatives supports several match semantics very easily (ocaml-re does longest, shortest, greedy, non-greedy, first). It indeed doesn't handle unicode matching though (it's possible, there is a prototype implem, but nobody took the time to push it through).
Note that it's not difficult to (lazily or not) build a NFA using derivatives as well (with Antimirov's construction). [1]: https://github.com/ocaml/ocaml-re/ |
|
If anyone wants to add this Ocaml engine to the harness (or any other engine), please email me at jamslam@gmail.com and I'll give access to the repo. The only reason it isn't public yet is because I'm still working on the initial release and iterating. But it's close enough where other people could submit benchmark programs for other regex engines.