by 1letterunixname
1319 days ago
GNU grep historical design discussion: https://lists.freebsd.org/pipermail/freebsd-current/2010-Aug...
Modern GNU grep includes optional PCRE2 support. I incorrectly recalled that it skipped NFA->DFA conversion, but that may be how something else like Go or re2 works in certain cases. https://git.savannah.gnu.org/cgit/grep.git/tree/src
Most people in tech don't seem to grasp that there are very few compatible/identical regex formats. Hell, most people don't know the comprehensive list of code points that character classes include, because they're poorly doc'd or undocumented. I had to write some scripts to find them for certain languages: Ruby, Python, and Rust.
I advise people to never reinvent regex or Unicode parsing themselves because there are far too many security issues and edge cases that will inevitably become problems.
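For anyone curious what such a script might look like, here is a minimal Python sketch (not the original script, which isn't shown in this thread). It brute-forces every code point against a class using the standard `re` module. The helper name `code_points_matching` is made up for illustration.

```python
import re
import sys


def code_points_matching(pattern, flags=0):
    """Return every code point whose one-character string fully matches `pattern`."""
    rx = re.compile(pattern, flags)
    return [cp for cp in range(sys.maxunicode + 1) if rx.fullmatch(chr(cp))]


# By default, str patterns in Python 3 use Unicode-aware classes.
unicode_digits = code_points_matching(r"\d")
# re.ASCII restricts \d, \w, and \s to their ASCII-only definitions.
ascii_digits = code_points_matching(r"\d", re.ASCII)

print(len(ascii_digits), "ASCII code points match \\d")
print(len(unicode_digits), "code points match \\d without re.ASCII")
# Show a few of the non-ASCII matches by code point.
extra = [cp for cp in unicode_digits if cp > 0x7F]
print([f"U+{cp:04X}" for cp in extra[:5]])
```

The same loop works for `\w`, `\s`, or any other single-character class. The exact counts depend on the Unicode version bundled with the interpreter, which is part of why these lists are hard to pin down from docs alone.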
|
> Hell, most people don't know the comprehensive list of code points that character classes include, because they're poorly doc'd or undocumented. I had to write some scripts to find them for certain languages: Ruby, Python, and Rust.
Can you say more? All of the classes are documented here for the regex crate: https://docs.rs/regex/latest/regex/#syntax
Any not listed there are from Unicode and defined by Unicode.
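To illustrate the "defined by Unicode" point with a different engine (Python's `re`, not the regex crate), a small sketch like this tallies the Unicode general category of everything `\d` matches. Python's docs describe `\d` as matching category Nd for str patterns; this only checks what the local build actually does.

```python
import re
import sys
import unicodedata
from collections import Counter

digit = re.compile(r"\d")  # Unicode-aware by default for str patterns

# Count the Unicode general category of every code point that \d matches.
categories = Counter(
    unicodedata.category(chr(cp))
    for cp in range(sys.maxunicode + 1)
    if digit.fullmatch(chr(cp))
)
print(categories)
```

The totals will vary with the Unicode version the interpreter ships, so treat the output as a snapshot, not a spec.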
> I advise people to never reinvent regex or Unicode parsing themselves because there are far too many security issues and edge cases that will inevitably become problems.
So what should I have done instead?
I generally advise people never to say "never reinvent something," because that stifles innovation and progress.
> Modern GNU grep includes optional PCRE2 support. I incorrectly recalled that it skipped NFA->DFA conversion, but that may be how something else like Go or re2 works in certain cases.
Not quite sure what you're trying to say here, but see: https://news.ycombinator.com/item?id=33567129