You're speaking to the author of Rust's regex engine.

> Hell, most people don't know what are the comprehensive list of code points characters classes include because they're poorly doc'd or undocumented. I had to write some scripts to find them for certain languages: Ruby, Python, and Rust.

Can you say more? All of the classes for the regex crate are documented here: https://docs.rs/regex/latest/regex/#syntax

Any classes not listed there come from Unicode and are defined by Unicode.

> I advise people to never reinvent regex or Unicode parsing themselves because there are far too many security issues and edge cases that will inevitably become problems.

So what should I have done instead? I generally advise people never to say "never reinvent something," because that stifles innovation and progress.

> Modern GNU grep includes optional PCRE2 support. I incorrectly recalled that it skipped NFA->DFA conversion, but that maybe how something else like Go or re2 work in certain cases.

Not quite sure what you're trying to say here, but see: https://news.ycombinator.com/item?id=33567129