Hacker News
by burntsushi 1319 days ago
You're speaking to the author of Rust's regex engine.

> Hell, most people don't know what are the comprehensive list of code points characters classes include because they're poorly doc'd or undocumented. I had to write some scripts to find them for certain languages: Ruby, Python, and Rust.

Can you say more? All of the classes are documented here for the regex crate: https://docs.rs/regex/latest/regex/#syntax

Any classes not listed there come from Unicode and are defined by Unicode.
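For illustration only, here is a minimal sketch of the kind of brute-force script the quote describes, which finds every code point a class matches by testing all of them. It uses Python's standard `re` module rather than the regex crate, and the helper name `codepoints_matching` is hypothetical, so treat it as a sketch under those assumptions and not as how any particular engine defines its classes.

```python
import re
import sys
import unicodedata


def codepoints_matching(pattern: str) -> list[int]:
    """Return every code point whose one-character string fully matches `pattern`.

    Hypothetical helper: it simply tries all code points from 0 to sys.maxunicode.
    """
    compiled = re.compile(pattern)
    return [cp for cp in range(sys.maxunicode + 1) if compiled.fullmatch(chr(cp))]


# For str patterns, Python documents \d as matching Unicode decimal digits,
# not just ASCII 0-9, so the list is much longer than ten entries.
digits = codepoints_matching(r"\d")

# Collect the Unicode general categories of the matches to compare the
# engine's behavior against what its documentation says.
categories = {unicodedata.category(chr(cp)) for cp in digits}

print(len(digits), sorted(categories))
```

The same loop works for any single-character class, which is one way to check an engine's behavior against its documentation when the docs defer to Unicode for the definition.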

> I advise people to never reinvent regex or Unicode parsing themselves because there are far too many security issues and edge cases that will inevitably become problems.

So what should I have done instead?

I generally advise people never to say "never reinvent something," because that stifles innovation and progress.

> Modern GNU grep includes optional PCRE2 support. I incorrectly recalled that it skipped NFA->DFA conversion, but that maybe how something else like Go or re2 work in certain cases.

Not quite sure what you're trying to say here, but see: https://news.ycombinator.com/item?id=33567129