by _eojb 1084 days ago
Well, `std::regex` is literally orders of magnitude slower than other common regex libraries found in JS, Python, Perl, C, etc. It allocates a ton, is poorly implemented, and can never be fixed due to ABI constraints. The entire <regex> subsystem is a mess and should have never been standardized as is.

While that is true, it can still be a good thing to use on a non-performance-critical path instead of adding a dependency on an external library.

Maybe it shouldn't be there and something better should be, but given that it exists, using it is fine as long as one is aware of the alternatives.

There's "not fast but usable enough", of course, but I would never ship std::regex code in any user-facing software. Forget realtime: std::regex fails to be interactive in examples where other libraries resolve quickly.
The last time I tried it (years ago), std::regex took a measurable number of milliseconds to evaluate, which is a very long time even outside of performance-critical paths.
There's nothing wrong with the std::regex design as specified in the standard.

std::regex only sucks because the developers of gcc and clang never bothered to optimize it. (Too much work and they have other stuff to worry about.)

"can't be fixed without breaking ABI" sounds plausible for C++.

There's generally not all that much stdc++-specific optimisation stuff in clang. There might be parts of regex that are worth implementing as compiler intrinsics; that seems to be the existing pattern for making bits much faster.

The really heavy lifting you want for regex is to partially evaluate and split them. They're a separate language unto themselves and benefit from being optimised as such. There's nowhere ideal in the clang/llvm pipeline to do that though.

C++ regexes are literally just copy-pasted ECMAScript regexes. They could have just used an existing regex library, but C++ compiler developers presumably don't want to support an extra dependency.

That's the only real reason why std::regex is slow.

std::regex also depends on locale, which is reason enough to avoid it, regardless of performance.