by pierrec 2811 days ago
There's no arguing the fact that regexes are a poor fit for HTML, but maybe this is the wrong time to use that ridiculous email regex as an example, since TFA features a highly readable, fully compliant email matching regex as its main example.
3 comments

It also doesn't point out that matching email addresses in general is a nightmare because the standard is one of those "we'll just allow everything everybody is doing right now" type standards that have a million different little quibbles.

No matter what language or programming style you use it's going to be ugly because it's an ugly problem.

What the language contains as a main example is a PCRE pattern that matches email addresses and, in comparison to its original EBNF incarnation, is highly unreadable due to all the syntax noise required to graft backtracking support onto traditional regex syntax (not to mention the fact that its performance is certainly highly suboptimal).
The performance might be better in some cases actually.

In the case of Perl, where regexp performance has been heavily optimized, the regexp actually performs better than a more normal parser.

From http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html:

> It provides the same functionality as RFC::RFC822::Address, but uses Perl regular expressions rather that the Parse::RecDescent parser. This means that the module is much faster to load as it does not need to compile the grammar on startup.

Of course, if Perl were a statically compiled language, the cost of compiling the grammar could be paid at compile time.

Perl5 is not too meaningful for such performance comparisons, because on one hand its regex implementation is very optimized, while on the other hand the performance of Perl5 on "normal" procedural code is horrible (e.g. Perl5 is about an order of magnitude slower on Gabriel's Takeuchi function benchmark than CPython).
This result generalises to most interpreted languages though. PHP, Python, JavaScript, etc. all have highly optimised regex engines, and regexes can consequently be a good optimisation technique when using those languages.
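
A rough way to check this for yourself in Python is to time a compiled regex against an equivalent hand-written loop. This is only a sketch: the digit-run counting task, the function names, and the sample input are all made up for illustration, and the numbers will vary with the interpreter and the input.

```python
import re
import timeit

# Hypothetical task: count runs of ASCII digits in a string.
DIGITS = re.compile(r"[0-9]+")

def count_runs_regex(text):
    # The scanning loop runs inside the regex engine (C code in CPython).
    return len(DIGITS.findall(text))

def count_runs_loop(text):
    # Same logic written as an ordinary interpreted loop.
    count = 0
    in_run = False
    for ch in text:
        is_digit = "0" <= ch <= "9"
        if is_digit and not in_run:
            count += 1
        in_run = is_digit
    return count

if __name__ == "__main__":
    sample = "abc 123 def 4567 x9 " * 1_000
    # Both versions must agree before the timings mean anything.
    assert count_runs_regex(sample) == count_runs_loop(sample)
    for fn in (count_runs_regex, count_runs_loop):
        elapsed = timeit.timeit(lambda: fn(sample), number=50)
        print(f"{fn.__name__}: {elapsed:.3f}s")
```

No timings are quoted here on purpose; run it on your own setup before drawing conclusions.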
Using regex to match email addresses isn't actually a good idea either.

https://blog.onyxbits.de/validating-email-addresses-with-a-r...
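
One common alternative to a big validation regex is a cheap plausibility check, followed by actually sending a confirmation message. A minimal sketch in Python, assuming that is the approach you want; `looks_like_email` is a hypothetical helper, not something taken from the linked post, and it makes no attempt at RFC compliance:

```python
def looks_like_email(address: str) -> bool:
    """Cheap plausibility check; NOT an RFC-compliant validator."""
    # Split on the LAST "@", since a quoted local part may itself contain "@".
    local, sep, domain = address.rpartition("@")
    # Only require an "@" with something on both sides of it.
    return bool(sep) and bool(local) and bool(domain)
```

Anything that passes still has to be confirmed by delivering mail to it; the check only filters out obvious typos.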