by _jomo 4192 days ago
There is this insane email validating RegEx [0]. The page says:

> I did not write this regular expression by hand. It is generated by the Perl module by concatenating a simpler set of regular expressions that relate directly to the grammar defined in the RFC.
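To give a feel for what "concatenating a simpler set of regular expressions" means, here is a minimal Python sketch. It is not the Perl module's code and it is nowhere near RFC 822 compliant (no comments, no folding whitespace, no domain literals). The names `ATOM`, `WORD`, `ADDR_SPEC` and `is_addr_spec` are my own, only loosely modelled on the grammar's rule names.

```python
import re

# Simplified building blocks, loosely named after RFC 822 grammar rules.
# Assumption: this is a toy subset, NOT a complete or compliant grammar.
ATOM = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
QUOTED_STRING = r'"(?:[^"\\\r\n]|\\.)*"'

# Each larger rule is just string concatenation of the smaller ones.
WORD = rf"(?:{ATOM}|{QUOTED_STRING})"
LOCAL_PART = rf"{WORD}(?:\.{WORD})*"
DOMAIN = rf"{ATOM}(?:\.{ATOM})*"
ADDR_SPEC = rf"{LOCAL_PART}@{DOMAIN}"

ADDR_RE = re.compile(ADDR_SPEC)


def is_addr_spec(s: str) -> bool:
    """Return True if the whole string matches the toy addr-spec regex."""
    return ADDR_RE.fullmatch(s) is not None


if __name__ == "__main__":
    # The assembled pattern is already far longer than any single piece.
    print(len(ATOM), len(ADDR_SPEC))
    print(is_addr_spec("user@example.com"))
```

Even with five rules the expanded pattern is several times longer than any one piece, which is how the real thing ends up as a wall of text once the full grammar is included.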

There's also the famous xkcd Regex Golf [1]. Peter Norvig writes:

> So that got me thinking: can I come up with an algorithm to find a short regex that matches the winners and not the losers?

He then describes his steps for building a RegEx from a list of words that must be matched and a list of words that must not be matched [2].
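A rough Python sketch of the idea, for anyone who wants to play with it. This is my own simplified greedy version and not Norvig's code: `candidate_parts`, `find_regex` and the `max_len` parameter are hypothetical names, and the candidates here are only anchored whole words and short literal substrings.

```python
import re


def candidate_parts(winners, max_len=4):
    """Candidate regex pieces: each whole winner anchored, plus short substrings."""
    parts = {f"^{re.escape(w)}$" for w in winners}
    for w in winners:
        for n in range(1, max_len + 1):
            for i in range(len(w) - n + 1):
                parts.add(re.escape(w[i:i + n]))
    return parts


def find_regex(winners, losers, max_len=4):
    """Greedily pick pieces that match no loser until every winner is covered."""
    winners, losers = set(winners), set(losers)
    # Discard any piece that matches a loser.
    pool = [p for p in candidate_parts(winners, max_len)
            if not any(re.search(p, l) for l in losers)]
    chosen, remaining = [], set(winners)
    while remaining:
        if not pool:
            raise ValueError("no candidate separates winners from losers")
        # Prefer the piece covering the most remaining winners, then the shortest.
        best = max(pool, key=lambda p: (
            sum(1 for w in remaining if re.search(p, w)), -len(p), p))
        covered = {w for w in remaining if re.search(best, w)}
        if not covered:
            raise ValueError("some winners cannot be covered")
        chosen.append(best)
        remaining -= covered
    return "|".join(chosen)


if __name__ == "__main__":
    rx = find_regex(["washington", "adams", "jefferson"],
                    ["clinton", "pinckney", "king"])
    print(rx)
```

The result is an alternation of pieces, so it matches every winner and no loser by construction, but being greedy it is not guaranteed to be the shortest possible regex.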

[0]: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

[1]: https://xkcd.com/1313/

[2]: http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313....

1 comment

If you're interested in how these giant matchstick regexes are built, there is a wonderful (famous?) Perl script that generates a regex 6,598 characters long, more optimized and faster than an earlier attempt at 4,724 bytes. It is in Jeffrey Friedl's book Mastering Regular Expressions, 1st edition, O'Reilly, pp. 312-316, Appendix B: Email Regex Program.