by drfritznunkie 4288 days ago
I regularly tested the "email regex du jour" at my previous job whenever these types of articles came up. IIRC, it was against 15+MM known-good email addresses and probably double that in known bads, and nearly every regex tested had its issues. [edit: we had something like 150,000 distinct active domains, and probably half that many distinct MXes (if you rolled up all the google-biz and microsoft hosted stuff)... if you think getting your email delivered by gmail is difficult, try a school district in Wyoming that appeared to have a 300 baud link connecting it to the world, running an ancient version of Groupware that rejected email according to the weather report as far as we could tell...]

Most people working on the code for that sign-up page (or what have you) have neither the regex-fu nor the understanding of email needed to write the regex correctly... So you get a lot of shitty regexes (especially at large corporations) that don't support apostrophes or dashes/plus signs in the local part. And it doesn't matter how good your regex-fu and RFC comprehension are: there are a lot of broken implementations out there, and blocking a subscriber because of their broken system isn't great business.
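
For illustration, here is a minimal sketch of what a deliberately loose check could look like. The pattern and the name `looks_like_email` are my own assumptions, not the regex we actually ran; the point is only that it accepts apostrophes, dashes, and plus signs in the local part instead of trying to encode the RFC.

```python
import re

# Deliberately loose (an assumed pattern, not a production one):
# exactly one "@", no whitespace, and at least one dot in the domain.
SIMPLE_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(address: str) -> bool:
    """Return True if the address has the rough shape local@domain.tld."""
    return SIMPLE_EMAIL.fullmatch(address) is not None
```

Something like `looks_like_email("o'brien+news@some-company.com")` passes, while a string with no "@" or with embedded whitespace does not. Everything else is left to the later checks.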

It took a while, but eventually we switched our signup forms to do a couple of very effective things beyond a very simple address regex: 1) auto-suggest for common misspellings of our most common domains (gmal.com, yaho.com, etc.) 2) while the "please re-type your email" field gave us enough user delay, we did a DNS lookup of the domain, then an MX lookup. If there was a problem with either, we passed an error to the user like "Please double check the domain of your email address..." 3) check for domains you know have moved. We were B2B, so if you watched your bounces closely, you'd know that asdf.com was moving to hjkl.com and could update your existing records, but people have serious muscle memory, and it's worth reminding them on the signup page.
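
The three checks above can be sketched roughly as follows. This is not our actual code: the tables `COMMON_TYPOS` and `MOVED_DOMAINS`, the function names, and the suggestion wording are all hypothetical, and the default resolver only does a plain DNS lookup with the standard library. An MX lookup needs a DNS library, so the resolver is passed in as a parameter and you can swap in one that also queries MX records.

```python
import socket
from typing import Callable, Optional

# Hypothetical tables; real ones would come from your own signup and bounce data.
COMMON_TYPOS = {"gmal.com": "gmail.com", "yaho.com": "yahoo.com"}
MOVED_DOMAINS = {"asdf.com": "hjkl.com"}


def domain_resolves(domain: str) -> bool:
    """Plain DNS lookup via the standard library (no MX query here)."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def check_domain(
    address: str, resolves: Callable[[str], bool] = domain_resolves
) -> Optional[str]:
    """Return a message to show the user, or None if nothing looks wrong."""
    domain = address.rpartition("@")[2].strip().lower()
    if not domain:
        return "Please double check the domain of your email address..."
    if domain in COMMON_TYPOS:
        # 1) suggest the likely intended domain
        return f"Did you mean {COMMON_TYPOS[domain]}?"
    if domain in MOVED_DOMAINS:
        # 3) remind the user that the domain is known to have moved
        return f"{domain} has moved to {MOVED_DOMAINS[domain]}. Is this address current?"
    if not resolves(domain):
        # 2) the domain failed the lookup
        return "Please double check the domain of your email address..."
    return None
```

The table checks run first because they need no network round trip; the lookup can then run in the background while the user is busy re-typing the address.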

I was working on tying in our bounce database (you are keeping a record of all your bounces, right?) so that automatically flagged domains would prompt the user with an error like "We've been unable to deliver to your email domain recently. If your email address is typed correctly, we recommend using a secondary email address if you have one..."
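
A rough sketch of how that flagging might work, assuming the bounce database can hand back a list of recently bounced addresses. The threshold, the function names, and the in-memory counting are all assumptions for illustration; this was never finished, so treat it as one possible shape rather than what we built.

```python
from collections import Counter
from typing import Iterable, Optional, Set

BOUNCE_THRESHOLD = 3  # hypothetical cutoff; tune against your own bounce data


def flagged_domains(
    recent_bounces: Iterable[str], threshold: int = BOUNCE_THRESHOLD
) -> Set[str]:
    """Return domains with at least `threshold` recent bounces."""
    counts = Counter(addr.rpartition("@")[2].lower() for addr in recent_bounces)
    return {domain for domain, n in counts.items() if n >= threshold}


def bounce_warning(address: str, flagged: Set[str]) -> Optional[str]:
    """Return a warning if the address's domain is flagged, else None."""
    domain = address.rpartition("@")[2].lower()
    if domain in flagged:
        return (
            "We've been unable to deliver to your email domain recently. "
            "If your email address is typed correctly, we recommend using "
            "a secondary email address if you have one..."
        )
    return None
```

Note that this warns rather than blocks: the user can still sign up with the flagged address.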