by Invisig0th 3180 days ago
The article you link to addresses just the one use case where you are registering a new user. Email addresses in business applications are just as often NOT the address of your user. Sales contacts, managers, points of contact, customer support addresses, etc.: none of these should ever be validated by sending an email. So you still need to validate the hard way in plenty of scenarios.

...but unless you actually send an email it is still just as unvalidated as it was before. Regexes aren't a tool to determine that a string refers to a mailbox capable of receiving mail.

If you use regexes (or any other method that does not send emails), all you're saying is that you don't actually care whether or not the string points to a recipient (much less the correct recipient).

And don't get me wrong. It is absolutely okay to not care whether or not the string refers to a correct recipient. Most places that make me write my email have no business caring about it. But please also then make the field optional.

Regex validation cannot verify that an email address is functioning any more than it can verify that a phone number is functioning. But if an address or a number passes formatting checks, one can indeed consider that data "validated". That's what that term means in the business software development industry. We're verifying if the data entered COULD be correct, not double-checking that it IS correct.

You are correct to say that those emails may still bounce, and the phone calls may also not go through. We completely understand that. For this reason, in very specific situations (like registering a new user), we do take that extra step to make sure the communication channel actually works. But there are plenty of situations where that makes absolutely no sense, and/or adds very little value for the cost. Knowing the difference between these two very different use cases certainly does not indicate that these people "don't actually care" about the accuracy of their data.
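For the registration case, the "extra step" usually means mailing a link that carries an unguessable token and treating the address as confirmed only when that link comes back. A minimal sketch, assuming an in-memory store, a hypothetical `PendingConfirmations` class, and a one-day expiry chosen purely for illustration; actually delivering the email is left out.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

# Assumption for illustration: confirmation links expire after one day.
TOKEN_TTL_SECONDS = 24 * 60 * 60


class PendingConfirmations:
    """Hypothetical in-memory store of unconfirmed registration addresses."""

    def __init__(self) -> None:
        # token -> (email address, time the token was issued)
        self._pending: Dict[str, Tuple[str, float]] = {}

    def issue(self, email: str, now: Optional[float] = None) -> str:
        """Create an unguessable token to embed in the confirmation link."""
        token = secrets.token_urlsafe(32)
        self._pending[token] = (email, time.time() if now is None else now)
        return token

    def confirm(self, token: str, now: Optional[float] = None) -> Optional[str]:
        """Return the address if the token is known and unexpired, else None.

        Tokens are removed on first use, so each link works only once.
        """
        entry = self._pending.pop(token, None)
        if entry is None:
            return None
        email, issued_at = entry
        current = time.time() if now is None else now
        if current - issued_at > TOKEN_TTL_SECONDS:
            return None
        return email


store = PendingConfirmations()
token = store.issue("new.user@example.com")
# ...send a link containing `token` to new.user@example.com (not shown)...
print(store.confirm(token))  # the address, only because the link came back
```

Only the round trip proves the mailbox receives mail; no format check on the string can tell you that.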

The point is that the vast majority of regexes that attempt to validate an email address produce false negatives: they reject valid email addresses. If a regex must exist, it should first strip all whitespace characters, then confirm there is a single '@' with one or more characters before it and two or more characters after it. (The davidcel.is article mentions checking for a dot, but a@us could be a valid address that would fail that test.) It also should not balk at characters outside ASCII.
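The permissive rule above fits in a single pattern. A minimal Python sketch, assuming the rule exactly as stated (whitespace stripped, exactly one '@', at least one character before it, at least two after it); `looks_like_email` is a hypothetical name. `[^@]` matches any character other than '@', including non-ASCII ones, so internationalized addresses pass. Taken literally, the rule still rejects RFC 5322 quoted local parts that contain an '@', which are rare but legal.

```python
import re

# One or more non-'@' characters, one '@', then two or more non-'@' characters.
# No dot required after the '@', and no restriction to ASCII.
_SHAPE = re.compile(r"[^@]+@[^@]{2,}")


def looks_like_email(raw: str) -> bool:
    """Return True if `raw` COULD be an email address under the permissive rule.

    This checks shape only; it says nothing about whether the mailbox exists.
    """
    # In Python 3, \s on a str pattern matches Unicode whitespace as well.
    candidate = re.sub(r"\s+", "", raw)
    return _SHAPE.fullmatch(candidate) is not None


for addr in ["a@us", "  jane.doe@example.com ", "用户@例子.广告",
             "no-at-sign", "a@b@c", "@example.com", "x@y"]:
    print(f"{addr!r}: {looks_like_email(addr)}")
```

The first three pass and the last four fail; note that a@us passes because no dot is demanded.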

If you want to do some additional non-regex validation, like confirm the hostname exists and has an MX record, have at it.

And what I'm saying is this: either the system relies on functioning email addresses, in which case you have to make sure they actually work anyway, or the system does not rely on working email addresses, in which case drop the pretense, make it an optional field, and people's days will suck less.
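On the "make it optional" side, the form handling can accept a blank field instead of forcing the user to invent an address. A minimal sketch, assuming a hypothetical `clean_optional_email` helper that returns `None` for missing or blank input and applies only the permissive shape check otherwise; nothing here confirms the mailbox.

```python
import re
from typing import Optional

# Same permissive shape as before: non-'@' run, one '@', two or more non-'@'.
_SHAPE = re.compile(r"[^@]+@[^@]{2,}")


def clean_optional_email(raw: Optional[str]) -> Optional[str]:
    """Normalize an OPTIONAL email field.

    Missing or blank input is accepted and stored as None rather than rejected.
    Non-blank input gets only a shape check; it is NOT a confirmed address.
    Raises ValueError if something was entered that clearly is not an address.
    """
    if raw is None:
        return None
    candidate = re.sub(r"\s+", "", raw)
    if not candidate:
        return None  # the user chose not to give an address
    if _SHAPE.fullmatch(candidate) is None:
        raise ValueError("that does not look like an email address")
    return candidate


print(clean_optional_email("   "))      # None: blank is fine
print(clean_optional_email(" a@us "))   # stored as entered, minus whitespace
```

Raising on clearly malformed non-blank input is a design choice here; returning the stripped text unchecked would fit the "drop the pretense" position just as well.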

Besides the practical issues mentioned, it's this mindless "why the heck not" collection of personal data that I'm arguing against. Either you need the data, and then you have to work for it, or you don't, and then you have no use for it.