by theandrewbailey 3031 days ago
Sorry, but .+@.+ isn't going to cut it if you want to confidently accept deliverable email addresses. These are regex-valid, but not email-valid:

codetrotter@example

code@trotter@example.com

code trotter@example.com

codetrotter@example..com

codetrotter@example.com.

.codetrotter@example.com

My company runs a website that has elderly people signing up for newsletters. The client is paranoid about not getting every last drop of possible data, and raises hell for every email address that isn't deliverable. There are LOTS of ways to easily ruin a simple email address.
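To make the point about .+@.+ concrete, here is a minimal Python sketch that runs the permissive pattern against the addresses listed above. The names `PERMISSIVE`, `SAMPLES` and `permissive_match` are made up for illustration. It assumes the pattern is applied to the whole input (`re.fullmatch`). Every sample passes, even though the comment argues none of them is deliverable.

```python
import re

# The permissive pattern from the comment: one or more characters,
# an "@", then one or more characters. "." does not match newlines.
PERMISSIVE = re.compile(r".+@.+")

# The addresses listed above as regex-valid but not deliverable.
SAMPLES = [
    "codetrotter@example",
    "code@trotter@example.com",
    "code trotter@example.com",
    "codetrotter@example..com",
    "codetrotter@example.com.",
    ".codetrotter@example.com",
]


def permissive_match(address: str) -> bool:
    """Return True if the entire string matches .+@.+ ."""
    return PERMISSIVE.fullmatch(address) is not None


if __name__ == "__main__":
    for addr in SAMPLES:
        # Each line reports True: the pattern accepts all of them.
        print(f"{addr!r}: {permissive_match(addr)}")
```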


There is another email address that the regexp lets through and that’s still not valid:

codetrotter@example.com

It conforms to the expected format and could be a valid email, but it’s actually not, because no such user exists.

An email might also exist, but not accept mail from you.

The given email address might exist, but could belong to another user.

There’s a million things that can go wrong and you’ll have a very hard time catching them.

The only way to find out whether an email address is valid and accepts mail is to actually send an email to it.

To help the user, you can identify odd-looking email addresses and flag them in the UI (“this looks unusual, are you sure?”), but generally speaking, any strict validation is likely to reject real-world addresses while still not catching all errors.
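One way to read the "flag, don't reject" suggestion is a check that only hard-fails on the bare minimum and turns everything else into warnings for the UI. The Python sketch below does that. The function name `review_address` and the specific heuristics (more than one @, whitespace, a domain with no dot, consecutive dots) are illustrative assumptions, not a standard. Some addresses that trigger a warning can still be legitimate (a quoted local part may contain @ or spaces, and a dotless domain is odd but possible), which is exactly why they only warn.

```python
from __future__ import annotations

import re

# Hard floor: something, an "@", something. Only this can reject.
MINIMAL = re.compile(r".+@.+")


def review_address(address: str) -> tuple[bool, list[str]]:
    """Return (accept, warnings) for a user-entered address.

    Hypothetical heuristics: each warning is meant to trigger a
    "this looks unusual, are you sure?" prompt, never a rejection.
    """
    if MINIMAL.fullmatch(address) is None:
        return False, ["missing '@' or text on one side of it"]

    warnings = []
    if address.count("@") > 1:
        warnings.append("more than one '@'")
    if any(ch.isspace() for ch in address):
        warnings.append("contains whitespace")
    domain = address.rsplit("@", 1)[1]  # text after the last "@"
    if "." not in domain:
        warnings.append("domain has no dot")
    if ".." in address:
        warnings.append("consecutive dots")
    return True, warnings
```

For example, `review_address("example@de")` accepts the address but attaches the "domain has no dot" warning, leaving the final decision to the user.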

Good points. Why do regex tests at all if emails could fail in any number of ways?

If you're going to test for at least 3 characters with a @ in the middle, you probably should implement some other simple rules to have a snowball's chance on the internet:

only one @

no spaces

has a TLD (at least one period after the @, followed by something else; no consecutive periods)

can't begin or end with a period
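The four rules above could be sketched in Python as a function that returns the rules an address breaks. The name `rule_violations` is hypothetical, and reading "no spaces" as "no whitespace of any kind" and applying the consecutive-period rule to the domain part only are assumptions. Note that, as the next reply points out, the "has TLD" rule rejects an address like example@de, which this sketch does too.

```python
from __future__ import annotations


def rule_violations(address: str) -> list[str]:
    """Return the names of the listed rules that `address` breaks."""
    violations = []

    # Rule: only one @
    if address.count("@") != 1:
        violations.append("only one @")

    # Rule: no spaces (assumption: any whitespace counts)
    if any(ch.isspace() for ch in address):
        violations.append("no spaces")

    # Rule: has TLD. Domain = text after the last "@", empty if none.
    domain = address.rpartition("@")[2] if "@" in address else ""
    has_tld = (
        "." in domain
        and domain.rsplit(".", 1)[1] != ""  # something after the last period
        and ".." not in domain              # no consecutive periods
    )
    if not has_tld:
        violations.append("has TLD")

    # Rule: can't begin or end with a period
    if address.startswith(".") or address.endswith("."):
        violations.append("can't begin or end with a period")

    return violations
```

Each of the six regex-valid addresses from the first comment breaks at least one rule, while codetrotter@example.com passes all four.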

Your “has TLD” test is already wrong: localpart@tld (example@de) is an odd but valid address.

A (very simple) regex test might exclude some randomly entered garbage or an inadvertently invalid address. Even then, the scope and reliability of such tests is minuscule.

It appears you're missing the main point of the parent post.

You cannot validate an email address.

You can make a basic, excruciatingly simple test for proper form, and should probably limit your checks to that.

For all else, attempt to use the email address provided for validation within your onboarding loop with a sufficiently unique verification URL or code. If that succeeds, the address is ... still not absolutely certainly valid, as it may have gone to a third party who proceeded to verify it. But at least it delivered to somebody.
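The verification step described here could look roughly like the Python sketch below: generate an unguessable single-use token, embed it in a link, and treat the address as confirmed only when that token comes back. `VERIFY_BASE_URL`, `TOKEN_TTL_SECONDS`, the in-memory `_pending` store and the function names are all hypothetical placeholders; actually sending the email is assumed to happen elsewhere and is not shown. The randomness comes from the standard library's `secrets.token_urlsafe`.

```python
from __future__ import annotations

import secrets
import time

# Hypothetical values for illustration only.
VERIFY_BASE_URL = "https://example.com/verify?token="
TOKEN_TTL_SECONDS = 24 * 60 * 60  # links expire after one day

# In-memory store: token -> (address, issued_at). A real app would persist it.
_pending: dict[str, tuple[str, float]] = {}


def start_verification(address: str, now: float | None = None) -> str:
    """Create an unguessable token for `address`; return the link to email."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe base64
    _pending[token] = (address, time.time() if now is None else now)
    return VERIFY_BASE_URL + token


def confirm(token: str, now: float | None = None) -> str | None:
    """Return the address if the token is known and unexpired, else None."""
    entry = _pending.pop(token, None)  # single use: removed on first attempt
    if entry is None:
        return None
    address, issued_at = entry
    current = time.time() if now is None else now
    if current - issued_at > TOKEN_TTL_SECONDS:
        return None
    return address
```

A successful `confirm` only shows that someone who could read mail at that address clicked the link, which is the limit the comment describes.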

See: https://hackernoon.com/the-100-correct-way-to-validate-email...

An email address being syntactically valid is no guarantee that the mailbox even exists or is correctly mapped to the person at the keyboard!

If you want to "confidently accept deliverable email addresses", the only solution is to send a test message with a link that the recipient has to visit, proving that the email address exists. Otherwise, what about addresses that are both regex-valid and email-valid but still don't exist?

codetrotter@example is a valid email

I should have been more specific: an email address that is routable over the internet. Where's the TLD on that?

What section of what standard says a TLD can't have an email server on it? Is AAA not allowed to host an email server on `aaa`[1], and have the email `sales@aaa`?

[1]: And "aaa" is a valid TLD; see the full list: https://www.iana.org/domains/root/db. Perhaps a second-level domain is required at a minimum, but that's what I'm asking: is an MX record on a TLD invalid?

It’s actually a valid email and I’ve seen examples of such emails, but can’t remember the exact specifics.