Email validation regexes are so annoying. Everyone ought to just use .+@.+ as their validation regex and not be more strict than that.
Beyond that, just queue the email and try to deliver it. Tell the user that an email should arrive shortly, that if it doesn’t they should check their spam folder, and that they should check that they gave the correct email address. When you say this, repeat the email address the user gave you (escaped for XSS, of course).
I think some people “validate” against a strict pattern to keep their users from mistyping, but there are really so many ways to make a typo and still match those regexes that IMO a complicated regex is pointless, and 80% of the time those regexes end up rejecting actually valid (though unusual) email addresses.
I think for a lot of developers the reason is that they’ve learned they should validate data, so they decide to validate email, and to do so they either copy-paste some random-ass regex off the internet or write their own broken one.
All your regex should do is ensure that there is an @ in the address with something before it and something after it. This keeps people from mistakenly entering, say, their phone number because they didn’t read what the field was for.
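A minimal sketch of that check in Python, assuming the whole input string is tested (hence `re.fullmatch` rather than `re.search`). The function name `looks_like_email` is hypothetical, and trimming surrounding whitespace first is an added assumption:

```python
import re

# Something, then an @, then something. Nothing stricter than that.
# Note: "." does not match a newline, so input containing one is rejected.
_EMAIL_SHAPE = re.compile(r".+@.+")

def looks_like_email(value: str) -> bool:
    """Return True if value has at least one character on each side of an @."""
    # Assumption: trim pasted leading/trailing whitespace before checking.
    return _EMAIL_SHAPE.fullmatch(value.strip()) is not None

if __name__ == "__main__":
    print(looks_like_email("somebody@example.com"))  # True
    print(looks_like_email("+1 555 0100"))           # False: a phone number, no @
    print(looks_like_email("@example.com"))          # False: nothing before the @
```

Anything that passes goes straight to the mail queue; deliverability is left to the actual delivery attempt.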
Preventing people from making your machine send mail where it should not, such as to root@localhost on your server or elsewhere on your local network (I don’t know why anyone would, and it wouldn’t be a big issue anyway, just a tiny bit annoying), is a server configuration concern. Specifically, it is a matter of configuring your email server software and your firewalls.
User presses sign up -> Send them to the registration form, they fill in their details, which you validate lightly client-side, and they submit -> You validate lightly server-side and either send them back to the form or on to the next step -> You tell them “Thank you, your registration is now complete. An email should arrive in your inbox shortly. If it does not, please check your spam folder and also check that you entered your email address correctly. The email address you gave us was somebody@example.com.”
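The server-side half of that flow could look roughly like this. It is a sketch, not a real framework handler: `handle_signup`, the in-memory `outbox` queue and the returned `(status, html)` pair are hypothetical stand-ins for whatever your web framework and mail queue actually provide. The address is echoed back through `html.escape` so it can't inject markup.

```python
import html
import queue
import re

_EMAIL_SHAPE = re.compile(r".+@.+")

# Hypothetical stand-in for a real mail queue; a worker would drain it and send.
outbox: "queue.Queue[str]" = queue.Queue()

def handle_signup(email: str) -> tuple[str, str]:
    """Validate lightly, then either bounce back to the form or queue the mail."""
    email = email.strip()
    if _EMAIL_SHAPE.fullmatch(email) is None:
        # Back to the form: the only thing rejected is input with no usable @.
        return ("form", "<p>Please enter an email address.</p>")

    outbox.put(email)  # Queue delivery; success or failure is only known later.
    return (
        "done",
        "<p>Thank you, your registration is now complete. An email should arrive "
        "in your inbox shortly. If it does not, please check your spam folder and "
        "also check that you entered your email address correctly. The email "
        f"address you gave us was {html.escape(email)}.</p>",
    )
```

The handler never tries to decide whether the address is deliverable; it only echoes it back safely so the user can spot their own typo.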
Sorry, but .+@.+ isn't going to cut it if you want to confidently accept deliverable email addresses. These are regex-valid, but not email-valid:
codetrotter@example
code@trotter@example.com
code trotter@example.com
codetrotter@example..com
codetrotter@example.com.
.codetrotter@example.com
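A quick way to check that claim: every address in the list above satisfies `.+@.+` under Python's `re.fullmatch`.

```python
import re

# The counter-examples from the list above, verbatim.
samples = [
    "codetrotter@example",
    "code@trotter@example.com",
    "code trotter@example.com",
    "codetrotter@example..com",
    "codetrotter@example.com.",
    ".codetrotter@example.com",
]

for address in samples:
    matched = re.fullmatch(r".+@.+", address) is not None
    print(f"{address!r:32} matches .+@.+: {matched}")
```

Each line prints `True`: the pattern only asks for an @ with something on either side, so extra @ signs, spaces and stray dots all slip through.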
My company runs a website that has elderly people signing up for newsletters. The client is paranoid about not getting every last drop of possible data, and raises hell for every email address that isn't deliverable. There are LOTS of ways to easily ruin a simple email address.
There is another email address that the regexp lets pass and that’s still not valid:
codetrotter@example.com
It conforms to the expected format and could be a valid email address, but it’s actually not, because no such user exists.
An email might also exist, but not accept mail from you.
The given email address might exist, but could belong to another user.
There’s a million things that can go wrong and you’ll have a very hard time catching them.
The only way to find out whether an email address is valid and accepts mail is to actually send an email there.
As a help for the user, you can identify odd-looking email addresses and flag them in the UI (“this looks unusual, are you sure?”), but generally speaking, chances are high that any strict validation will reject real-world addresses while still not catching all errors.
Good points. Why do regex tests at all if emails could fail in any number of ways?
If you're going to test for at least 3 characters with a @ in the middle, you probably should implement some other simple rules to have a snowball's chance on the internet:
only one @
no spaces
has a TLD (at least one period after the @, something after each period, and no consecutive periods)
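One possible reading of those three rules as code, in Python. The interpretation of "has a TLD" (split the domain on periods and require at least two labels, none of them empty) is an assumption, and `passes_simple_rules` is a hypothetical name:

```python
def passes_simple_rules(address: str) -> bool:
    """At least 3 chars with an @ in the middle, plus the three extra rules."""
    # Rule: only one @, with something on each side of it.
    if address.count("@") != 1:
        return False
    local, domain = address.split("@")
    if not local or not domain:
        return False
    # Rule: no spaces (this sketch rejects tabs and other whitespace too).
    if any(ch.isspace() for ch in address):
        return False
    # Rule: has a TLD. At least one period after the @, text after it, and no
    # consecutive periods, i.e. two or more domain labels, none of them empty.
    labels = domain.split(".")
    return len(labels) >= 2 and all(labels)
```

Applied to the six counter-examples listed earlier, this rejects the first five but still accepts `.codetrotter@example.com`, because none of the three rules looks at the local part. By design of the TLD rule, it also rejects any address on a dotless domain.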
A (very simple) regex test might exclude some randomly entered garbage or an inadvertently invalid address. Even then, the scope and reliability of such tests is minuscule.
It appears you're missing the main point of the parent post.
You cannot validate an email address.
You can make a basic, excruciatingly simple test for proper form, and you should probably limit your checks to that.
For everything else, try to use the provided email address for verification within your onboarding loop, with a sufficiently unique verification URL or code. If that succeeds, the address is ... still not absolutely certainly valid, as it may have gone to a third party who proceeded to verify it. But at least it was delivered to somebody.
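A minimal sketch of such a verification code, using Python's standard `secrets` module for the unguessable token. The URL `https://example.com/verify`, the in-memory `pending` dict and the function names are hypothetical; a real system would persist the token, expire it, and send the link by email.

```python
import secrets
from typing import Optional
from urllib.parse import urlencode

# Hypothetical in-memory store: token -> address awaiting confirmation.
# A real system would persist this and expire old entries.
pending: dict[str, str] = {}

def start_verification(email: str) -> str:
    """Create an unguessable one-time token and return the link to mail out."""
    token = secrets.token_urlsafe(32)  # 32 random bytes as URL-safe base64 text
    pending[token] = email
    return "https://example.com/verify?" + urlencode({"token": token})

def confirm(token: str) -> Optional[str]:
    """Return the address the token was issued for, or None if unknown/used."""
    # pop() makes each token single-use.
    return pending.pop(token, None)
```

A successful `confirm` only proves that someone who can read that mailbox followed the link, which is exactly the caveat above.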
If you want to "confidently accept deliverable email addresses", the only solution is to send a test message with a link that the recipient has to visit to confirm that the email address exists. Otherwise, what about addresses that are both regex-valid and email-valid but still don't exist?
What section of what standard says a TLD can't have an email server on it? Is AAA not allowed to host an email server on `aaa`[1], and have the email `sales@aaa`?
[1]: And "aaa" is a valid TLD; see the full list: https://www.iana.org/domains/root/db ; now, perhaps at least a second-level domain is required, but that's what I'm asking: is an MX record on a TLD invalid?
I think what Mashimo means is that Infoteam.ch thinks that `+` is invalid when it isn't. Sadly they're far from the only ones whose email validation code won't accept the plus character, or many other legal characters in the local part of an email address.