That's probably because you are looking at the fully inlined expansion of the definition. If you name and reuse all the partial patterns, it becomes much clearer. Though regex is a cool obfuscation method.
It would then be at least 3/4 of a page long, I guess.
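To illustrate the point about naming and reusing partial patterns, here is a minimal sketch. The building blocks (`LABEL`, `DOMAIN`, `OCTET`, `IPV4`, `ATOM`, `LOCAL`) are hypothetical names and deliberately simplified; this is not the regex discussed above and does not cover the full grammar (no quoted local parts, no comments, no IPv6 literals).

```python
import re

# Simplified, illustrative sub-patterns; names and scope are assumptions.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"   # one host name label
DOMAIN = rf"{LABEL}(?:\.{LABEL})+"                          # at least two labels
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"    # 0..255
IPV4 = rf"{OCTET}(?:\.{OCTET}){{3}}"                        # dotted quad
ATOM = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"                   # unquoted local-part atom
LOCAL = rf"{ATOM}(?:\.{ATOM})*"                             # dot-separated atoms

# The final pattern reads almost like the grammar it encodes.
EMAIL = re.compile(rf"{LOCAL}@(?:{DOMAIN}|\[{IPV4}\])")


def is_well_formed(address: str) -> bool:
    """Return True if the whole string matches the simplified pattern."""
    return EMAIL.fullmatch(address) is not None
```

`DOMAIN` and `IPV4` are not email-specific, so they could be reused for other checks as well.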
The fun part is that this is not even the whole truth. Since the list of TLDs isn't static any more, it's additionally difficult to determine whether a host name is valid. That is only possible with some dynamic list (or a regex that would grow indefinitely and change constantly). The presented solution doesn't even take this into account.
The source page I've linked is quite an interesting read on that whole topic.
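A rough sketch of the "dynamic list" approach: keep the TLDs as data that can be refreshed, instead of baking them into the pattern. The function names and the sample snapshot are made up for illustration. The assumed input format is one TLD per line with `#` comment lines; where the list comes from and how often it is refreshed is left open here.

```python
def parse_tld_list(text: str) -> set[str]:
    """Parse one-TLD-per-line text; blank lines and '#' comment lines are skipped."""
    return {
        line.strip().lower()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }


def has_known_tld(host: str, tlds: set[str]) -> bool:
    """Check only the last label of the host name against the current list."""
    labels = host.rstrip(".").lower().split(".")
    return len(labels) >= 2 and labels[-1] in tlds


# Hypothetical snapshot; a real list is much longer and changes over time.
SAMPLE_SNAPSHOT = """# illustrative snapshot, not a real list
COM
ORG
NET
"""

tlds = parse_tld_list(SAMPLE_SNAPSHOT)
```

The answer depends on which snapshot was loaded, so the same host name can pass today and fail with an older or newer list.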
You would probably want to reuse the referenced definitions, like domain and IP, which are not email-specific. But yes, all of our JS could be much shorter if we used APL, but most of us like readability :P
I don't quite get why the TLD should be validated. Does it matter any more than whether a subdomain is registered or whether an IP is reachable? I think "valid" as in potentially deliverable and "valid" as in actually deliverable should be distinguished (like well-formed XML and schema-validated XML).
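One way to keep the two notions apart is to report them as separate levels rather than a single yes/no. This is only a sketch: `Validity`, `classify`, and the two injected callables are hypothetical names, and how deliverability would actually be probed is left open.

```python
from enum import Enum
from typing import Callable, Optional


class Validity(Enum):
    MALFORMED = 0      # fails the syntactic check
    WELL_FORMED = 1    # potentially deliverable (like well-formed XML)
    DELIVERABLE = 2    # a delivery probe succeeded at the time of the check


def classify(
    address: str,
    is_well_formed: Callable[[str], bool],
    can_deliver: Optional[Callable[[str], bool]] = None,
) -> Validity:
    """Run the syntactic check first; probe delivery only if a probe is supplied."""
    if not is_well_formed(address):
        return Validity.MALFORMED
    if can_deliver is not None and can_deliver(address):
        return Validity.DELIVERABLE
    return Validity.WELL_FORMED
```

An archived address would then stay `WELL_FORMED` as data even after its mailbox is gone.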
The TLD part matters as some part of the email format is defined through the format of a valid host name. "something.com" is a valid host name, but "something.something" isn't currently a valid host name. So an email address "something@something.something" isn't a valid email address (currently).
But at the end of the day this is all moot, imho. The "only" sane test to check the validity of an email address when someone shows you one is whether you can successfully deliver mail there.
Just because an address is formally valid doesn't mean it will be accepted by all systems on its way. Almost nobody follows the underspecified, confusing, and contradictory specs to the letter.
That was my point in the first place: trying to validate email addresses is a rabbit hole. It is anything but "simple", as was claimed above.
The point I was making is that whether or not you can successfully deliver email is not a sensible test of the validity of an email address, looking at the address purely as data. As I pointed out, my email archive contains many email addresses that are no longer ‘valid’ by your definition, but they are still valid as data.
By your definition email address validity changes literally on a moment to moment basis. Addresses are becoming invalid constantly and new ones are becoming valid constantly. It’s not a useful definition of validity, and not even something you can test meaningfully.
I already got your point earlier, and I think it's valid.
That's why I've formulated my "definition" carefully:
> the validity of an email address when someone shows you one
It's of course not a "definition" someone could write down in a spec. But it's by far the best "informal validity check" in practice. It checks whether an email address is currently valid. You practically can't do more anyway!
The "formal validity" of an email address changes with time nowadays, as I've pointed out: it depends directly on the formal validity of the host name part, which can change over time because the list of TLDs changes over time (which wasn't the case at the time those specs were written; fun fact: there is more than one spec, and they contradict each other).
To add to that, there are two more important aspects. Firstly, an email address you can't send mail to is mostly worthless in practice, as it can't be used for its primary purpose. Secondly, even perfectly "valid" addresses (by the spec) aren't accepted by a lot of parties that claim to handle email addresses! I guess a lot of systems would, for example, refuse an address looking like "-@-", wouldn't they? But it's perfectly valid!
My initial argument was that claiming that it's "easy" to validate email addresses is wrong in multiple dimensions. In fact, it's one of the more complicated questions out there (given the tragedy of the specs).