by ankhmoop 6197 days ago
Given that conforming to the standard is effectively free, do you have any other justification for your non-conforming position of "I'd say 64 is enough. Anything above is just weird"?

Willfully and capriciously ignoring standard requirements that you think are "weird" results in non-conforming implementations that confound users and other developers attempting to interoperate with your systems. I'm genuinely surprised to be writing a paragraph defending standards conformance -- I'd have thought that this position was basic common sense among software developers.

2 comments

I wrote a greylist server ( http://www.x-grey.com/ ) and I arbitrarily capped email addresses at 108 characters. In testing, I collected 565,012 tuples (IP address, sender address, recipient address)---the longest address in that testing corpus was 107 characters (15), the average length was 24.24 characters and the median was 23 (just checked now). Capping at 108 meant I could store two addresses, plus an IPv6 address, plus some timestamps in 256 bytes of memory (one feature of my greylist implementation---everything is stored in memory).
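To make the 256-byte budget concrete, here is a rough Python sketch of such a fixed-size record. It is not the actual x-grey layout: the field order, the three 8-byte timestamps, and the names `RECORD` and `pack_tuple` are all assumptions chosen so the arithmetic works out (108 + 108 + 16 + 24 = 256).

```python
import ipaddress
import struct

ADDR_LEN = 108  # cap on each stored address, from the comment above

# Hypothetical layout: sender, recipient, IPv6 address, three timestamps.
# "<" disables alignment padding so the size is exactly the sum of the fields.
RECORD = struct.Struct("<108s108s16s3q")


def pack_tuple(ip, sender, recipient, created=0, last_seen=0, expires=0):
    """Pack one (IP, sender, recipient) tuple into a fixed 256-byte record."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        # Store IPv4 peers as IPv4-mapped IPv6 so one 16-byte field suffices.
        addr = ipaddress.IPv6Address("::ffff:" + str(addr))
    return RECORD.pack(
        sender.encode("utf-8")[:ADDR_LEN],     # over-long addresses are truncated
        recipient.encode("utf-8")[:ADDR_LEN],  # short ones are NUL-padded by struct
        addr.packed,
        created,
        last_seen,
        expires,
    )
```

Whatever the real field layout is, the point stands: with two 108-byte address slots and a 16-byte IP, only 24 bytes are left over for everything else.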

Over the year and a half it's been running, I have seen a few addresses exceed the 108 character limit (which isn't fatal: I still store such addresses, just truncated to the first 108 characters) and by few I mean "less than 1% of 1%". Bumping the record size to store a full 254 bytes (or is it characters? There is a difference) would double the memory consumption of the program for very little gain in return (but at least I have numbers to back up my position).
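A back-of-the-envelope check of the "double the memory" claim, as a small Python sketch. The 24 bytes of timestamps is an assumption (the comment only says "some timestamps"), and `record_size` is a name made up for this illustration.

```python
ADDR_CAP = 108    # cap used in the comment above
ADDR_FULL = 254   # full-length figure from the comment above
IPV6_LEN = 16     # an IPv6 address is 128 bits
TIMESTAMPS = 24   # assumption: three 8-byte timestamps


def record_size(addr_len):
    """Bytes per record: two address slots, one IPv6 address, timestamps."""
    return 2 * addr_len + IPV6_LEN + TIMESTAMPS


capped = record_size(ADDR_CAP)   # 256 bytes
full = record_size(ADDR_FULL)    # 548 bytes
ratio = full / capped            # a bit more than 2x

# Bytes versus characters: they diverge as soon as non-ASCII text appears.
addr = "josé@example.com"
chars = len(addr)                   # counts code points
octets = len(addr.encode("utf-8"))  # counts bytes; "é" takes two in UTF-8
```

Under these assumptions the per-record cost slightly more than doubles, and a limit counted in characters could need even more bytes than one counted in octets.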

Conforming with the standard is not "effectively free".

The only way to _really_ validate an email address is to try to send mail to it. But that has non-zero cost (depending on how often you have to do it, what the odds are that you'll end up on a spam blacklist for no good reason, etc., etc.).

The alternative is to use purely server-side validation routines. But these become more and more expensive as you progress through less common edge cases (e.g., regular expressions are not capable of detecting every valid address). So most people, sooner or later, make a trade-off, favoring some more common subset of cases over some less common subset.
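As an illustration of that trade-off, here is a sketch of the kind of "common subset" regular expression many sites use, written in Python. The pattern and the name `looks_common` are just one example, not a recommendation; it accepts everyday addresses and rejects forms such as quoted local parts, embedded comments, and bang paths.

```python
import re

# A deliberately simple pattern: dot-separated local part, "@", dotted domain.
COMMON_ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def looks_common(address):
    """True if the address falls in the common subset this pattern covers."""
    return COMMON_ADDRESS.fullmatch(address) is not None


samples = [
    "user@example.com",           # everyday form: accepted
    '"john doe"@example.com',     # quoted local part: rejected by this pattern
    "user(comment)@example.com",  # embedded comment: rejected by this pattern
    "host!user",                  # bang path: rejected by this pattern
]
results = {s: looks_common(s) for s in samples}
```

Each rejected form could be supported, but only by making the pattern (or the code around it) more complicated, which is the cost being argued about.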

If anything, we should be arguing over what constitutes an acceptable place to make that trade-off. Should embedded comments be supported? What about bang paths?

If you are not going to email to it, why bother asking/storing it?
Maybe you don't need to send email right away, but want to store the address in case you need to get in touch with the user?

Sending one email per signup can be problematic depending on the volume of signups. Sending email only when absolutely necessary can help with that.

If you really care, use a real standards-conformant address parser. Most languages have at least one -- Java does. Otherwise you're just wasting your time, and the time of any users you hose with your amateur-hour validation.
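For what it's worth, here is roughly what that looks like in Python, using the standard library's `email.headerregistry.Address` rather than a hand-rolled pattern. The comment above names Java; Python is used here only as an example, and the wrapper `parse_addr_spec` is a hypothetical helper, not part of any library.

```python
from email.errors import HeaderParseError
from email.headerregistry import Address


def parse_addr_spec(text):
    """Return (local_part, domain) if the library's parser accepts text, else None."""
    try:
        parsed = Address(addr_spec=text)
    except (ValueError, HeaderParseError):
        # The parser reports unparseable or defective input via these exceptions.
        return None
    return parsed.username, parsed.domain


simple = parse_addr_spec("user@example.com")
quoted = parse_addr_spec('"john doe"@example.com')  # quoted local part is accepted
broken = parse_addr_spec("no-at-sign")              # no domain: rejected
```

This only checks syntax; whether anything actually receives mail at the address is a separate question.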