by ableal 1924 days ago
Please stop downvoting this. If not an unpleasant truth, it's at least a widely held perception, which must have a reason. (And I suspect the reason is that it's true ...)

> this is a valid email address apparently: #!$%&'*+-/=?^_`{}|~@example.com

If so, that's actually the same as #!$%&'*@example.com (mail user 'foo+bar' is the same as 'foo'). Many webforms/DBs don't know that.


> If so, that's actually the same as #!$%&'*@example.com (mail user 'foo+bar' is the same as 'foo'). Many webforms/DBs don't know that.

Actually, no. To the best of my knowledge (and I'd be delighted to be corrected!), that's merely a convention that lots of providers (including Gmail) follow, but it's not part of the RFCs or any other standard.

Don't get me wrong - it irritates me when that very-common behaviour isn't supported (and, at the very least, `+` shouldn't be considered an illegal character). But it's also technically-not-wrong to consider `a+1@test.com` as different from `a@test.com`.

It's explicitly called out in RFC 5233, at least: https://tools.ietf.org/html/rfc5233

TIL, thank you!

You are right. In fact, RFC 5321 specifically forbids you from interpreting the local part of an address in any way.

> the local-part MUST be interpreted and assigned semantics only by the host specified in the domain part of the address.
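
To make that concrete, here's a rough Python sketch of a comparison that leaves the local part alone. The function names are mine, not from any RFC or library. The only rules it assumes are that domain names compare case-insensitively and that the local part is opaque to everyone except the receiving host.

```python
from typing import Tuple


def split_address(address: str) -> Tuple[str, str]:
    """Split on the LAST '@', since a quoted local part may itself contain '@'."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a mailbox address: {address!r}")
    return local, domain


def same_mailbox(a: str, b: str) -> bool:
    """Conservative equality check (hypothetical helper, not a standard API).

    Domains are compared case-insensitively. Local parts are compared
    byte-for-byte: no '+' stripping and no case folding, because only the
    destination host gets to decide what the local part means.
    """
    local_a, domain_a = split_address(a)
    local_b, domain_b = split_address(b)
    return local_a == local_b and domain_a.lower() == domain_b.lower()
```

With this, `same_mailbox("a+1@test.com", "a@test.com")` is false, while `same_mailbox("a@TEST.com", "a@test.com")` is true.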

See your sibling comment for another perspective! (EDIT: which, to be clear, doesn't invalidate your point. Though it's worth considering, I guess, whether "only assigned semantics by the host specified in the domain" prevents user-tracking systems from calling "foo+bar@gmail.com" the same user as "foo@gmail.com". After all - if they're being interpreted "as" user IDs, rather than as emails, does that really breach the RFC?)

It's not really a different perspective. Sieve, which the sibling comment's RFC extends, is a mail-filtering script language for end-user inboxes. So it's perfectly reasonable for a user on foo.com, who knows that foo.com supports the `+` syntax, to write a Sieve script directing mail to "username+blah@foo.com" to a particular inbox.

In fact, that RFC specifically calls out that interpreting the `+` on non-local addresses is likely wrong:

> NOTE: Because the encoding of detailed addresses are site and/or implementation specific, using the subaddress extension on foreign addresses (such as the envelope "from" address or originator header fields) may lead to inconsistent or incorrect results.

EDIT to address your second point:

> After all - if they're being interpreted "as" user IDs, rather than as emails, does that really breach the RFC?

Well, technically no: the RFC is about SMTP, so if you're not writing an SMTP implementation, you're not breaching it.

But RFCs aren't the law, so whether you're technically breaching one isn't really what's relevant. What _is_ relevant is that a system that treats foo+bar@quux.com the same as foo@quux.com is making assumptions about how email works that contradict the RFCs that define how email works. Whether that's a useful thing to do in practice is an engineering decision with tradeoffs. E.g., it's probably fine to assume it for a whitelisted set of domains where you know it to be true, like Gmail.
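
As a rough sketch of that tradeoff in Python: the names `PLUS_ALIAS_DOMAINS` and `canonical_user_key` and the contents of the set are made up for illustration, and it deliberately ignores any other provider-specific rules.

```python
# Hypothetical allowlist: domains where we ASSUME "user+tag" is delivered to
# "user". This is an assumption you would have to verify per provider.
PLUS_ALIAS_DOMAINS = frozenset({"gmail.com"})


def canonical_user_key(address: str) -> str:
    """Return a key for de-duplicating accounts, not an address to send to.

    The '+tag' suffix is dropped only for allowlisted domains. Everywhere
    else the local part is left untouched, since only the receiving host
    knows what it means.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a mailbox address: {address!r}")
    domain = domain.lower()  # domain names are case-insensitive
    if domain in PLUS_ALIAS_DOMAINS:
        base = local.partition("+")[0]
        if base:  # keep odd inputs like "+tag@gmail.com" as they are
            local = base
    return f"{local}@{domain}"
```

So `foo+bar@gmail.com` collapses to `foo@gmail.com`, but `foo+bar@quux.com` stays distinct from `foo@quux.com`. You'd still want to send mail to the address exactly as the user typed it and use the key only for matching.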