Hacker News
by cmehdy 1924 days ago
This stuff is not really well made for normal people, to be honest. Just look at all the discussions and troubles (tickets, misunderstandings, security risks) related to email and hyperlink parsers.

It took me a while to learn that FQDNs can (and sometimes must?) end with a period that stands for the root, meaning every address you've ever typed could have finished with a period (news.ycombinator.com.), and I recall some newspaper (NYT? New Yorker?) failing to test for that when people wanted to bypass their paywall. And this is a valid email address, apparently: #!$%&'*+-/=?^_`{}|~@example.com
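To make the "valid email address" claim concrete, here is a minimal sketch that checks a local part against the unquoted dot-atom form, using the `atext` character set from RFC 5322. The function name `is_dot_atom_local_part` is made up for illustration, and the check deliberately ignores quoted local parts, length limits, and the domain, so it is not a full address validator.

```python
import re

# atext per RFC 5322 section 3.2.3: letters, digits, and these symbols.
# The hyphen is placed last so it is literal inside the character class.
_ATEXT = r"A-Za-z0-9!#$%&'*+/=?^_`{|}~-"

# dot-atom: one or more runs of atext separated by single dots.
_DOT_ATOM = re.compile(rf"[{_ATEXT}]+(?:\.[{_ATEXT}]+)*")


def is_dot_atom_local_part(local_part: str) -> bool:
    """Return True if local_part is a syntactically valid unquoted local part."""
    return _DOT_ATOM.fullmatch(local_part) is not None
```

Under these assumptions the odd-looking local part from the comment passes, while something like `a..b` or a local part containing a space does not.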

RFCs/codified norms by tech people are just weird to normal people.

3 comments

Please stop downvoting this. If not an unpleasant truth, it's at least a widely held perception, which must have a reason. (And I suspect the reason is that it's true ...)

> this is a valid email address apparently: #!$%&'*+-/=?^_`{}|~@example.com

If so, that's actually the same as #!$%&'*@example.com (mail user 'foo+bar' is the same as 'foo'). Many webforms/DBs don't know that.

> If so, that's actually the same as #!$%&'*@example.com (mail user 'foo+bar' is the same as 'foo'). Many webforms/DBs don't know that.

Actually, no. To the best of my knowledge (and I'd be delighted to be corrected!), that's merely a convention that lots of providers (including Gmail) follow, but it's not part of the RFC or standards.

Don't get me wrong - it irritates me when that very-common behaviour isn't supported (and, at the very least, `+` shouldn't be considered an illegal character). But it's also technically-not-wrong to consider `a+1@test.com` as different from `a@test.com`.

It's explicitly called out in RFC 5233, at least: https://tools.ietf.org/html/rfc5233

TIL, thank you!

You are right. In fact, RFC 5321 specifically forbids anyone other than the destination host from interpreting the local part of an address.

> the local-part MUST be interpreted and assigned semantics only by the host specified in the domain part of the address.

See your sibling comment for another perspective! (EDIT: which, to be clear, doesn't invalidate your point. Though it's worth considering, I guess, whether "only assigned semantics by the host specified in the domain" prevents user-tracking systems from calling "foo+bar@gmail.com" the same user as "foo@gmail.com". After all - if they're being interpreted "as" user IDs, rather than as emails, does that really breach the RFC?)

It's not really a different perspective. Sieve, which the sibling comment's RFC extends, is a mail-filtering script language for end-user inboxes. So it's perfectly reasonable for a user on foo.com, who knows that foo.com supports the `+` syntax, to write a Sieve script that files mail addressed to "username+blah@foo.com" into a particular folder.

In fact, that RFC specifically calls out that interpreting the `+` on non-local addresses is likely wrong:

> NOTE: Because the encoding of detailed addresses are site and/or implementation specific, using the subaddress extension on foreign addresses (such as the envelope "from" address or originator header fields) may lead to inconsistent or incorrect results.

EDIT to address your second point:

> After all - if they're being interpreted "as" user IDs, rather than as emails, does that really breach the RFC?

Well, technically no, the RFC is about SMTP so if you're not writing an SMTP implementation, you're not breaching it.

But RFCs aren't the law, so whether you're technically breaching it isn't really what's relevant. What _is_ relevant is that a system that treats foo+bar@quux.com the same as foo@quux.com is making assumptions about how email works that contradict the RFCs that define how email works. Whether that's a useful thing to do in practice is an engineering decision with tradeoffs. E.g., it's probably fine to assume it for a whitelisted set of domains where you know it to be true, like Gmail.
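As a rough sketch of that tradeoff: collapse `+` suffixes only for domains you have verified yourself, and leave every other address untouched. The names `PLUS_ALIAS_DOMAINS` and `dedup_key` are hypothetical, and the single-entry domain set is just the example from this comment, not a vetted list. The result is meant as a key for spotting duplicate accounts, never as the address to send mail to.

```python
# Hypothetical allowlist: domains where the operator has confirmed that a
# "+detail" suffix is ignored on delivery. gmail.com is the thread's example.
PLUS_ALIAS_DOMAINS = frozenset({"gmail.com"})


def dedup_key(address: str) -> str:
    """Return a key for detecting duplicate sign-ups (not a delivery address)."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an address: {address!r}")

    # Domain names are case-insensitive; the local part is left as typed,
    # because only the destination host may decide what it means.
    domain = domain.lower()

    if domain in PLUS_ALIAS_DOMAINS:
        base = local.split("+", 1)[0]
        if base:  # keep the local part as-is if it starts with "+"
            local = base
    return f"{local}@{domain}"
```

With this, `foo+bar@gmail.com` and `foo@gmail.com` map to the same key, while `foo+bar@quux.com` stays distinct from `foo@quux.com` because nothing is known about how quux.com treats `+`.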

This root period came up on Reddit a while ago because the domain "youtube.com." would fail to serve ads.

https://www.reddit.com/r/webdev/comments/gzr3cq/fyi_you_can_...

> I recall some newspaper (NYT? New Yorker?) failing to test for that when people want to bypass their paywall.

For a long time I could read Bloomberg for free because their paywall failed open when you did this.
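Nobody in the thread says how those sites actually broke, but one plausible cause is an exact string comparison on the hostname. A minimal sketch of the defensive fix, assuming the port has already been stripped from the host value; `normalize_host`, `is_paywalled`, and the `PAYWALLED_HOSTS` set are hypothetical names for illustration:

```python
# Hypothetical set of hostnames that should get the paywall.
PAYWALLED_HOSTS = frozenset({"news.example.com"})


def normalize_host(host: str) -> str:
    """Lower-case a hostname and drop a single trailing root dot, if any."""
    host = host.lower()
    # "example.com." and "example.com" name the same host; a bare "." is
    # the root itself and is left alone.
    if len(host) > 1 and host.endswith("."):
        host = host[:-1]
    return host


def is_paywalled(host: str) -> bool:
    """Match on the normalized name so the trailing-dot form cannot slip by."""
    return normalize_host(host) in PAYWALLED_HOSTS
```

A check written as `host in PAYWALLED_HOSTS` without the normalization step would return False for `news.example.com.`, which is the kind of fail-open behaviour described above.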