Hacker News new | ask | show | jobs
by jwecker 3574 days ago
An email address isn't a document though, it's a routing command. I don't mean sanitization in the sense of inserting backslashes. I mean sanitization in the sense of "we don't allow people to set their email address to a mailbox on localhost at our mail server."
2 comments

1. Sanitization generally means changing information. As in, "removing bad characters", that kind of stuff. That's different from validation, which should result in rejection of bad input, and which can be perfectly fine. However, more often than not, validation is implemented badly and rejects perfectly fine input, which is why validation shouldn't be employed more than necessary either.

2. Rejecting @localhost addresses doesn't really make a whole lot of sense. People could just enter the public IP address or hostname of the server, or add a DNS A record under their own domain that points to 127.0.0.1, or an MX record that points to localhost, or any number of other weird stuff that you could not possibly validate anyway (if only because it could be changed at any point lateron). Just configure your mail server properly and then send the damn email, and if it does get sent to root@localhost, and possibly forwarded to the admin--so what? People obviously could just sign up using your admin's email address anyway, and that not only at your site, but at millions of sites out there, you won't be able to stop them. There is nothing particularly dangerous about receiving unsolicited signup emails or about sending emails to yourself.

> There is nothing particularly dangerous about receiving unsolicited signup emails or about sending emails to yourself.

Depends on what you do with them, in the latter case. There could be an amplification attack there.

Validating domain parts to a certain extent isn't a bad idea, at least as far as non-routable domain names and RFC1918 ranges go. I've seen this done (actually implemented some of it, in fact) at a past employer, who were basically looking to cover the 90% case in terms of not getting hosed by a trivial attack. It doesn't take much effort and it makes 4chan's life harder. What's not to like?

> Depends on what you do with them, in the latter case. There could be an amplification attack there.

Hu? How would that work?

> Validating domain parts to a certain extent isn't a bad idea, at least as far as non-routable domain names and RFC1918 ranges go.

What do you mean by "non-routable domain names" and what do you gain by checking for RFC1918 ranges?

> I've seen this done (actually implemented some of it, in fact) at a past employer, who were basically looking to cover the 90% case in terms of not getting hosed by a trivial attack.

Why did you prefer that approach over a robust solution?

My idea of a robust solution: Have one central outbound relay that's firewalled off from connecting to anywhere but the outside world, make all servers that need to send email use that relay as a smarthost (so they never connect to anything but that relay, regardless what the destination address is), use TLS and SMTP AUTH with credentials per client server to prevent abuse of the relay by third parties.

> What's not to like?

(a) that it's a lot easier to build a solution that's more robust, (b) it's extremely likely that your implementation is buggy, thus rejecting valid addresses, and (c) it's causing a maintenance burden (what happens when the first people drop IPv4 for their MXes? I'd pretty much bet that you don't check for AAAA records, so you'd probably suddenly start rejecting perfectly fine email addresses, thus making the transition to IPv6 unnecessarily harder, am I right?).

'Non-routable' as in a single label, or as in not resolvable. I don't think it is unreasonable to consider an address invalid when its domain part cannot be resolved. Checking for RFC1918 ranges means you don't try to send to another class of addresses that's never going to be received.

You would lose the bet. The product supported IPv6 from day one.

That is a robust, if somewhat complex, solution for a relatively small volume of mail. When you're sending ten million messages a day by the end of the first month, pushing everything into a single relay of any kind is asking for a lot of trouble.

> 'Non-routable' as in a single label, or as in not resolvable. I don't think it is unreasonable to consider an address invalid when its domain part cannot be resolved.

What exactly do you mean by "cannot be resolved"?

> Checking for RFC1918 ranges means you don't try to send to another class of addresses that's never going to be received.

But why check for it? Is that actually a common mistake people make? An attacker could change the address after you checked it, so it's not going to help against attackers, is it?

> You would lose the bet. The product supported IPv6 from day one.

Good for you! :-)

> That is a robust, if somewhat complex, solution for a relatively small volume of mail. When you're sending ten million messages a day by the end of the first month, pushing everything into a single relay of any kind is asking for a lot of trouble.

Complex? Certainly less so than implementing validation yourself.

As for scalability: Well, yeah, as described that's more the setup for a company that's operating various different services, none of which has a high volume of outbound email (which would be most, even the best startups don't have ten million signups per day and don't send much email otherwise, and even that should actually still be managable with a single server).

But that's trivial to adapt without changing the general approach. First of all, obviously, you could just add more relay servers and have client servers select one randomly, that scales linearly. But if you really need to move massive amounts of email for one service, so that adding an additional relay hop for each email you send actually adds up to noticable costs, you can still use the same approach: Just put the MTA onto the same machine(s) that the service is running on, into its own network namespace (assuming Linux, analogous technology exists on other platforms), and firewall it off there so it cannot connect to your internal network. Potentially you can even just add blackhole routes for your internal networks/RFC1918 ranges, so you would not even need a stateful packet filter (though currently you might still need it due to IPv4 address shortage).

"Cannot be resolved" means NXDOMAIN.

Why assume email addresses only get checked in one place, and not all?

Ten million a day was a milestone. I left that company over a year ago; it would astonish me to find that figure now exceeded by less than a factor of twenty. Granted these are mostly not signups. They are outgoing emails nonetheless, which makes the case germane despite that superficial distinction.

Your proposed solution sounds pretty expensive in ops resource, to no obviously greater benefit than the rather simple (well under one dev-day) option we chose. You seem to feel yours is strongly preferable, but I still don't understand why.

Then reject email addressed to localhost. It shouldn't matter how the email got there. I'd suggest that especially given DNS trickery involving setting up a low TTL then redirecting to 127.0.0.1, you're probably not preventing this from happening or you'd have to invalidate any unrecognised domain. Better to solve that problem at a different layer -- validate the email by sending a validation link if you must...
True, but that's my point- it's a backend issue, not a front-end "help the user" issue.
And their point is, it's a backend issue, the backend being the mail client/server that already completely handling the sending/receiving of emails. Either the activation link gets clicked or it doesn't. The click is the only correct validation, and yes, the whole process happens on the "backend".