Hacker News new | ask | show | jobs
by zAy0LfpBZLC8mAC 3573 days ago
> "Cannot be resolved" means NXDOMAIN.

OK, that at least shouldn't reject any valid addresses, so maybe ...

> Why assume email addresses only get checked in one place, and not all?

It's not an assumption, it's just a matter of simplicity and reliability.

> Ten million a day was a milestone. I left that company over a year ago; it would astonish me to find that figure now exceeded by less than a factor of twenty. Granted these are mostly not signups. They are outgoing emails nonetheless, which makes the case germane despite that superficial distinction.

Well, no clue how well common MTAs would cope with that, but 2500 mails per second could still be within the power of a single machine if it's designed for high performance. But regardless, I don't think that really matters: If you have a relatively low volume of emails, it's probably most efficient and secure to handle it all with one central outbound relay, if you need to send lots of emails, it obviously makes sense to distribute the load, but that shouldn't really otherwise change the strategy.

> Your proposed solution sounds pretty expensive in ops resource, to no obviously greater benefit than the rather simple (well under one dev-day) option we chose. You seem to feel yours is strongly preferable, but I still don't understand why.

What sounds expensive about it?

Whether there is any benefit to it: Well, depends on your goals and what exactly your solution actually does. I still don't understand why (or even how exactly) you do those RFC1918 checks, for example?! It seems like it's mostly a security measure? But then, it's not actually secure, it's essentially a race condition/TOCTTOU. Plus, it might even break valid email addresses.

Essentially, it's four factors why I think just delegating the validation of email addresses to the mail server is the best strategy:

1. Implementing your own checks risks introducing additional mistakes (which might lead to the rejection of valid addresses).

2. Implementing your own checks is additional work when you could just use the MTA which already knows how to do this (and which you have to install/configure/use anyway as soon as you want to actually use the email address), both for the initial implementation, and possibly for subsequent maintenance (if you just let your MTA do the work, only the MTA needs to be adapted to any changes in how emails get delivered, abstracting away the problem for any software that's supposed to be sending emails and isolating it from the lower layers).

3. My approach actually gives you the perfect result, in that it does not reject any valid addresses (assuming your MTA implements the RFCs correctly), and at the same time is perfectly secure against all possible abuses with weird addresses, which is impossible to achieve when you separate the check from the actual abuse scenario, and it's even kindof trivial to see that that is the case.

4. You have to have the infrastucture to deal with bounces anyhow, both because you ultimately cannot be sure an address actually exists unless you have successfully delivered an email to it, and because addresses that once existed might not exist anymore at a later time, so it's not like you can avoid that if only you validate your addresses better.

As for "ops resources": Assuming that any service that needs to send > 10 million mails per day will be deploying machines automatically anyway, what's so much more expensive about deploying the configuration of an additional network namespace? Writing that script certainly shouldn't be more than a day of work either, should it?

1 comments

I may have erred in giving the impression that the application-level checks are the only line of defense here. They're not. The (bespoke) MTA underlying this product performs most if not all of these checks as well. I didn't really spend any time on that side of the business, so I might be wrong about that, but it would be something of a surprise. I do know our analytics needed to be able to cope usefully with an astonishing panoply of bogosity warnings that came back from the MTA, but I no longer recall exactly what they covered. And, in any case, it's nice when you can to tell the user "hey, this isn't deliverable" before it gets to the point of a bounce.

Checking whether an email address's domain-part is an RFC1918 IP is actually pretty easy. Split the address by '@'. The last piece is the domain part. Split it by '.'. If there are four pieces, all of which meaningfully cast to integers, treat it as an IP address. (Otherwise it's a domain name, which is fine as long as it has more than one part and isn't NXDOMAIN when the backend tries to resolve it.) If any part is negative or greater than 255, it's invalid. If it starts with [10] or [192, 168], or if it starts with [172] and the second part is between 16 and 31 inclusive, it's an RFC1918 address. Otherwise, it's fine.

Even with unit tests, that takes almost no time to write, and when your frontend and backend share a language as ours did, you can use the same logic both places. Node gives you a name resolver binding for free. I really can't imagine it being as quick to write, test, validate, and roll out a change to the MTA node manifest.

> Checking whether an email address's domain-part is an RFC1918 IP is actually pretty easy.

What I still don't understand: Why do that at all? What's the point of this? Security? Preventing user errors? What else?

Little of both.

We were pretty sure it would already be impossible, or nearly so, for a malicious user to probe our infrastructure this way, but when it's so simple to be even more sure, why not?

Similarly, we'd already observed a low but nonzero rate of users inadvertently providing such addresses - not during signup or onboarding so much, but in recipient lists they submitted. Since we used the same recipient checking code everywhere, why not cut that back to zero, too?