by derefr 3402 days ago
* Gmail's search and spam-filtering are both very good, trained and tuned on datasets no self-hosted product could ever match (and harnessing parallel algorithms across large clusters that'd be quite costly on one machine)

* Google doesn't lose my email; I probably will lose my email, because an email server is backed by a database and doing database backups right is hard if that is not your day-job.

* You can get a good email-receiving experience, but email-sending is very difficult these days if you're a nobody, because a lot of first-stage network-level spam filtering has come down to reputation, and your server IP won't have any (or, if it's a cloud provider IP, will have very likely been used at least once to send spam in the past.) And residential ranges get dinged, too, from the heuristic (stereotype) that the most likely reason to get an SMTP connection from a residential IP is that it's a member of a botnet.

5 comments

> Gmail's search and spam-filtering are both very good, trained and tuned on datasets no self-hosted product could ever match (and harnessing parallel algorithms across large clusters that'd be quite costly on one machine)

As someone who self-hosts, this is clearly not true. With Gmail I was receiving a lot of spam from various email marketing companies like MailChimp, easymail, etc. There are a lot of these companies and they are mostly country-specific, some less shady, some more.

With self-hosting it is easy to block their servers en masse and forget about them. Some companies spam the DNS namespace with predictable but extremely numerous domain names, which are easy to block using a few regular expressions. Try making filters in Gmail for that when you don't know which of the 100 domains the next email will come from.
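
A minimal sketch of the regular-expression idea, in Python. The patterns and the `.example` domains are made up for illustration; real patterns would come from whatever shows up in your own logs, and in practice the check would live in your MTA's configuration rather than in a script.

```python
import re

# Hypothetical patterns for "predictable, but extremely numerous" domains.
# The .example names are placeholders, not real senders.
BLOCK_PATTERNS = [
    re.compile(r"^mail\d+\.bulk-sender\.example$"),
    re.compile(r"^(?:[a-z0-9-]+\.)*promo-[a-z0-9]+\.example$"),
]

def is_blocked(sender_domain: str) -> bool:
    """Return True if the sender domain matches any block pattern."""
    # Normalize: drop whitespace and a trailing root dot, compare in lowercase.
    domain = sender_domain.strip().rstrip(".").lower()
    return any(p.search(domain) for p in BLOCK_PATTERNS)
```

Two patterns here cover every numbered `mailNN` host and every `promo-*` domain, which is the point: you don't need to know which domain the next message will use.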

Email from hacked servers is also easy to block. It's mostly PHP servers, and all you need to look for is a mention of eval() in the headers, as nobody sane (hopefully) evals PHP code to send email.
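
A rough sketch of that header check in Python, using only the standard library. The function name is hypothetical, and it only inspects headers, not the body, so a legitimate message that merely discusses eval() is not affected.

```python
from email import message_from_string

def mentions_eval(raw_message: str) -> bool:
    """Return True if any header value contains the literal text 'eval()'."""
    msg = message_from_string(raw_message)
    # msg.items() yields (header name, header value) pairs; the body is ignored.
    return any("eval()" in str(value) for _name, value in msg.items())
```

A mail filter could call this on each incoming message and reject or quarantine on a True result.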

It just took me a month of spending a few minutes every other day analyzing the headers of the odd email or two that got through generic checks (like checking whether the sending IP address has a domain name) and figuring out how to block the sender entirely if possible.

Now I don't get any legitimate-looking spam at all, and what I do get is easily filtered with the Bayes filter in Thunderbird.

Anyway, with spam the hard job is checking the spam folder, and that's annoying as hell with Gmail, because it's always full of crap and it's not easy to spot the occasional false positive. Now I only get 1 spam every two to three days and that's easy to check. Legitimate people who get blocked get a bounce message immediately and have a chance to re-send according to the instructions in the bounce, instead of falling into a spam folder and feeling ignored.

Much better experience overall.

Actually, what is hardest to filter is bounces from Gmail servers. I'm not really sure how spammers generate them. They are not in response to anything that I send. It seems like Google ignores my SPF records: even though it indicates that it found the sender forged the From header, it still sends me the bounce with the attached spam, which is targeted at me anyway. Quite annoying.

EDIT: I guess I can just reject the gmail bounce if it contains the "Received-SPF: fail (google.com:". Ah!
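
Something like this, as a Python sketch. The function name and the `envelope_from` parameter are assumptions for illustration; the only things taken as given are the marker string quoted above and the fact that bounces arrive with an empty envelope sender (`<>`).

```python
# Marker text copied from the bounce, as quoted in the comment above.
MARKER = "Received-SPF: fail (google.com:"

def should_reject_bounce(envelope_from: str, raw_message: str) -> bool:
    """Reject only messages that are bounces AND carry the SPF-fail marker."""
    # Bounces (delivery status notifications) use the null sender.
    is_bounce = envelope_from.strip() in ("", "<>")
    return is_bounce and MARKER in raw_message
```

Requiring the null sender keeps ordinary mail that happens to quote the marker from being rejected.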

I agree. I have been self-hosting my personal email since 1998, and there was a period when this was difficult due to technical issues related to encryption. But for the past decade or so those issues are gone. The benefits are great. For example, being able to block entire netblocks at the routing or firewall level is an amazing anti-spam tool that is completely free when you self-host.
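
To illustrate the netblock idea only: the comment above is about doing this at the routing or firewall level, and this Python sketch is just the same membership test done in software. The ranges are reserved documentation prefixes standing in for whatever you would actually block.

```python
import ipaddress

# Placeholder ranges (reserved documentation prefixes), not real spam sources.
BLOCKED_NETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

def is_blocked_ip(addr: str) -> bool:
    """Return True if the connecting address falls inside a blocked netblock."""
    ip = ipaddress.ip_address(addr)
    # An IPv4 address is never "in" an IPv6 network (and vice versa).
    return any(ip in net for net in BLOCKED_NETS)
```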

How do you deal with 'important mass notices' from your utility company or bank? Do they only use their own email servers?

> Google doesn't lose my email; I probably will lose my email, because an email server is backed by a database and doing database backups right is hard if that is not your day-job.

I use IMAP email. My email is simultaneously stored on my server and on every client. If the server is nuked, I can set up a new IMAP server elsewhere and sync my email client to it; I'd want to do this from work where I have gigabit internet, or this would take a while, but it can re-upload all the data to the server.

That said, I'm using a managed account. I'm not communicating about anything that I care if the government subpoenas, and I have no plans to.

Unless we end up in a totalitarian state where constructive criticism of the government becomes an offense. But in that case my public posts would be more than enough to convict me without looking at my emails.

> ...because an email server is backed by a database and doing database backups right is hard if that is not your day-job.

I store my email on Dovecot with Maildir storage. For a single account or just a few, that is perfectly fine, and you can back up the emails with your favorite backup tool.
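
Since Maildir keeps each message as its own file, a backup can be as plain as archiving the directory. A minimal Python sketch, assuming a Maildir at some path of your choosing (the function name and paths are hypothetical, and any real backup tool would do the same job):

```python
import tarfile
from pathlib import Path

def backup_maildir(maildir: Path, dest: Path) -> Path:
    """Write a gzip-compressed tar archive of a Maildir tree; return its path."""
    with tarfile.open(dest, "w:gz") as tar:
        # Store entries relative to the Maildir's own name (cur/, new/, tmp/).
        tar.add(str(maildir), arcname=maildir.name)
    return dest
```

Usage would look like `backup_maildir(Path("/home/me/Maildir"), Path("/backups/mail.tar.gz"))`, with both paths being placeholders.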

I spent 2 years trying to get them to understand that alerts from my credit card company were not spam before finally giving up and moving off Gmail. I am very happy to be done with their spam-filtering.

I use filters in such cases – they have worked fine for me so far.

> Gmail's search and spam-filtering are both very good

Google/Microsoft's spam filtering makes it impossible to send e-mail from a self-hosted solution.

http://penguindreams.org/blog/how-google-and-microsoft-made-...

Unless you're sending out thousands of e-mails per day and build your reputation with their magic-goo trust filter algorithm, you cannot run your own e-mail server and run with the big players. They have made self-hosted e-mail totally unreliable.

I think what you meant by "very good" is "piss fucking terrible."

Not for my use-case. There's basically nobody self-hosting email that I want to receive emails from. It turns out the egalitarian "Everyone is an Internet admin" solution favored the spammers heavily over the technocrats or common users; letting Google build a system that defaults to trust-off for self-hosting proved to be valuable for a lot of people.

(Because if a tech-savvy user really wants to email me, they know how to make a throwaway email account and sign the correspondence with a verifiable PGP key).

I haven't had any particular issues getting past spam filters. It certainly takes some time to build IP reputation, but in general, with nothing more than SPF and RDNS properly configured, my mails get through. I really should get DKIM/DMARC working eventually, but my current email solution (GroupWise) doesn't support it natively, so I'll have to do some nonsense for that.

I had this problem self-hosting but was able to remediate it by making sure my server was doing all the smart modern things like DMARC, etc. There are some good resources on HN from others who've set up all the right things.

Of course, this all happened after I got bitten during a job search and had most of my applications hit spam folders ಠ_ಠ

If you read the post I linked, I have the correct DMARC, SPF and DKIM records and signatures in place. If I send a message to my old university (Google) account, I see all of that verified as correct. It doesn't really help.
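
For anyone wanting to repeat that check: receiving servers usually record their verdicts in an Authentication-Results header, which you can read from the raw message. A small Python sketch, assuming you have saved the raw source of a test message; the function name and the sample values are hypothetical.

```python
import re
from email import message_from_string

# Matches fragments such as "spf=pass", "dkim=fail", "dmarc=none".
_RESULT = re.compile(r"\b(spf|dkim|dmarc)=([a-z]+)", re.IGNORECASE)

def auth_results(raw_message: str) -> dict:
    """Collect the first spf/dkim/dmarc verdict from Authentication-Results."""
    msg = message_from_string(raw_message)
    results = {}
    for value in msg.get_all("Authentication-Results", []):
        for method, verdict in _RESULT.findall(value):
            # Keep the first verdict seen for each method.
            results.setdefault(method.lower(), verdict.lower())
    return results
```

As the comment above says, all three coming back as "pass" shows the records are set up correctly, but it does not by itself guarantee inbox placement.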

I suspect part of it might be that it's on a Linode and might be sharing a subnet with other spammy machines. That's probably why MailChimp owns a class C and refuses to sell any of it.

Interesting. I am on Linode as well, that sucks.

Do you host an HTTPS site on the same domain? Is your mail server responding on IPv6? (I hear this can be a problem)

Can you recommend a good resource for "how to set up your mail server like it's 2017" for those of us who would like to self-host but don't want to spend 6 months figuring it all out?

From recent memory, this post covers all the stuff I did with my server: https://news.ycombinator.com/item?id=11946756

There are a lot of testing tools you can run mail through as well to see how well you score.

You may find this interesting:

https://mailinabox.email/