by ivanca 3578 days ago
You were claiming "they could just as well package into, say, an install CD image" which is likely not going to happen, so he was just giving an example.

The perfect spam filter Gmail has is probably based on data and ML analysis, not simple software you can put on a CD and install on some little server; so even if someone who works there wanted to help you, they probably couldn't do it.

3 comments

Gmail's "perfect spam filter" is so bad that it is the reason my institution moved away from Google services. It's also the main reason so many people consider it 'impossible' to host your own email.

It's anything but perfect. It's a black box, with no user-serviceable parts inside. Completely useless for any organization with needs that deviate from Joe User.

How would you implement a spam filter?

You can sort of turn off spam filtering on incoming emails, but I don't think most people or organizations would want that.

> Completely useless for any organization with needs that deviate from Joe User.

Maybe but being slightly over aggressive means a world in terms of user happiness. Think about email before Gmail. How much junk did you see in your inbox? There is a lot LESS today. Joe User would be very happy if they tried to think about it for a second.

> How would you implement a spam filter?

What do you mean "implement"? You are aware that there exists free spam filtering software that you can install, like, say, SpamAssassin (which has been around longer than Gmail)?

> Maybe but being slightly over aggressive means a world in terms of user happiness.

Dropping legitimate emails is good for user happiness? You have strange users. I imagine if the post office decided to throw away some letters "slightly over aggressive[ly]" ... there are people who would be happy if that happened?

> Think about email before Gmail. How much junk did you see in your inbox? There is a lot LESS today.

I see maybe one to three spam mails per day, with nearly zero false positives (maybe one per year). And that's without Gmail. Obviously, I could reduce that to zero spam at the cost of increased false positives, but that is not a sensible option as far as I am concerned: deleting three mails per day isn't really all that much work.
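To make that tradeoff concrete, here is a rough sketch with made-up numbers. The scores, labels, and both thresholds are hypothetical, not taken from any real filter; the point is only that lowering the cutoff removes the last spam from the inbox by pushing a legitimate mail into the spam folder.

```python
# Hypothetical (score, is_spam) pairs; a higher score means "looks more like spam".
scored = [
    (0.5, False), (1.2, False), (2.8, False), (4.1, True),
    (4.6, False), (5.3, True), (7.9, True), (9.4, True),
]

def tradeoff(scored, threshold):
    """Return (false_positives, missed_spam) when mail scoring >= threshold is filtered."""
    false_pos = sum(1 for score, spam in scored if score >= threshold and not spam)
    missed = sum(1 for score, spam in scored if score < threshold and spam)
    return false_pos, missed

# Conservative cutoff: one spam slips through, no legitimate mail is lost.
conservative = tradeoff(scored, 5.0)
# Aggressive cutoff: no spam in the inbox, but one legitimate mail is filtered.
aggressive = tradeoff(scored, 4.0)
print(conservative, aggressive)
```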

We could probably build a distributed version of a spam filter by using a fancy online training algorithm (one sample per batch) and combining models. I'm tired, so that exact solution may be infeasible, but I'll bet something similar is feasible.
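A minimal sketch of what I have in mind, assuming a word-count naive Bayes model (my choice for illustration, nothing Gmail is known to use). The class name `OnlineNB`, its methods, and the training messages are all hypothetical. Each server updates its own model one message at a time, and because the model is just counts, combining two models is a sum.

```python
import math
from collections import Counter

class OnlineNB:
    """Multinomial naive Bayes that is updated one message at a time."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def update(self, text, label):
        # Online step: fold a single labelled message into the counts.
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def merge(self, other):
        # Combining models: the counts are additive, so just sum them.
        merged = OnlineNB()
        for label in ("spam", "ham"):
            merged.word_counts[label] = self.word_counts[label] + other.word_counts[label]
        merged.doc_counts = self.doc_counts + other.doc_counts
        return merged

    def spam_log_odds(self, text):
        # Positive means "more likely spam"; add-one smoothing avoids log(0).
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        v = len(vocab) or 1
        score = math.log((self.doc_counts["spam"] + 1) / (self.doc_counts["ham"] + 1))
        for label, sign in (("spam", 1), ("ham", -1)):
            total = sum(self.word_counts[label].values())
            for word in text.lower().split():
                score += sign * math.log((self.word_counts[label][word] + 1) / (total + v))
        return score

# Two independent servers, each trained on its own mail.
node_a, node_b = OnlineNB(), OnlineNB()
node_a.update("cheap pills buy now", "spam")
node_a.update("meeting at noon", "ham")
node_b.update("buy cheap watches", "spam")
node_b.update("lunch meeting tomorrow", "ham")

merged = node_a.merge(node_b)
print(merged.spam_log_odds("buy cheap pills") > 0)
```

Summing counts only works this cleanly for count-based models; something like a neural net would need a different way of combining.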

> You were claiming "they could just as well package into, say, an install CD image" which is likely not going to happen, so he was just giving an example.

What's the point of pointing out the one example among millions of email server setups that is most likely to not be published? Especially so, given that that would be about the most useless setup to replicate, as no one needs the scalability of Gmail for their own setup, and thus the complexity that would come with it.

> The perfect spam filter Gmail has is probably based on data and ML analysis, not simple software you can put on a CD and install on some little server; so even if someone who works there wanted to help you, they probably couldn't do it.

If you think that Gmail's spam filter is perfect, I have a very simple, even more perfect spam filter: just throw away all emails. Gmail has massive false positive rates. That's not a perfect filter; that's just a filter that throws away a lot of emails.
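In case the point isn't obvious, here it is as a quick sketch. The mailbox contents and the helper names `rates` and `drop_all` are made up for illustration. A filter that drops everything catches every single spam, and its false positive rate is also 100%.

```python
def rates(filter_fn, mailbox):
    """Return (spam_catch_rate, false_positive_rate) for a filter over labelled mail."""
    spam = [msg for msg, is_spam in mailbox if is_spam]
    ham = [msg for msg, is_spam in mailbox if not is_spam]
    catch_rate = sum(filter_fn(msg) for msg in spam) / len(spam)
    false_positive_rate = sum(filter_fn(msg) for msg in ham) / len(ham)
    return catch_rate, false_positive_rate

def drop_all(message):
    # The "even more perfect" filter: every message is treated as spam.
    return True

# Hypothetical labelled mailbox: (message, is_spam).
mailbox = [
    ("win a prize", True),
    ("cheap pills", True),
    ("invoice attached", False),
    ("lunch tomorrow?", False),
    ("meeting notes", False),
]

drop_all_rates = rates(drop_all, mailbox)
print(drop_all_rates)
```

So "how little spam reaches the inbox" says nothing about filter quality unless you also look at how much legitimate mail was thrown away.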

If anything, the control that Google has over which emails it arbitrarily labels as spam is completely unacceptable, especially so given that they don't even accept liability for incorrectly filtered emails.