by thaumasiotes 2616 days ago
I've had gmail send the following to the spam folder:

- Legitimate class action notices related to Amazon purchases.

- Email coming from addresses to which I had already sent email. (!)

- Email from my landlord.

- Email coming from Google itself.

Based on the contents of my spam folder, which I have to check fairly often because of the extreme overaggressiveness, I would be vastly better off if nothing ever got filtered at all. [1]

>> There are going to be false positives, we will make mistakes, but we certainly care a lot about fixing issues like this when we hear about them.

This doesn't sound honest, or at least not complete. People have been complaining about this for years. I have personally been complaining about this for years. The loss of obviously legitimate email is completely outrageous.

It doesn't look intentional (look at that fourth category!), but it certainly doesn't look like anyone is trying to address the problem.

[1] Yes, if spam filtering was disabled, more spam might get sent.

16 comments

Before I switched to fastmail, gmail would not believe me when I marked an email as not-spam. They would still get flagged. All I wanted was updates from one band.

That didn’t stop me from repeatedly getting junk from every other record label my email was sold to. It was an endless procession of shit I never subscribed to.

"Inboxing" with Gmail & Outlook is a baroque, hellish process for mailservers.

When either provider decides your small email server is sending spam (eg: sending an email with an attachment, or any kind of form email like a daily report) you won't get through to user inboxes, and instead you'll be routed to spam, or for Outlook.com hosted addresses they will accept mail from your server and send it to /dev/null. Gmail's process is bad, but Microsoft has decided to accept emails and throw them away (which is ridiculous).

At my workplace we have two email addresses, one from google mail (gmail for business) and one from MS Exchange.

I never had a problem with gmail for business regarding spam. I regularly receive mails from smaller businesses (some of them hosting their own mail server) and never had a complaint from anyone yet. Since i can also be contacted via phone i'd know.

On the other hand, MS Exchange constantly delivers obvious spam mails and (quite seldom, but still) swallows legitimate mail.

Anecdotal, i know. And disclaimer: the behavior depicted in the article is as bad as it gets, if everything is as described.

As far as I know, Gmail also does this sort of black hole spam filtering - the stuff that you see in "Spam" isn't everything.
> - Email coming from Google itself.

Yep. I missed an invitation to a Google-hosted event at a conference I attended because the email (from an @google.com address, no less) got caught in Gmail's spam filter.

This is a problem with ML approaches, right? Instead of water boiling at "100C" it boils at "99.98C +- 0.04C". Normally this is ok, but sometimes it isn't!
Denver is just a small minority of the US and we just can't please everyone.

(afaik in denver water boils at 95C)

In Denver water boils at 203F
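The two figures agree, for what it's worth. A quick check of the conversion (the function name is just illustrative):

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# The 95C figure quoted above for Denver, expressed in Fahrenheit.
denver_boiling_f = celsius_to_fahrenheit(95)
print(denver_boiling_f)  # 203.0
```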
If something happens a trillion times a day with a .00005 error rate, then 50 MILLION things went wrong, affecting, say, 50,000 people.

50 of them blast twitter and it seems the world is collapsing.

The world is resilient.
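The scale argument is just volume times error rate. A rough sketch of that arithmetic, with the rate expressed as errors per million events to keep everything in integers (the function name and the per-million framing are my own assumptions, not anything Google publishes):

```python
def expected_errors(events: int, errors_per_million: int) -> int:
    """Expected number of bad outcomes for a given volume and error rate."""
    return events * errors_per_million // 1_000_000

TRILLION = 10**12

# A rate of .00001 is 10 errors per million events.
print(expected_errors(TRILLION, 10))  # 10000000
# Fifty million errors a day corresponds to 50 per million, i.e. .00005.
print(expected_errors(TRILLION, 50))  # 50000000

# Fifty million errors spread over 50,000 people is 1,000 errors each.
print(expected_errors(TRILLION, 50) // 50_000)  # 1000
```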

Sometimes error rates, even nominally low ones, are bad. Would you like to have your IV changed by a nurse with 0.001 error rate?
I imagine most humans have error rates worse than that. And what does 'error rate' even mean in that context? A small delay or a catastrophic failure ending in death and destruction?
I would think that would be quite a good error rate. Especially when considering people in the hospital are often not in good health, possibly making their veins more difficult to find.

That is assuming by error you mean missing the vein. If error is defined as a fatal complication, then 1/1000 is terrifying.

More importantly would you like to have your IV changed by a nurse that makes up her own standards for health care?

Google does what's best for Google - how could there be any discussion of that fact in 2019?

If there are 50,000 people and 50,000 people experience problems, that's bad.

If there are 5 million people and 50,000 experience problems, that's fine?

Isaac Asimov's comments about world population increase involved something about this; the more people there are, the more each individual is dehumanised and rendered irrelevant (my paraphrasing).

No I don't think it's fine at all. I think Google, twatter, Facebook, et al don't care because 50 people and who they represent don't matter to them compared to the money they make.

When it's all rounded up it isn't even a single penny on the balance sheet. The owners of these businesses literally never even know, since the balance sheet is their only view into the companies.

I have no idea why I'm being downvoted on this. Hackers can't do math or what?

I think your position is a little unrealistic. 50 people experiencing problems out of what, a billion? is pretty good. Do you think that if those billion people were served by 20 million small business e-mail providers, that none of those 20 million e-mail providers would ever make a mistake and affect their 50 customers?
I don't think an average of 1,000 mistakes per person-affected-by-mistakes is realistic here. I'd bet it's closer to 1.
Yup, I've had the same thing. Just kinda amusing and ironic since I didn't happen to care about that event but it makes one nervous about relying on spam filtering.

On the other hand, once you train it a bit, it is mostly remarkably good. For me, switching from fastmail.fm (which was pretty good itself) to Gmail gave me a big improvement in spam control.

Super curious about this response saying on gmail you have more control, because from where I am the “mark as spam” button does nothing but move things to the spam folder. In theory it should learn from that but when someone used my email address to sign up for AT&T no amount of marking things as spam will stop their emails landing in my inbox.
As in signed up for AT&T service? If so it's because it's not spam - it's misdirected mail, but there are tens of thousands of other Gmail users who think that messages almost identical to those are things they absolutely want to receive.
If there is no business relationship between AT&T and that user, the U.S. CAN-SPAM Act defines it as spam.

Shame on AT&T for not validating their customer's email address.

I too get the same type of spam from AT&T.

My point is that absent information that Google simply does not have no matter how creepy they get, there's literally no way they can identify such messages as spam - exactly the opposite in fact because probably 99.999% of such messages that they process are explicitly not spam.

The only way Google would have to identify that this message was not for you would be to get the subscriber information from AT&T and cross-reference it with name and address information they had for you - and even then most of the time they'd probably be wrong (e.g. if the email is coming to you but the account is actually in a family member's name).

I just cleaned out a little over 100 emails in my Gmail spam filter yesterday. About 80% of what was in there were emails from YouTube giving me notifications of new videos people have uploaded that I am subscribed to. These emails never used to go to spam, but slowly over time more and more of them would end up in spam. It's at the point now where almost all of them from YouTube go to the spam folder.

It doesn't make any sense since they are emails from Google, they are emails I even have a filter applied to so that a label is applied to them. Yes I can adjust the filter and choose "never send to spam" but the messages will still show a warning on them saying "This message was not sent to spam because of a filter you have applied".

Sure, false positives make sense, but I don't get how the majority of what is in my spam folder would be emails sent by Google.

> It's at the point now where almost all of them from YouTube go to the spam folder. It doesn't make any sense since they are emails from Google

It makes a lot of sense... people use the spam button as a lazy man's unsubscribe. Youtube adding the bell button, making mail opt in is probably a response to that.

I actually give Google a lot of cred for not simply white-listing its own domains. Though spammers would probably find ways to abuse it and make them look bad anyway.

> people use the spam button as a lazy man's unsubscribe

This is the small mail server crux right here. If you’re a small mail server and a few of your emails have been spam binned instead of unsubscribed, it would likely lead to your whole server getting shit canned.

You know what's really funny? That even with that overly aggressive spam filter, once in a while (once a quarter maybe), it somehow manages to miss obvious BuY@@NiGERiAn@@Vi@gRa-Cia1is type of emails... which, by the way, my morally outdated SpamAssassin marks as spam.
"morally outdated" ?
I would make an educated guess that "arpa" is from the ex-USSR and their native language is Russian. It's a direct translation of the idiom "морально устаревший", which literally means something is:

* Available for decades.

* Far from being top notch technology.

* Sometimes, of course, it literally means outdated. Like if you run older CentOS or Debian with decade-old packages.

So it doesn't mean SpamAssassin is bad, but it's very far from the state-of-the-art ML technologies that Google might have.

Bulgarian has that too: "Морално остарял"
Your comment motivated me to check my Gmail spam folder and 10% (3 out of 30) were false positives, two of which were pretty important. :(
I've recently started checking my spam folder quite regularly, as I've noticed more and more legit emails ending up there. Support emails from TradeMe (one of New Zealand's biggest sites) keep getting put into spam, even after multiple "mark as not spam", along with some other kinda important emails from TradeMe. I've had to put in manual filters to force email from TM to skip the spam folder.
And yet, in my own non-Gmail-hosted email (fastmail), I currently have 2/55 false positive spam emails. I very rarely ever check it. Of note, I usually get about one actual spam (non-newsletter blog spam) in my actual inbox a month.
TradeMe, like eBay and banks, is a popular target of phishing emails.
I recently missed an email from the UK government regarding a passport application, it went to Spam. Talk about missing the mark!
A spam filter can't whitelist government email. My personal SpamAssassin filters out spam from government servers all the time. The latest was from somewhere in Quebec.

You'd think the various governments would put more effort into computer security. They appear not to care, though.

Me too. 2 "spam" emails in Gmail's trash folder, none of which were spam (one was a newsletter from Mastodon (software)).

Edit: Gah. Now I checked my work Gmail spam folder. There was an email from one of my users there. (and nothing else)

Now I'm considering migrating away from Gmail, at least for work related things.

Me too. I found an invitation for an on-site job interview I would have 100% missed if they had not called me as well.
Likewise I found they filtered an e-mail from my bank about my car loan.
Ditto. Multiple missed emails from a factory owner in Fuzhou I'm on site doing business with and actively emailing back and forth with daily. Despite the back and forth communication some of the direct messages were in the spam folder. This could have caused me some major issues.
And all parents with kids in SF public schools should note that all emails from noreply@sfusd.edu (bulletins, etc.) are marked as spam.
> - Email coming from Google itself.

If email coming from google itself got special treatment there would be masses with pitchforks complaining about that.

You've made a lot of good points, but I don't think that's one of them.

I don't think that has anything to do with Google needing special treatment. It shows that the company who made the rules and has every privilege to follow them isn't able to.
Valid point. They've become so large and synonymous with email that many people I encounter are actually unaware that there is other email besides "Gmail."

It feels like Google no longer has any incentive to follow the rules, and they feel that they are going to be the ones to make the new rules. The rest of us end up having to implement workarounds.

I think that's a fair interpretation and a good point.
> You've made a lot of good points, but I don't think that's one of them.

I disagree. I think in the absence of that point, it would have been hard to say this:

> It doesn't look intentional (look at that fourth category!)

But also, I think special treatment for trusted actors is a completely appropriate way to handle email delivery, and I also think it's appropriate for gmail to trust themselves to be sending legitimate mail. Blocking their own email makes them look totally incompetent. They absolutely should whitelist themselves. And they should have a way for you to be whitelisted too, if you want to send email.

If they whitelisted the major players in the space, I'd worry that the problem would get _much_ worse for everyone else.
I've also had email from Google recruiters (@google.com email addresses) go to spam. I considered this a high level indicator that Google spam filtering is incompetent, not malicious - if it was malicious, they would at least be able to get their corporate emails through.
> - Email coming from addresses to which I had already sent email. (!)

That is understandable. It is hard to validate if an email is authentic. SMTP has no authentication built in. Gmail can't just blindly accept all emails from addresses that you have already sent an email to.

EDIT:

Look at this example: https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol#...

Anyone can connect to relay.example.com and pretend to be bob@example.com.
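To make that concrete: nothing in the message format itself ties the From header to the person composing it. A minimal sketch using Python's standard email library; the addresses are placeholders (alice@example.com is my own addition), and nothing is actually sent:

```python
from email.message import EmailMessage

# Anyone can put any address in the From header; the message format
# performs no check that the author controls bob@example.com.
msg = EmailMessage()
msg["From"] = "bob@example.com"
msg["To"] = "alice@example.com"  # hypothetical recipient
msg["Subject"] = "Hello"
msg.set_content("This message claims to be from Bob.")

print(msg["From"])  # bob@example.com
```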

Not if you use SPF/DKIM/DMARC they can't, that's the whole point of those various additions.

All those "I hacked your email and send you a message from you account" I don't get, because I have a DMARC policy that says if you don't pass SPF/DKIM then you get rejected. So try as the spammer might to connect to my mailserver and pretend to be me, they can't, because my mailserver sees they're not authenticated, and the mailserver they're sending from isn't in my SPF records, isn't signing the message with my DKIM key and therefore it gets rejected at the SMTP level.
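Roughly, the decision a receiving server makes looks like the sketch below. It is heavily simplified: real DMARC also handles things like subdomain policies and percentage rollouts, and the function and parameter names here are made up for illustration.

```python
def dmarc_disposition(spf_aligned_pass: bool,
                      dkim_aligned_pass: bool,
                      policy: str) -> str:
    """Decide what to do with a message under a simplified DMARC check.

    DMARC passes if either SPF or DKIM passes and is aligned with the
    domain in the From header. Otherwise the domain owner's published
    policy ("none", "quarantine" or "reject") applies.
    """
    if policy not in ("none", "quarantine", "reject"):
        raise ValueError(f"unknown DMARC policy: {policy!r}")
    if spf_aligned_pass or dkim_aligned_pass:
        return "deliver"
    if policy == "reject":
        return "reject"
    if policy == "quarantine":
        return "quarantine"
    return "deliver"  # p=none: report only, no enforcement requested

# A spoofed message from an unlisted server with no valid signature,
# sent as a domain that publishes p=reject:
print(dmarc_disposition(False, False, "reject"))  # reject
```

With a quarantine policy the same message would typically land in the spam folder instead of being refused at the SMTP level.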

That’s what SPF[1] is for, and optionally DMARC[2].

[1] https://en.wikipedia.org/wiki/Sender_Policy_Framework

[2] https://en.wikipedia.org/wiki/DMARC

Huh? If it's in the Spam folder, they do accept it.
I meant accept as not spam.
I've also had email from a Google recruiter go to the spam box...
I think the SPAM folder is a separate issue (although important)

The article discusses mail not being accepted by google/gmail in the first place.

Fair comment. But that issue is pretty simple: there is no good reason for them to fail to deliver email under any circumstances. (This has happened to me too -- someone tried to email my gmail account and gmail completely refused to deliver it. It was pretty embarrassing.)

Messages they think you won't want to receive are what the spam folder is for.

"fail to deliver email under any circumstances."

The amount of spam that would be delivered if they didn't discriminate AT ALL is enormous.

They have to read all this feedback and discriminate better.

> there is no good reason for them to fail to deliver email under any circumstances

Yes there is. They don't want to carry traffic from anybody from the major email blacklists. If a mail server is on a real, very transparently-managed blacklist, no large provider should be accepting their smtp traffic.

Yet all the people here trying to administrate servers from residential and VPS blocks of IPs are telling you they're caught up in this list you're lauding as all-knowing and safe.......
About accountability, this gmail thing vaguely reminds me about "The Drop" situation on the mobile phone network https://www.youtube.com/watch?v=pCOCKS5AJI8
> - Email coming from Google itself.

To be fair, sometimes Google does send spam.

>Legitimate class action notices related to Amazon purchases.

You'd think Google had an incentive to get that delivered.

I had a communication from a Google recruiter go to my Gmail spam folder.

But it still beats no filtering.

>> I would be vastly better off if nothing ever got filtered at all.

There are layers of filtering beyond what appears in your spam folder, layers that block obvious spam long before it gets anywhere near your account. If every email ever sent to your address wound up in your spam folder you'd beg for filtering.

Before switching to shared hosting from my VPS (one reason being that i didn't want to bother with email maintenance and didn't want to have a separate service just for email), i had my mail with (badly configured) Spamassassin that wouldn't delete mail, just add a "SPAM" prefix in the topic. My mail isn't exactly commonly known, but still is one i have for years and was made public thanks to me releasing an Android app once (after which, spam increased dramatically).

Even after running it for years, Spamassassin never marked a legitimate mail as spam, so i'm pretty sure that if i wasn't too lazy to configure it to move it to a spam folder, it'd work fine. Stuff did pass through it (at a ratio of one every four or so) but i was fine with deleting those.

What i'm trying to say is that from personal experience, i'd be fine with a spam filter that errs on the side of not marking stuff as spam and me deleting whatever goes through manually. Having to see a bit of spam mail is a small cost for not losing mail i'm actually interested in.

I'm using my e-mail pretty much everywhere (me@vbezhenar.com you have it in clear text, every bot will crawl it now, not for the first time, though). I'm too lazy to set up SpamAssassin yet, so I'm getting a notification for every spam message I get. The only thing that I'm trying to do is to click "unsubscribe" even on obviously spam mails (maybe it does more harm than good, not sure). So I'm getting around 20-30 spam e-mails per day. I don't think that it's THAT bad. I'm spending maybe a minute every day to delete it. And I'm sure that with absolutely minimal spam filtering I would achieve almost perfect filtering.
I'm prepared to believe this. But it's not a defense of gmail's policies -- this suggests that in fact nothing would be lost if gmail eliminated the spam folder and delivered everything that would have gone there to your inbox instead. So why are they doing this?