Hacker News
by saagarjha 2234 days ago
This is one of the reasons I'm generally not OK with "anonymized" data collection unless there's an explanation of how the data is being anonymized. It's almost always easy, often trivially easy, to correlate the records and get back a near-perfect reconstruction of the original data.

Anonymization in the data reselling industry is often some form of md5(lower($email)). It's a joke. They even do that for extremely small search spaces like phone numbers. The data is still provided at the individual-user level, so even if the hashing were irreversible, you only need to know a single event for a given person to get their entire history.
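A minimal Python sketch of why md5(lower(email)) is a weak disguise, assuming the hash is unsalted exactly as described above. The helper names `md5_lower` and `reverse_phone_hash` and the sample phone number are hypothetical. Because the hash is deterministic, anyone holding a candidate email computes the same key; and for a small input space like phone numbers, you can simply hash every candidate until one matches. To keep it short, the sketch assumes the first six digits are already known and enumerates only the last four.

```python
import hashlib
from typing import Optional


def md5_lower(value: str) -> str:
    """Hex md5 of the lowercased input, i.e. md5(lower($value))."""
    return hashlib.md5(value.lower().encode("utf-8")).hexdigest()


def reverse_phone_hash(target_hash: str, prefix: str) -> Optional[str]:
    """Brute-force a hashed phone number whose prefix is assumed known.

    Tries all 10,000 possible four-digit suffixes and returns the first
    number whose hash matches, or None if none does.
    """
    for n in range(10_000):
        candidate = f"{prefix}{n:04d}"
        if md5_lower(candidate) == target_hash:
            return candidate
    return None


if __name__ == "__main__":
    # Same person, different capitalization: identical "anonymized" key.
    print(md5_lower("Alice@Example.com") == md5_lower("alice@example.com"))

    # Hypothetical "anonymized" phone number recovered by enumeration.
    leaked = md5_lower("555-867-5309")
    print(reverse_phone_hash(leaked, "555-867-"))
```

This also assumes the broker hashes numbers in the `555-867-5309` format; a real attacker would try whatever formats the broker is likely to use. Dropping the known-prefix assumption only means enumerating a larger space (10^10 candidates for a 10-digit number), which is still well within reach of a single machine.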

For example, there's a popular email client that scrapes people's inboxes and sells their purchase history to anyone willing to pay. That purchase history is keyed to individual email addresses and is "anonymized". But if you know your target has this email client installed and you know a single purchase they made (e.g. a coworker saying "Oh, I bought this awesome coffee maker on Amazon last night!"), you can pull up their entire purchase history, both before and after that purchase.
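To make the linkage concrete, here is a minimal Python sketch. The `Purchase` record layout, the `pseudonym` and `reidentify` helpers, and the toy feed are all hypothetical, not any vendor's actual schema; it only assumes the feed is keyed by md5(lower(email)) as described above. Note that the attacker never needs the target's email: one known purchase is enough to find their pseudonym, and the pseudonym unlocks everything else.

```python
import hashlib
from typing import List, NamedTuple


class Purchase(NamedTuple):
    user_key: str  # md5(lower(email)) pseudonym, never the raw email
    date: str      # ISO-8601 date, so string order == chronological order
    item: str


def pseudonym(email: str) -> str:
    return hashlib.md5(email.lower().encode("utf-8")).hexdigest()


# Hypothetical toy feed standing in for resold, "anonymized" receipts.
FEED = [
    Purchase(pseudonym("bob@example.com"), "2019-05-01", "running shoes"),
    Purchase(pseudonym("alice@example.com"), "2019-04-11", "concert tickets"),
    Purchase(pseudonym("alice@example.com"), "2019-05-02", "coffee maker"),
    Purchase(pseudonym("carol@example.com"), "2019-05-02", "headphones"),
    Purchase(pseudonym("alice@example.com"), "2019-05-20", "coffee beans"),
]


def reidentify(feed: List[Purchase], known_date: str, known_item: str) -> List[Purchase]:
    """Return a target's full history given one purchase you know they made."""
    keys = {p.user_key for p in feed if p.date == known_date and p.item == known_item}
    if len(keys) != 1:
        # Zero or several candidates: this one event doesn't single anyone out.
        return []
    (target,) = keys
    return sorted((p for p in feed if p.user_key == target), key=lambda p: p.date)


if __name__ == "__main__":
    # The attacker knows only "coffee maker, last night", never the email.
    for p in reidentify(FEED, "2019-05-02", "coffee maker"):
        print(p.date, p.item)
```

In a larger feed one event may match several people, in which case an attacker would intersect two or three known events instead of one; the principle is the same.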

> coworker saying "Oh, I bought this awesome coffee maker on Amazon last night!"

This x1000.

I have seen people invite others to eat lunch at restaurants that only accepted credit cards in order to elicit such a data sample.

Yeah, it's not just emails of course. You can do it with web traffic data. You can do it with credit card data. You can do it with geolocation data. You can do it with TV viewing data.
Wait, what? Please tell us more
Wait... WHAT?!?! I mean if I think about it, yeah that makes sense to have been built but WTF?!?

Care to share which email client it is? It should be killed with fire!!!

Assuming you're asking a genuine question, it's Gmail.

https://mail.google.com/

Do you have a source for the claims that (a) Google parses emails for purchase histories and (b) sells them?

https://myaccount.google.com/purchases is empty for me, and I sure do have a lot of email receipts on my gmail. It also says "Purchases made using Search, Maps, and the Assistant are organized to help you get things done, like tracking a package or reordering food".

https://www.cnbc.com/2019/05/17/google-gmail-tracks-purchase... "Google says it doesn’t use this information to sell you ads."

Google used to scan emails for ad targeting, which Microsoft called out in its "Scroogled" ad campaign in the US. But Google says it stopped doing so years ago.

Am I missing something?

No, no, Google doesn't read your emails (any more); they just give access to them to certain "developers" (read: arbitrary paying third-parties).
I believe GP is talking about the unroll.me service, which was caught selling purchase receipts to companies like Uber a few months ago.
I was referring to https://mail.edison.tech/, but they compete with unroll.me (rakuten intelligence) in selling people's emails.
That's pseudonymization, not anonymization. Pseudonymized data still counts as personal data under the GDPR, so treating it as anonymized is generally not GDPR compliant.
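A minimal Python sketch of the distinction, with hypothetical names and data. Pseudonymization here means replacing the email with an HMAC-SHA256 token under a secret key (already stronger than the bare md5 above); aggregation to per-item counts stands in, crudely, for anonymization. Even with the key kept secret, every row for the same person carries the same token, so the one-known-purchase linkage still works on the pseudonymized rows, while the aggregate has no per-person rows left to link. Aggregation is no guarantee on its own either: small counts can still single people out.

```python
import hashlib
import hmac
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical secret key; a real system would store it apart from the data.
SECRET_KEY = b"hypothetical-secret-key"


def pseudonymize(email: str) -> str:
    """Keyed hash (HMAC-SHA256): not recomputable without the key, but stable per person."""
    return hmac.new(SECRET_KEY, email.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def aggregate(rows: Iterable[Tuple[str, str]]) -> Counter:
    """Discard the identifier entirely and keep only per-item counts."""
    return Counter(item for _email, item in rows)


ROWS: List[Tuple[str, str]] = [
    ("alice@example.com", "coffee maker"),
    ("bob@example.com", "coffee maker"),
    ("alice@example.com", "coffee beans"),
]

if __name__ == "__main__":
    pseudonymized = [(pseudonymize(email), item) for email, item in ROWS]
    # Rows 0 and 2 share a token, so Alice's purchases are still linkable.
    print(pseudonymized[0][0] == pseudonymized[2][0])
    print(aggregate(ROWS))
```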
"Anonymized" has become a marketing buzzword.