by Gunax 1567 days ago
Well I am ready for my hairs to stand up. What did the data faucet contain?

Not OP, but somewhere around 2010 I tinkered with creating a game for Facebook. I signed up for a developer account, spent some time with the docs and built a toy app.

It was a straightforward call to get info about the user, including name, email, interests, etc. Their friends list. Info about all their friends, including all the same details. And so on.

There was a EULA where the developer had to promise to delete all the info when the user signed out of the game, and not to share the info. That was the only security.

The project fizzled, but when the Cambridge Analytica news broke, it confused me, because my recollection is that EVERYBODY had all those lists of user info. Seriously, tens of thousands of different companies, and the only thing stopping them was a pinky promise.

This is all true, but to get that data you had to run an app on Facebook. To my recollection, you could not grab this level of data with a Facebook JavaScript tag on 3rd party websites. Facebook offered this data on-platform to convince organizations to drive their audiences to Facebook.com instead of their own site.

We used to have access to individual demographic data for breaking down your analytics and ad targeting, as well as being able to target users based on their specific email address or phone number. We could also target your friends.

From memory this has now all been rolled up into cohort demographics and 'lookalike' audiences, so you can no longer break your data down by a specific user's demographic attributes, or target ads by, say, an email list unless the people on it are already your users (and then only for specific types of ads, i.e. retargeting).

From memory, some of the more unsettling breakdowns/targets were:

  - Ethnicity
  - Life events
  - Politics
  - Pages (so other businesses) they had liked

I worked for a pure-play furniture retailer. You used to be able to do things like buy email lists from price comparators for gas/electricity/home insurance, with additional data like your postcode, then upload them to Facebook and specifically target everyone with a FB account under that email/phone number with furniture ads. The assumption was that if you're looking for gas/electricity/home insurance, you're likely to be moving home, or at least to be a person with some need for furniture.

Now they just use the Facebook pixel to put you into lookalike audiences, because they can see you went to the home insurance website and the real estate website (they all have the pixel installed), and they make the same broad assumptions we were previously making, but about the cohort rather than your specific unique identifier.

That's... not that interesting? At least not on the Facebook side of things.

Why shouldn't I be able to advertise to people who have recently marked themselves as married, or who liked the Yankees facebook page?

You should be able to advertise to people that are Yankee fans on facebook, but you shouldn't be able to get John Doe's email address from the Yankees fan club directory (not on facebook) and directly target them with ads if they have zero relationship to your business.

You also shouldn't be able to upload a list of email addresses, target your ads to them, and then use Facebook's analytics to see how many of those people are divorced or own an investment property via the segmentation analysis. Depending on how small that list is, a lot of that data starts getting very specific to an individual.

Facebook obviously also thought you shouldn't be able to do this since now you can't. Everything is now cohort and look-alikes.

Additionally, you're acting like Facebook only puts you in the ‘recently married’ bucket if you marked yourself as married. Facebook is smarter than that: they are putting you in these buckets based on your messages, Instagram activity, and browsing behaviour, not necessarily based on public information you expose in your profile.

>- Ethnicity - Life events - Politics - Pages (so other businesses) they had liked

The above is all frowned upon (for the record, I 100% agree). What is interesting is that our civilization has evolved to frown upon the above "categorizing" of people, but from that list only ethnicity is not "choosable" by the person.

I.e., I can mostly choose when I get married and to whom, and which party I support, but I definitely cannot choose my race. Yet the above are all considered 'equally bad'.

Just something I noticed.

Any developer can still send almost any data they want into FB to track basically anything.

'Offline conversion' still allows you to send in names, age, bday, gender, etc. for matching, plus IP and UA.

Though now it's hashed before going to FB.
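For anyone who hasn't seen it, "hashed before going to FB" roughly means something like the sketch below. This is only an illustration: it assumes SHA-256 (as recalled elsewhere in this thread) and a trim-and-lowercase normalization step, which is my assumption and not a statement of Facebook's actual requirements. The function names and the sample record are made up.

```python
import hashlib


def normalize_email(email: str) -> str:
    # Assumed normalization: trim whitespace and lowercase.
    # The real rules for any given ad platform may differ.
    return email.strip().lower()


def hash_identifier(value: str) -> str:
    # Plain unsalted SHA-256, hex-encoded. Unsalted is what makes
    # the value comparable across different uploaders.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical customer record held by an advertiser.
record = {"email": "  Jane.Doe@Example.com ", "phone": "15551234567"}

hashed = {
    "email": hash_identifier(normalize_email(record["email"])),
    "phone": hash_identifier(record["phone"]),
}
```

The same input always produces the same digest, so two parties that normalize the same way end up with identical values for the same person.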

And you can pass in almost whatever custom data you want. So in my industry I can optimize for a long-term political donor, or potentially an early vote. Or someone accidentally sends in 'this person bought hepatitis meds'.

This still exists despite what someone below said, unless I'm totally missing or misunderstanding something.

However, it is not as valuable, and the audience you are able to deliver to is shrinking because of iOS restrictions. Likely with Chrome too, eventually.

FB used to supply more detailed interest/demographic buckets to target. You used to be able to type in basically anything, from 'likes ___ very niche page' to 'engages with liberal political content'. There are still interests, but there are fewer of those 'sensitive' ones. Still lots of stuff like 'works at ___', and of course age, gender, geo.

But more fine-grained interest targeting seems to be going away pretty soon; it's just going to be broad demographics.

The ROI is just not as good without fine-grained targeting on iOS. FB is having to do a bunch of tricks with AI/modeling to try and make it perform, but it's not as good.

I'm for targeted advertising. I think what iOS is doing is anticompetitive and bad.

But I do think there should be some sensible regulation. Like no healthcare or sensitive topic data (LGBTQ, dating, etc).

** ADDITION: sorry this is long, but one additional thing: I think people are also confusing the ad product with the old FB API.

The old FB API was an absolute sieve: you could get basically any data a person had on their profile, and also their friends' data. This is what happened with Cambridge Analytica.

All that has been shut down. Even with Login with FB, they are way more strict about actually testing sites, etc.

> Though now it's hashed before going to FB.

It is, however, an easily reversible hash by design, as that's how FB can correlate between the different datasets. When it comes to finite sets such as phone numbers or dates of birth, it's also trivial to search the entire space by brute force.

IIRC it's sha256. Is that really reversible now?

For sure with a rainbow table for something like cell phone numbers. But I don't know why that would matter? Anyone can generate all the possible phone numbers.

The whole point is that it is matchable. Like if they already have my email then they know if it's a match, but if they don't have my email they don't know what the missing email is.

Like what's my email from this (without knowing my email) below: 2c03e4a168bed89f5208250cdefbe97d4d87ba7812df896311676acc2ddfcdb4
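A rough sketch of the "matchable but not readable" point, assuming unsalted SHA-256 over a lowercase email. The list of known emails and the helper names are hypothetical; this is not how any particular platform implements matching.

```python
import hashlib


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical: emails the platform already holds in plaintext.
known_emails = ["alice@example.com", "bob@example.com"]

# The platform hashes what it already knows, once, and keeps a lookup table.
hash_to_email = {sha256_hex(email): email for email in known_emails}


def match(uploaded_hash: str):
    """Return the known email for this hash, or None if it was never seen."""
    return hash_to_email.get(uploaded_hash)


# An advertiser uploads two hashes; only one belongs to a known user.
uploaded = [sha256_hex("alice@example.com"), sha256_hex("carol@example.org")]
results = [match(h) for h in uploaded]
```

The first upload resolves to a known account and the second stays an opaque 64-character string, which is the distinction being drawn above.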

Depends. For DOB and phone numbers the search space is finite and very small for a modern computer (especially for a big-tech adversary with access to near-infinite computing power), so you can just enumerate all the possibilities.

Names and emails can be bruteforced with various lists from existing data breaches or data brokers and you'll probably reverse 80% of them.
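To give a feel for how small these spaces are, here is a minimal sketch that enumerates every date of birth in a range and tries a candidate list against an email hash. It assumes unsalted SHA-256, a `YYYYMMDD` date format, and trim-and-lowercase email normalization; all three are my assumptions, and the candidate list stands in for whatever breach or broker data an attacker might have.

```python
import hashlib
from datetime import date, timedelta


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_dob_table(start: date, end: date) -> dict:
    """Map hash -> 'YYYYMMDD' for every date in [start, end]."""
    table = {}
    day = start
    while day <= end:
        text = day.isoformat().replace("-", "")
        table[sha256_hex(text)] = text
        day += timedelta(days=1)
    return table


# Every plausible birthday: roughly 46,000 hashes in total.
dob_table = build_dob_table(date(1900, 1, 1), date(2025, 12, 31))


def reverse_with_candidates(target_hash: str, candidates):
    """Dictionary attack: hash each candidate and compare to the target."""
    for candidate in candidates:
        normalized = candidate.strip().lower()
        if sha256_hex(normalized) == target_hash:
            return normalized
    return None


# Reversing a hashed DOB is a single dictionary lookup.
recovered_dob = dob_table.get(sha256_hex("19850317"))
```

Phone numbers work the same way with a bigger loop. Emails only fall out if the plaintext is already in the candidate list, which is the "80%" caveat above.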

However, reversing them is not even necessary. An adversary like Facebook can infer it from other data. For example, let's say they know your phone number but not your email. Now you buy from or sign up with vendors, giving them both that phone number and email, and they pass both to Facebook. Facebook knows you signed up with those vendors under your number (they have the plaintext value, so they can hash it on their side and compare), but they also see a mysterious email hash. They don't know its plaintext value, but it shows up at exactly the same vendors that have your phone number. They can infer that it's probably your email address, and while they still don't know what it is, they can use the hashed value to track you across other vendors without ever having to know the plaintext.
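A toy version of that inference, assuming each vendor upload is a row with a phone hash and an email hash. The record layout, vendor names, and `infer_linked_hash` are invented for illustration and say nothing about how Facebook actually processes uploads.

```python
import hashlib
from collections import Counter


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def infer_linked_hash(uploads, known_phone_hash):
    """Return the email hash that most often co-occurs with a known phone hash."""
    counts = Counter(
        row["email_hash"]
        for row in uploads
        if row["phone_hash"] == known_phone_hash and row.get("email_hash")
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# The platform knows this phone number in plaintext, so it can hash it itself.
known_phone_hash = sha256_hex("15551234567")

# Hypothetical uploads from three vendors. The platform never sees the
# plaintext of the email behind these hashes.
mystery = sha256_hex("mystery@example.com")
uploads = [
    {"vendor": "vendor_a", "phone_hash": known_phone_hash, "email_hash": mystery},
    {"vendor": "vendor_b", "phone_hash": known_phone_hash, "email_hash": mystery},
    {"vendor": "vendor_c", "phone_hash": sha256_hex("15559876543"),
     "email_hash": sha256_hex("other@example.com")},
]

linked = infer_linked_hash(uploads, known_phone_hash)
```

The result is still just a hash, but it is now a stable identifier attached to a known person, which is all cross-vendor tracking needs.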

Right. That's kind of the whole point of FB's value. Or at least it used to be, before iOS started killing that targeting and conversion tracking.