Hacker News
by dvt 1568 days ago
As someone that has spent a sizable amount of my career in ad products, the outrage here is kind of (sadly) funny. A conversion pixel? Hah, if you only had an idea of what the Facebook data faucet looked like in 2007-2017, your hairs would stand.

Pretty sure they were breaking all kinds of PII laws.


> Hah, if you only had an idea of what the Facebook data faucet looked like in 2007-2017, your hairs would stand.

I really don’t understand the goal with vague statements like this that can’t provide even the slightest hint of specifics.

What specific data? Even a single example would make this anecdote useful. Instead it feels more like a brag. “I know something but I’m not telling” but in this case the commenter doesn’t claim to have worked at Facebook (just the industry in general) so I suspect it’s hearsay anyway.

> Pretty sure they were breaking all kinds of PII laws.

Given the way Facebook has been under the microscope and dragged in front of Congress, I’m going to assume that their corporate counsel was very careful to make at least a best-effort attempt to comply with every law in force at the time. It may not be popular, but I really doubt Facebook was violating laws for a decade straight as the largest player in the space.

The 2018 interpretations of the Six4Three emails, released through the efforts of a British MP, were the most damning on this point.

Key points: https://twitter.com/YBenkler/status/1070337233159372806?s=20...

Here was one thread of highlights which is still somewhat readable: https://web.archive.org/web/20181206132832/https://twitter.c...

In 2019, Germany banned their cross-site data sharing/reciprocity, such as with the menstrual cycle-tracking apps whose data sharing had come to light (possibly separately): https://twitter.com/YBenkler/status/1093495901342126080?s=20...

I think you’re giving Meta the benefit of the doubt by assuming it operates with the best intentions. I really think web tracking is much more nefarious than you give it credit for.

> I really don’t understand the goal with vague statements like this that can’t provide even the slightest hint of specifics. What specific data? Even a single example would make this anecdote useful.

A lot of people talk in broad strokes because if you use their marketing platforms, you can see exactly what data is being mined. You can infer people's ideological, political, and personal stances. You can target their friends and family, and pay more to reach the people who are shown to influence them. You can choose their region, their income, their habits, hobbies, and kinks.

You can quickly create an account and look at their self-serve ads. There's no reason anyone needs to "guess" what these tracking tools can do. You can just go to where that collected data ends up and see what you can target.

It's better now, but before you could do EVEN more. Though "better" only in the sense that someone who needs a limb amputated to stop gangrene from setting in is better.

Additionally, there are different means to the same end, so staying vague keeps the discussion focused on the general practice instead of offering details that could easily derail into unproductive commentary. The ad firms probably move things around all the time, but the gist of it is: if your browser requests a resource from a server with a little metadata, god knows what’s being done with that from there.

The ubiquity of user tracking is extremely useful yet culturally absurd. Now that’s ambiguous ;)

No one said anything about "best intentions" or "non-nefarious". They said "legal".
Nefarious is defined as wicked or criminal. My usage was specific to the latter definition, not the former. I said nefarious and chose the definition that implies “illegal”, but instead of having a conversation on the topic you chose to pick at word definitions.
Lawful evil.
Why do you give them the benefit of the doubt? There are countless examples of this kind of behavior in top companies.

The difference when it comes to other industries (e.g. food) is that the regulation has had time to develop, and most legislators understand the concepts. So it's harder to cheat.

Forgetting someone's opt-out preferences by mistake doesn't ring as severe as using light carcinogens in your food mix.

It’s pretty straightforward and no secret. All those “share on Facebook” widgets you used to see everywhere are also tracking users. Since they’re embedded into basically every site ever, and each hit to the widget goes to facebook.com (so your browser helpfully sends their cookie along with it), Facebook knows who you are and what sites you visit without your consent or intervention, and uses that to sell targeted ads. They have a profile on you even if you don’t use Facebook.

It’s changed a bit recently with GDPR, the Cambridge Analytica scandal, and some third-party cookie privacy stuff, so it’s a bit less insidious now, but it’s still pretty bad.
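A toy sketch of the widget mechanism described above, assuming a hypothetical request log on the widget host's side. Each embedded button makes the browser fetch something from the host, which attaches the host's cookie plus a Referer header (the full page URL in older browsers, often just the origin under today's default referrer policies). Grouping by cookie is enough to rebuild a per-browser list of visited pages. All names and example.* URLs are made up; a real server would also see the IP address, user agent and timestamp of every hit.

```python
from collections import defaultdict

# Hypothetical log on the widget host: one entry per embedded-button request.
# The browser attaches the host's cookie, and the Referer header names the
# page that embedded the button.
widget_requests = [
    {"cookie_id": "u123", "referer": "https://news.example/politics/article-1"},
    {"cookie_id": "u123", "referer": "https://clinic.example/book-appointment"},
    {"cookie_id": "u456", "referer": "https://shop.example/cart"},
    {"cookie_id": "u123", "referer": "https://shop.example/cart"},
]

def build_histories(log):
    """Group the embedding pages by the cookie ID sent with each widget request."""
    histories = defaultdict(list)
    for entry in log:
        histories[entry["cookie_id"]].append(entry["referer"])
    return dict(histories)

histories = build_histories(widget_requests)
for cookie_id, pages in histories.items():
    print(cookie_id, pages)
```

No click is needed: simply loading a page with the button is enough to add an entry.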

Reminds me: last week I was looking at parts on a chip maker's website and wondering why page updates were taking so long. It turned out to be because Facebook is blocked at work.

Frankly, I don't know why corporations don't block Facebook as a security risk. Seriously, that stuff is bleeding info on what your employees are up to.

The amount of client-side JavaScript code that inconspicuous Like button loads is unnerving.
Doesn't matter much. Now it'll just happen server side, with the server sending the same types of data directly to Facebook. See the Facebook CAPI (Conversions API): basically a server-side implementation of DataLayer and such...
I'm not particularly surprised. I always knew this day would come. In fact, I've been wondering why they haven't done this for a while. Maybe it's time to invest in a good VPN / Tor.
> if you only had an idea of what the Facebook data faucet looked like in 2007-2017, your hairs would stand.

I'm pretty sure everyone with technical aptitude knew about Facebook's data faucet. But maybe I missed something.

As far as I know, Facebook:

a) Had all the freely provided data, PII/likes/social graph/etc.

b) On Facebook's site or mobile app, they were fingerprinting your device and examining your scrolling/mouse/clicking/other inputs to determine attention on a page

c.1) Could recreate most non-users' social graphs just by seeing them as endpoints in registered people's contacts

c.2) On the web, had "like" buttons or ads and their code pretty much everywhere. Therefore they could track most people to most sites.

d.1) Sites could directly provide more information to allow retargeting

d.2) Sites could directly provide more information through a host of other services FB offered to developers

e) On mobile, in the background, scraped your contacts, GPS, and nearby devices (other phones, WiFi, Bluetooth, although tower information may or may not be included). Also, it came installed by default on a lot of phones

f) On mobile, provided libraries people could import into their apps, mostly but not exclusively for ads. This let them get similar insights into usage patterns as on the web. Also, even if people didn't install the FB app, these libraries let them get (e)

g) Used your real identity to purchase information about you from the various real-world data merchants

Which did I miss?
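The contact-based graph reconstruction in point c) can be illustrated with a minimal sketch, assuming hypothetical uploaded address books keyed by user and a hypothetical set of phone numbers that belong to registered accounts. Every number that never signed up still collects a list of the registered people who know it, which is the skeleton of a social graph for a non-user.

```python
from collections import defaultdict

# Hypothetical address-book uploads: registered user -> numbers in their contacts.
uploads = {
    "alice": {"+15550100", "+15550199", "+15550123"},
    "bob": {"+15550199", "+15550123"},
    "carol": {"+15550123", "+15550777"},
}
# Hypothetical phone numbers of accounts that actually signed up.
registered_numbers = {"+15550100"}

def shadow_graph(uploads, registered_numbers):
    """For each number that never signed up, collect the registered users who know it."""
    graph = defaultdict(set)
    for user, contacts in uploads.items():
        for number in contacts - registered_numbers:
            graph[number].add(user)
    return dict(graph)

graph = shadow_graph(uploads, registered_numbers)
for number, known_by in sorted(graph.items()):
    print(number, sorted(known_by))
```

Here "+15550123" never uploaded anything, yet ends up linked to alice, bob and carol.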

This is an excellent summary.

I don't think you missed anything substantial... but I'll add two extensions:

(1) Social graphs change slowly. And they still own Instagram, so for many users they have a live social graph still.

(2) Facebook Pixel is dead. Long live Facebook Pixel!

The Facebook Pixel is now effectively dead (or nearly so) on modern, up-to-date devices running ad blockers. Of course, that still leaves plenty of desktop machines where people aren't running ad blockers.

But more importantly, Facebook has acknowledged the elephant in the room and moved from client side to server side with the Conversions API (CAPI), aka the new "Facebook Pixel". And there's nothing ad blockers can do about server-side analytics...
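To make the server-side point concrete, here is a minimal sketch of a merchant backend assembling a conversion event from its own order record. The field names (`event_name`, `event_time`, `action_source`, `user_data`, `em`, `custom_data`) are modeled loosely on Meta's published Conversions API format and should be treated as an assumption to check against the official reference; the actual HTTPS POST from the server is omitted. Since the browser never contacts Facebook in this flow, there is nothing on the client for an ad blocker to block.

```python
import hashlib
import json
import time

def normalize_and_hash(value: str) -> str:
    # Trim and lowercase before hashing so the same address always yields the same digest.
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()

def build_purchase_event(order: dict) -> dict:
    """Build a server-side conversion event from the merchant's own order record.

    Field names are an assumption modeled on Meta's Conversions API docs.
    """
    return {
        "event_name": "Purchase",
        "event_time": int(order["timestamp"]),
        "action_source": "website",
        "user_data": {
            "em": [normalize_and_hash(order["email"])],
            "client_ip_address": order["ip"],
            "client_user_agent": order["user_agent"],
        },
        "custom_data": {"currency": order["currency"], "value": order["total"]},
    }

order = {  # hypothetical order captured by the merchant's backend
    "timestamp": time.time(),
    "email": "  Jane.Doe@Example.com ",
    "ip": "203.0.113.7",
    "user_agent": "Mozilla/5.0",
    "currency": "USD",
    "total": 49.99,
}
payload = {"data": [build_purchase_event(order)]}
print(json.dumps(payload, indent=2))
```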

I left out Facebook Login. That may help as well, although I don't think they get much from it that they don't already get from the other integrations in mobile apps/websites.
> As someone that has spent a sizable amount of my career in ad products, the outrage here is kind of (sadly) funny

Imagine gloating and being proud of such a career.

I didn’t get the sense that he/she was gloating. Just citing their expertise.
Whenever someone claims expertise in a way so vague and unverifiable that basically anyone could have made the claim, it is not a citation and not a sign of expertise. It is, however, very sketchy.

In a completely different matter, I worked on high level space programs in the late 1970s and if you had any idea about the information that the government is hiding on extraterrestrial life TO THIS DAY, your hair would stand.

I used to tell people that if they knew how ad tech worked, it would be banned tomorrow.

I doubt it's on FB during that period though?

I would guess, though, that a bunch of health tech companies sent a bunch of patient data (perhaps accidentally, just not understanding what they were sending). It seems they are the responsible party.

There have been other examples beyond FB of 'auto track' too. Devs just don't know about it or forget to turn it off.

Not to mention that, for some reason, people at that time were putting FB Like buttons on all the porn. Who clicks that?!?!

> I would guess though that a bunch of health tech sent

I worked for a "healthtech" company in London at the beginning of the pandemic. They had the Facebook SDK malware embedded in the app that people were supposed to use for GP consultations.

I don't believe any explicit health data was sent (there was no intent to do so, and I’m not sure if that would even be possible), but merely the fact that I'm talking to a doctor (and the current time, location, device fingerprint, etc) is not something I'd like Facebook to know.

I know breaching the GDPR is basically the norm in any tech company but I thought that being involved in healthcare would make them super risk-averse and make an extra effort to comply.

They were not alone in this. PatientAccess and a bunch of other sites that you can use to book GP consultations (including through the NHS, the UK’s socialised healthcare system) had a shit ton of such trackers too, obviously loaded before any GDPR consent could even be obtained.

I don't know as much about the app SDK, but the pixel used to auto-detect things like form fill-ins, clicks, URL params, URLs, etc. So there is potential it incidentally collected something bad!
"I don't know why you are upset that I'm stabbing you when I've been poisoning you all these years ha ha ha".
Not really accurate analogy. More like “you’re only finding out today that I’ve been poisoning you the entire time?”
pedantic
In other words, victim blaming.
No. Lol.
Well I am ready for my hairs to stand up. What did the data faucet contain?
Not OP, but somewhere around 2010 I tinkered with creating a game for Facebook. I signed up for a developer account, spent some time with the docs and built a toy app.

It was a straightforward call to get info about the user, including name, email, interests, etc. Their friends list. Info about all their friends, including all the same details. And so on.

There was a EULA where the developer had to promise to delete all the info when the user signed out of the game, and not to share the info. That was the only security.

The project fizzled, but when the Cambridge Analytica news broke, it confused me, because my recollection is that EVERYBODY had all those lists of user info. Seriously, tens of thousands of different companies, and the only thing stopping them was a pinky promise.

This is all true, but to get that data you had to run an app on Facebook. To my recollection, you could not grab this level of data with a Facebook JavaScript tag on 3rd party websites. Facebook offered this data on-platform to convince organizations to drive their audiences to Facebook.com instead of their own site.
We used to have access to individual demographic data for breaking down your analytics and ad targeting, as well as being able to target users based on their specific email address or phone number. We could also target your friends.

From memory, this has now all been rolled up into cohort demographics and 'lookalike' audiences, so you can no longer break your data down by specific users' demographic attributes or target ads by, say, an email list unless they are already your users (and it's used for specific types of ads, i.e. retargeting).

From memory some of the more unsettling breakdown/targets were

  - Ethnicity
  - Life events
  - Politics
  - Pages (so other businesses) they had liked

I worked for a pure-play furniture retailer. You used to be able to do things like buy email lists from gas/electricity/home insurance price comparison sites, with additional data like your postcode, then upload them to Facebook and specifically target everyone with a FB account under that email/phone number with furniture ads. The assumption was that if you're looking for gas/electricity/home insurance, you're likely to be moving home, or at least to be a person with some need for furniture.

Now they just use the Facebook pixel to put you into lookalike audiences, because they can see you went to the home insurance website and the real estate website (they all have the pixel installed), and make the same broad assumptions we were previously making, but on the cohort rather than on your specific unique identifier.

That's... not that interesting? At least not on the Facebook side of things.

Why shouldn't I be able to advertise to people who have recently marked themselves as married, or who liked the Yankees facebook page?

You should be able to advertise to people that are Yankee fans on facebook, but you shouldn't be able to get John Doe's email address from the Yankees fan club directory (not on facebook) and directly target them with ads if they have zero relationship to your business.

You also shouldn't be able to upload a list of email addresses, target your ads to them, and then use Facebook's analytics to see how many of those people have divorces or an investment property via the segmentation analysis. Depending on how small that list is, a lot of that data starts getting very specific to an individual.

Facebook obviously also thought you shouldn't be able to do this since now you can't. Everything is now cohort and look-alikes.
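The small-list problem can be shown with a toy breakdown function, assuming hypothetical platform-inferred attributes for users matched from an uploaded list. When the list matches one person, the "aggregate" report is simply that person's attributes. Suppressing segments below a minimum size is one common mitigation; the threshold of 100 below is an arbitrary assumption, not Facebook's actual rule.

```python
from collections import Counter

# Hypothetical: attributes the platform has inferred for users matched from an
# advertiser's uploaded list. Here the list matched exactly one person.
matched_users = [
    {"id": "match_1", "life_event": "recently_divorced", "investment_property": True},
]

def breakdown(rows, attribute, min_segment=0):
    """Count matched users per attribute value, hiding segments smaller than min_segment."""
    counts = Counter(row[attribute] for row in rows)
    return {value: n for value, n in counts.items() if n >= min_segment}

# Without a threshold, the "aggregate" report describes a single individual.
print(breakdown(matched_users, "life_event"))                   # {'recently_divorced': 1}
print(breakdown(matched_users, "investment_property"))          # {True: 1}
# With a (hypothetical) minimum segment size, small segments are suppressed.
print(breakdown(matched_users, "life_event", min_segment=100))  # {}
```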

Additionally, you're acting as if Facebook only puts you in the ‘recently married’ bucket if you marked yourself as married. Facebook is smarter than that: they put you in these buckets based on your messages, Instagram activity, and browsing behaviour, not necessarily on public information you expose in your profile.

> - Ethnicity
> - Life events
> - Politics
> - Pages (so other businesses) they had liked

All of the above is frowned upon (for the record, I 100% agree). What is interesting is that our civilization has evolved to frown upon this kind of "categorizing" of people, yet on that list only ethnicity is not "choosable" by the person.

I.e., I can mostly choose when I get married and to whom, and which party I support, but I definitely cannot choose my race. Yet all of the above is considered 'equally bad'.

Just something I noticed.

Any developer can still send almost any data they want into FB to track basically anything.

'Offline conversions' still let you send in names, age, birthday, gender, etc. for matching, as well as IP and UA (user agent).

Though now it's hashed before going to FB.

And you can pass almost whatever custom data you want in. So in my industry I can optimize for long-term political donors, or potentially early voters. Or someone accidentally sends in 'this person bought hepatitis meds'.

This still exists despite what someone below said, unless I'm totally missing or misunderstanding something.

However, it is not as valuable, and the audience it can deliver to is shrinking because of iOS restrictions, and likely eventually because of Chrome too.

FB used to supply more detailed interest/demographic buckets to target. You used to be able to type in basically anything, from 'likes ___ very niche page' to 'engages with liberal politics'. There are still interests, but fewer of those 'sensitive' ones. Still lots of stuff like 'works at ___', and of course age, gender, geo.

But more fine-grained interest targeting seems to be going away; pretty soon it's just going to be broad demographics.

The ROI is just not as good without fine-grained iOS targeting. FB is having to do a bunch of tricks with AI/modeling to try to make it perform, but it's not as good.

I'm for targeted advertising. I think what iOS is doing is uncompetitive and bad.

But I do think there should be some sensible regulation. Like no healthcare or sensitive topic data (LGBTQ, dating, etc).

** ADDITION: sorry this is long, but one more thing. I think people are also confusing the ad product with the old FB API.

The old FB API was an absolute sieve: you could get basically any data a person had on their profile, and also their friends' data. This is what happened with Cambridge Analytica.

All that has been shut down. Even for Login with Facebook, they are way more strict about actually testing sites, etc.

> Though now it's hashed before going to FB.

It is however an easily-reversible hash, by design as that's how FB can correlate between the different datasets. When it comes to finite sets such as phone numbers or dates of birth it's also trivial to search the entire space by bruteforce.

IIRC it's sha256. Is that really reversible now?

For sure with a rainbow table for something like cell phone numbers. But I don't know why that would matter? Anyone can generate all the possible phone numbers.

The whole point is that it is matchable. Like if they already have my email then they know if it's a match, but if they don't have my email they don't know what the missing email is.

Like what's my email from this (without knowing my email) below: 2c03e4a168bed89f5208250cdefbe97d4d87ba7812df896311676acc2ddfcdb4
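For context on the matching described here, a minimal sketch assuming SHA-256 over trimmed, lowercased emails (the normalization is an assumption). The platform hashes the addresses it already holds and looks up the advertiser's digests, so it recognises which uploaded entries are its users without reversing anything; unmatched digests stay opaque unless their inputs can be guessed.

```python
import hashlib

def h(value: str) -> str:
    # Assumed normalization: trim whitespace and lowercase, then SHA-256.
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()

# Hypothetical: emails the platform already holds in plaintext, indexed by digest.
known_by_digest = {h(e): e for e in ["alice@example.com", "bob@example.com"]}

# Hypothetical advertiser upload: only digests leave the advertiser.
uploaded = [h("Bob@Example.com"), h("stranger@example.com")]

# The platform recognises its own users without reversing any hash;
# the stranger's digest matches nothing and stays opaque.
matches = [known_by_digest[d] for d in uploaded if d in known_by_digest]
print(matches)  # ['bob@example.com']
```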

Depends, for DOB and phone numbers the search space is finite and very small for a modern computer (especially so for a big tech adversary having access to near-infinite computing power) so you can just enumerate all the possibilities.
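A minimal brute-force sketch of the enumeration argument, assuming phone numbers are hashed as plain digit strings with unsalted SHA-256 (an assumption about the format). The example searches only 10,000 candidates sharing a known prefix so it finishes instantly; the same loop over a full ten-digit space is about 10^10 hashes, which dedicated GPU cracking tools can get through quickly. A date of birth is smaller still: roughly 36,500 candidates for a 100-year range.

```python
import hashlib
from typing import Optional

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

def crack_phone_hash(target: str, prefix: str, digits: int) -> Optional[str]:
    """Try every number of the form prefix + N digits until one hashes to target."""
    for n in range(10 ** digits):
        candidate = f"{prefix}{n:0{digits}d}"
        if sha256_hex(candidate) == target:
            return candidate
    return None  # input was not in the searched space

leaked = sha256_hex("15550104242")  # hypothetical hashed phone number
print(crack_phone_hash(leaked, prefix="1555010", digits=4))  # 15550104242
```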

Names and emails can be bruteforced with various lists from existing data breaches or data brokers and you'll probably reverse 80% of them.

However reversing them is not even necessary - an adversary like Facebook can infer it based on other data, for example, let's say they know your phone number but not your email - now you buy/sign up to vendors providing both that phone number and email and they provide it to Facebook - now Facebook knows that you signed up to those vendors with your number (as they have the plain text value, can hash it on their side and compare), but they also see that there's a mysterious email hash - they don't know its plaintext value, but it perfectly matches the same vendors that have your phone number. They can infer that it's probably your email address, and while they still don't know what it is, they can use the hashed value to track you across other vendors without ever having to know the plaintext value.
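The inference in the paragraph above can be sketched as set overlap, assuming hypothetical records of which vendors reported each hashed identifier. A known phone number and an unknown email digest that show up at the same vendors are linked when their Jaccard similarity crosses a threshold (0.8 here, chosen arbitrarily). Nothing is reversed; the digest simply becomes another key for the same person.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical: vendors that reported each hashed identifier.
phone_vendors = {"known_user_42": {"vendor_a", "vendor_b", "vendor_c"}}
email_vendors = {
    "digest_x": {"vendor_a", "vendor_b", "vendor_c"},
    "digest_y": {"vendor_b", "vendor_d"},
}

def link_identifiers(phones: dict, emails: dict, threshold: float = 0.8) -> dict:
    """Link each known person to the unknown email digest whose vendor footprint best matches."""
    links = {}
    if not emails:
        return links
    for person, p_vendors in phones.items():
        best = max(emails, key=lambda d: jaccard(p_vendors, emails[d]))
        if jaccard(p_vendors, emails[best]) >= threshold:
            links[person] = best
    return links

print(link_identifiers(phone_vendors, email_vendors))  # {'known_user_42': 'digest_x'}
```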

Right. That's kind of the whole point of FBs value. Or at least used to be before iOS started killing that targeting and conversion tracking.
What happened in 2017?
I would guess that's when the Cambridge Analytica thing became well known, where they were using Facebook's network/data graphs to compile their own targeted datasets.
GDPR maybe
What does it say about TikTok's tracking pixels in UberEats?
Nothing, because TikTok didn't put the tracking pixel there; UberEats did. It's from an advertising campaign that UberEats is running on TikTok. They need to relate "conversions" (i.e. people ordering/buying shit) on their system to whichever ad they were shown on the TikTok side.
The solution to this is simple, though by no means easy: treat one's digital data as private property.
I don’t see why that would solve the issue. Your browser is communicating with Facebook’s servers, so if they log your communication, it’s not exactly a violation of your private property rights.
In principle, sending your data in a form the backend can decipher isn't required. WhatsApp encrypts messages (or at least used to; not sure about now).

And no doubt it's not compatible with their current business model. Which is the point, it's a model that exploits property that isn't theirs for unfair gain.

What PII laws are there in the US?
https://oag.ca.gov/privacy/privacy-laws # California State summary
But when was that enacted and put into force?