|
Between 2009 and 2013, Facebook was under a lot of pressure to share its data with academic researchers. The company didn’t have the resources to process the information, anonymise it, make those tables available and manage the relationship, the research project, etc. More importantly, it was afraid of data leaks from careless handling. One researcher with an interesting research topic on personality traits and impeccable credentials (Cambridge, great PhD advisor) started collecting information on his own with a personality test, an app built on the newly updated API. It seemed like an easy-to-manage relationship with someone who knew about information security, and a good test balloon for engaging more with the academic community. There were some warnings later when the app grew suddenly: he started paying Amazon Mechanical Turk workers to increase the number of subscribers; however, paying subjects was common in psychology and AMT was becoming a standard in that area. More power to him. There were some documents signed to confirm he was focusing on academic-only interests. So far, that was hardly a scandal. But Kogan wanted to live from his research and he would end up selling the database to a company called SCL, doing psychological research. That wasn’t completely out of left field. Facebook anticipated this and decided to pull the plug, giving the app weeks to close (in order to let end-customers download their data, a standard process at the time, when data portability was the main topic of conversation). Kogan claimed he used a different app but kept the same credentials and user base, so that distinction was, as expected, invisible to the API detection systems. The AMT budget grew massively, to a million dollars. I don’t know how or when Facebook learned about the sale; I suspect the increase in activity triggered an audit. Once the lie was clear, Facebook asked the buyer to delete the data, which had been obtained outside of the agreement in the API license.
Without being the ICO, you can’t use the law to guarantee that, so they did the next best thing: they asked the buyer to sign legal documents promising that they had deleted it. If Facebook could have used the ICO then to enforce the deletion, they would have. I’ve had to deal with them since and I would find that confidence… generous. But either way: physical force wasn’t an option between two companies, and neither had breached the letter of the law. All academic collaboration projects were put on ice and later cancelled because that test had turned sour. Third-party apps misbehaving is, alas, common and there’s nothing special about this one: people were scamming gullible users into clicking on fake buttons at an industrial scale. So false claims were banned. More dodgy stuff emerged and was blocked; scammers tried again with new tricks and were caught by an increasingly refined detection system. As far as Facebook was concerned, it was a massive cat-and-mouse game, but that was it. CA was one of many game studios, spammers, etc. who had abused the API in ways that were unexpected in 2008 and had been patched since. One thing that has not been documented about this: banning and lawyers’ letters were possibly not enough; there is a system to detect suspicious targeting (like racist real-estate ads). Most likely, Kogan’s list, like other data stolen by spammers, was added to that system. It would detect anyone using stolen information -- so if CA used it, they would get caught again. That particular feature is not talked about a lot, for obvious reasons, but it is fairly expected when you have so many scammers at scale. For Facebook, that story is a classic case of enforcing API rules: extensive observability, defence in depth, complying with the law at the time (which couldn’t care less about disclosure) and being flexible with rules that abusers are keen to circumvent.
So, the claim that Cambridge Analytica used that leaked data to target advertising was surprising to anyone in charge, and their expected reaction was likely to check, see that it wasn’t the case and focus on something else, something that wasn’t a made-up scandal (obviously so, for an insider) spewed by a disgruntled employee. For a good reason: Cambridge Analytica didn’t use Facebook data or anything like that. They tried briefly, it failed and they fired the data scientist who wouldn’t let it go, but that’s it: overall, the dataset wasn’t helpful. I know that because I had left Facebook shortly before and I was surprised when I heard that story too. I had the opportunity to meet CA data scientists, at conferences, etc., both before and after the scandal, and to ask them directly: they confirmed it and told a far more credible story. All of Kogan’s work, the OCEAN model, the Facebook dataset, three or four years old at this point? All bullshit. Well, the research was interesting and there are some correlations, but it’s not usable at scale, not with three-year-old partial data. They didn’t delete the database anyway (putting them in breach of their agreement with Facebook, again, and in a difficult position once the ICO started enforcing new laws) but never used it, which is why you have ambiguity between former employees on whether that table existed at all. As you can see at the exact 1:00:00 mark in _The Great Hack_ documentary, they used party membership, magazine subscriptions and credit report data (for the car make & model) to predict affiliation -- not a bad take, actually: all three have verified full name and address. More importantly, all are more representative of your opinion (in the US) than whether you liked the “Curly Fries” page. Quick take: if Joe is a 65-yo registered Republican, reads _Guns & Ammo_ and drives a Ford F-350, who might he vote for? Should you send Joe a paper flyer asking for a campaign contribution?
No need to figure out which Facebook account might be Joe’s and whether he is a fan of the page “I like to flip my pillow at night to get the cooler side”. Why would SCL/CA lie about their work? Because their actual job was more to write slimy, racist and sexist dog-whistle campaign ads (see the “Do So” campaign in Trinidad & Tobago, in the same _Great Hack_ documentary, pitting Blacks against Indians). It’s not really great to sell yourself based on that, so they mentioned magical statistics and impressive “results”. De facto, their effectiveness doesn’t come from targeting (the Do So racist campaign, like the Trump racist campaign, was there for all to see) but from drafting counter-campaigns with fake militants; leveraging “secret” meanings about “the cultural differences with immigrants” ::cough, cough:: skin colour, or “real Americans”, i.e. not _that_ “Mexican” judge; and getting different reactions from different demographic groups. Overall, classic ’70s StateOps that the CIA and FSB have perfected against each other. I remember an Asterix where this is explained: _Asterix and the Roman Agent_. These are tactics that Trump has exploited shamelessly: provoke your most principled opponent with abusive, often coded language, and exploit their outrage as being anti-pragmatic. Use that attention to deny principles. Deny the truth, repeatedly, shamelessly. Of all the things that the Trump Twitter account is, targeted isn’t one of them. |
Where things look suspicious to outsiders but are fairly straightforward: did Facebook collaborate further? No, but if you look for trouble, it’s easy to make it look bad:
Presidential campaigns are fairly large individual clients for Facebook; they have to ramp up fast and have time-sensitive campaigns to run. So, to avoid any blockers and any accusation of being unfair and undemocratic, every campaign gets points of contact who check that all is running smoothly. Essentially, these are salespeople familiar with Facebook’s more advanced ad targeting options, who can be there to help with idiosyncrasies like having some text on ad images (a big, hard-to-comprehend no-no for a while, and a very common mistake for inexperienced campaigns, who are naturally quick to scream censorship when they are blocked).
Several people assumed that because “Facebook” had banned SCL Elections three years prior, “Facebook” should know better than to let CA, a company with a different name, help the Trump campaign:
- Facebook is a large company and the anti-abuse team for the API has little to do with the Sales team; they are not even based in the same office or on the same coast of the US;
- Facebook was growing incredibly fast and no one helping with sales had been there three years prior -- campaign assistants are very junior;
- the client wasn’t the same: the GOP and the campaign were the official holders of that account; banning them for having consultants from a company with dodgy behaviour wasn’t going to make sense.
But more importantly: none of what the GOP did was against any of Facebook’s rules. They bought all the data they needed from Experian and from press delivery companies, something that was a common request from advertisers then (who wanted to target based on credit rating or newspaper subscriptions). They tested a lot of their ideas and the audiences emerged organically from how people reacted to those. Read any story on how the two campaigns had such different approaches to campaigning online. No secret data was needed to make a difference: one tried, the other coasted.
What did Facebook do after they realised you could run a scandalous campaign using Experian and other third-party data? Block those, for many reasons: being slimy and not allowing people to edit false information was one; having shitty data in the first place was another big one. That was actually the best thing to do, but no one reacted to that actual, effective, meaningful change.
So what was Facebook to do after they had consistently adapted and enforced their policy for ten years on an API that wasn’t perfect, but that had learned faster than anyone else? Explain why they did what they did?
They tried and no one listened, accusing the company of inconsistencies rather than understanding that, say, Mark doesn’t personally handle banned apps on the API. After too many people demanded that Facebook use police power to delete data they didn’t know was still there, I think most of the senior brass checked out: it was a made-up scandal and nothing they could say or do would really help.
That populists get elected and run countries into the ground is real and problematic, but that’s due to mechanisms that are unrelated to Facebook targeting algorithms, at least as far as we can understand them. There’s growing inequality and diverging perspectives between citizens; a few operatives have developed real expertise in spewing bullshit, but none of that will be solved if people don’t separate causes from false claims. Telling people to leave Facebook won’t make coordinated abuse organised on Discord servers less painful. Banning political advertising on Facebook won’t prevent Trump from confiscating the news cycle the day before an election by lying on Twitter about immigration, or Johnson in the UK from staging a photo op to Google-bomb his way out of scandals. You won’t improve the situation with a sacrificial lamb.
People are still gladly sharing their bank data with unscrupulous credit companies; the NYTimes is too happy to ignore the real scandal there. Whatever Mark said would be a scandal either on the left (if he bans ads, including left-wing ones) or on the right (if he doesn’t include inflammatory bullshit as “the press”), so… the company stopped talking about it because nothing would get better if they did talk.
Being in that Catch-22 space tells you to move on. Mark did, months ago, thinking about products, integration, platforms. Somehow, Libra came up as a good direction for the company. That’s what Mark has in mind. And I don’t think his brain would be able to make better decisions about the company with more bullshit like “CA is the biggest scandal” at the forefront.