by carloswilson 2422 days ago
I appreciate that answering questions in a grilling interview is more difficult than asking questions, but ...

> “You don’t know? This was the largest data scandal with respect to your company, that had catastrophic impacts on the 2016 election. You don’t know?”

... but at the same time, I would certainly expect the chairman and CEO of a company to be better prepared to answer these difficult questions than the original article suggests. This was one of the biggest scams related to Facebook. I understand that Zuckerberg's legal team may have advised him to be vague during the interview, and he probably gave the answers that were best for his company.

But as citizens of the world, we should never find such vague answers acceptable. The way Cambridge Analytica upset the democratic process, and the way Facebook data was used to do so, require Facebook to be subject to such grilling. It should be made clear to Facebook in no uncertain terms that it needs to give answers, and concrete ones.


Sure, we should not find this dodging acceptable, but shouldn't we start by applying this standard to the politicians themselves? At least Zuckerberg is offering a service we can opt out of; the same isn't true of what politicians do.
In a very real way, that's not true.

Facebook's adoption is so high that even if you and I choose not to use it, and take every measure available to us to avoid personal interaction with anything Facebook does, we cannot avoid its effects, because of the millions of people around us who do not make that choice.

Facebook can, without any exaggeration or doubt, influence elections in multiple countries. Treating Zuckerberg like he's the CEO of, say, Dropbox just doesn't fly anymore.

> This was one of the biggest scams related to Facebook.

It actually wasn’t, but no one will listen to people familiar with what happened.

Please elaborate.
From 2009 to 2013, Facebook was under a lot of pressure to share their data with academic researchers. They didn’t have the resources to process the information, anonymise it, make those tables available, and manage the relationship, the research project, etc. More importantly, they were afraid of data leaks from careless handling. One researcher, Aleksandr Kogan, with an interesting research topic on personality traits and impeccable credentials (Cambridge, great PhD advisor), started collecting information on his own with a personality test, an app built on the newly updated API. It seemed like an easy-to-manage relationship with someone who knew about information security, and a good test balloon for engaging more with the academic community.

There were some warnings later, when the app suddenly grew: he started paying Amazon Mechanical Turk workers to increase the number of subscribers. However, paying subjects was common in psychology, and AMT was becoming standard in that field. More power to him. He signed documents confirming that his interest was purely academic. So far, that was hardly a scandal.

But Kogan wanted to make a living from his research, and he ended up selling the database to a company called SCL, which did psychological research. That wasn’t completely out of left field. Facebook had anticipated this kind of risk and decided to pull the plug, giving the app a few weeks to close (to let end users download their data, standard practice at a time when data portability was the main topic of conversation).

Kogan claimed he was using a different app but kept the same credentials and user base, so the switch was, as expected, invisible to the API detection systems. The AMT budget grew massively, to a million dollars. I don’t know how or when Facebook learned about the sale; I suspect the increase in activity triggered an audit.

Once the lie was clear, Facebook asked the buyer to delete the data, which had been obtained outside the terms of the API license. Unless you are the ICO, you can’t use the law to guarantee that, so they did the next best thing: they asked the buyer to sign legal documents promising that they had deleted it. If Facebook could have used the ICO to enforce the deletion back then, they would have. I’ve had to deal with them since, and I would find that confidence… generous. But either way: physical force wasn’t an option between two companies, neither of which had breached the letter of the law.

All academic collaboration projects were put on ice and later cancelled because that test balloon turned sour.

Misbehaving third-party apps are, alas, common, and there was nothing special about this one: people were scamming gullible users into clicking on fake buttons on an industrial scale. So false claims were banned. More dodgy tactics emerged and were blocked; scammers tried again with new tricks and were caught by an increasingly refined detection system. As far as Facebook was concerned, it was a massive cat-and-mouse game, but that was it. CA was one of many game studios, spammers, etc. that had abused the API in ways that were unexpected in 2008 and have been patched since.

One thing that has not been documented about this: bans and lawyers’ letters were possibly not enough, so there is a system to detect suspicious targeting (like racist real-estate ads). Most likely, Kogan’s list, like other data stolen by spammers, was added to that system. It would detect anyone using stolen information -- so if CA used it, they would get caught again. That particular feature is not talked about much, for obvious reasons, but it is to be expected when you deal with so many scammers at scale.

For Facebook, that story is a classic case of enforcing API rules: extensive observability, defence in depth, complying with the law at the time (which couldn’t care less about disclosure), and adjusting rules that abusers are keen to circumvent.

So the claim that Cambridge Analytica used that leaked data to target advertising surprised anyone in charge, and their likely reaction was to check, see that it wasn’t the case, and focus on something else, something that wasn’t (obviously, to an insider) a made-up scandal spewed by a disgruntled employee. For good reason: Cambridge Analytica didn’t use Facebook data or anything like it. They tried briefly, it failed, and they fired the data scientist who wouldn’t let it go, but that’s it: overall, the dataset wasn’t helpful.

I know that because I had left Facebook shortly before, and I was surprised when I heard that story too. I had the opportunity to meet CA data scientists, at conferences, etc., both before and after the scandal, and ask them directly: they confirmed it and told a far more credible story.

All of Kogan’s work, the OCEAN model, the Facebook dataset that was three or four years old at that point? All bullshit. Well, the research was interesting and there are some correlations, but it’s not usable at scale, not with three-year-old partial data. They didn’t delete the database anyway (putting them in breach of their agreement with Facebook, again, and in a difficult position once the ICO started enforcing new laws), but they never used it, which is why former employees give conflicting accounts of whether that table existed at all.

As you can see at exactly the 1:00:00 mark in the documentary _The Great Hack_, they used party membership, magazine subscriptions and credit report data (for the car make & model) to predict affiliation -- not a bad take, actually: all three come with a verified full name and address. More importantly, all are more representative of your opinion (in the US) than whether you liked the “Curly Fries” page. Quick take: if Joe is a 65-year-old registered Republican, reads _Guns & Ammo_ and drives a Ford F-350, who might he vote for? Should you send Joe a paper flyer asking for a campaign contribution? No need to figure out which Facebook account might be Joe’s and whether he is a fan of the page “I like to flip my pillow at night to get the cooler side”.

Why would SCL/CA lie about their work? Because their actual job was more about writing slimy, racist and sexist dog-whistle campaign ads (see the “Do So” campaign in Trinidad & Tobago, in the same _Great Hack_ documentary, pitting Blacks against Indians). It’s hard to sell yourself on that, so they talked up magical statistics and impressive “results”. In practice, their effectiveness didn’t come from targeting (the racist Do So campaign, like the racist Trump campaign, was there for all to see) but from drafting counter-campaigns with fake activists and leveraging coded meanings (“the cultural differences with immigrants” ::cough, cough:: skin colour; “real Americans”, i.e. not _that_ “Mexican” judge) to get different reactions from different demographic groups.

Overall, it’s classic ’70s state ops, the kind the CIA and FSB have perfected against each other. I remember an Asterix album that explains this: _Asterix and the Roman Agent_. These are tactics Trump has exploited shamelessly: provoke your most principled opponent with abusive, often coded language, and exploit their outrage by painting it as unpragmatic. Use that attention to deny principles. Deny the truth, repeatedly, shamelessly. Of all the things the Trump Twitter account is, targeted isn’t one of them.

While the data scientists working for CA were reasonable people dealing with the usual bullshit from agency salespeople massively over-selling their work, the sales team (pictured in that leaked Channel 4 video) were the worst -- lying about prostitutes on camera is hardly surprising from them. Because, yeah: that “we can entrap your opponent, we do it all the time” was obviously made-up bullshit to impress what they took for a gullible prospect (who was in fact fake). Not sure that makes it “better”, but it’s a clearer illustration of who you are dealing with.

Here is where things look suspicious to outsiders but are fairly straightforward: did Facebook collaborate further? No, but if you go looking for trouble, it’s easy to make it look bad:

Presidential campaigns are fairly large individual clients for Facebook; they have to ramp up fast and have time-sensitive campaigns to run, so, to avoid any blockers and accusations of being unfair and undemocratic, every campaign has points of contact to check that all goes smoothly. Essentially, these are salespeople familiar with Facebook’s more advanced ad-targeting options who can help with idiosyncrasies like text on ad images (a big, hard-to-comprehend no-no for a while, and a very common mistake for inexperienced campaigns, who are naturally quick to scream censorship when they are blocked).

Several people assumed that because “Facebook” had banned SCL Elections three years prior, “Facebook” should have known better than to let CA, a company with a different name, help the Trump campaign. That assumption misses a few things:

- Facebook is a large company, and the anti-abuse team for the API has little to do with the sales team; they are not even based in the same office, or on the same coast of the US;

- Facebook is growing incredibly fast, and none of the people helping with sales were there three years prior -- campaign assistants are very junior;

- the client wasn’t the same: the GOP and the campaign were the official holders of that account; banning them for hiring consultants from a company with a record of dodgy behaviour wouldn’t have made sense.

But more importantly: none of what the GOP did was against any of Facebook’s rules. They bought all the data they needed from Experian and from press delivery companies, a common request from advertisers at the time (who wanted to target based on credit ratings and newspaper subscriptions). They tested a lot of their ideas, and the audiences emerged organically from how people reacted to them. Read any story on how differently the two campaigns approached campaigning online. No secret data was needed to make a difference: one tried, the other coasted.

What did Facebook do after they realised you could run a scandalous campaign using Experian and other third-party data? Block it. There were many reasons: it was slimy and didn’t allow people to edit false information; the data was also shitty in the first place. That was actually the best thing to do, but no one reacted to that actual, effective, meaningful change.

So what was Facebook to do after they had consistently adapted and enforced their policy for ten years on an API that wasn’t perfect, but that had learned faster than anyone else? Explain why they did what they did?

They tried, and no one listened; people accused the company of inconsistency rather than understanding that, say, Mark doesn’t personally handle banned apps on the API. After too many people demanded that Facebook use police power to delete data it didn’t know was still there, I think most of the senior brass checked out: it was a made-up scandal, and nothing they could say or do would really help.

That populists get elected and run countries into the ground is real and problematic, but that’s due to mechanisms unrelated to Facebook’s targeting algorithms, at least as far as we can understand them. There’s growing inequality and diverging perspectives between citizens; a few operatives have developed real expertise in spewing bullshit, but none of that will be solved if people don’t separate causes from false claims. Telling people to leave Facebook won’t make coordinated abuse organised on Discord servers less painful. Banning political advertising on Facebook won’t prevent Trump from hijacking the news cycle the day before an election by lying on Twitter about immigration, or Johnson in the UK from staging a photo op to Google-bomb his way out of scandals. You won’t improve the situation with a sacrificial lamb.

People are still gladly sharing their bank data with unscrupulous credit companies; the NYTimes is only too happy to ignore the real scandal there. Whatever Mark said would be a scandal either on the left (if he bans ads, including left-wing ones) or on the right (if he doesn’t count inflammatory bullshit as “the press”), so… the company stopped talking about it, because talking couldn’t make anything better.

Being stuck in that Catch-22 tells you to move on. Mark did, months ago, and is now thinking about products, integration, platforms. Somehow, Libra came up as a good direction for the company. That’s what Mark has in mind. And I don’t think his brain would be able to make better decisions about the company with more bullshit like “CA is the biggest scandal” at the forefront.

I've worked in marketing for an agency. If Facebook had to check everything we did, they'd have had a really hard time. We had multiple clients, each with multiple banners and multiple landing pages, all rotated by software running A/B tests and regressions. If a political party had come to hire us, I'm sure we'd have done it. This was in Spain, so I'd guess there are a lot more companies doing this in the US.

Checking all of this within an acceptable timeframe for advertisers requires a lot of labor. That comes with another wide set of problems: fuzzy boundaries, arbitrary bans, increased cost of ads, etc. I have no special sympathy for Facebook, but we have to understand that this is a really hard problem to solve, and maybe there won't be any solution that satisfies the public.

If it's hard, then it's hard. That no perfect, complete solution is possible just means we accept the imperfect, incomplete solution if it is an improvement on the status quo.

I really don't have a lot of sympathy for the woes of the advertising industry, considering the consequences.

If I were working there and found FB painful to work with, I'd go somewhere else, simple as that. I wouldn't spend two days wrestling with FB to find out whether my ad is OK.

With political advertising I might not mind, since that client isn't looking to "make a profit" like a traditional customer. But I definitely would for an e-commerce brand, for example.

Most agencies use FB because it's easy to work with and it's cheap-ish with a little expertise. If that disappears, nobody is going to pour money into it.

Exactly.

Facebook et al. grew on the fact that they could automate the process, but in reality they have only automated part of it: the easy part of taking the money and placing the ad, not checking the content.

If they can't check the content in an automated way (very hard, because people will be actively working against you) and the remedy is to have an army of people checking content, then their competitive edge over the traditional business model largely disappears.

The other way to fix this, which I'd imagine Facebook and Google might want to push, is to put the responsibility on the creator of the content. Google and Facebook could easily help automate the shifting of responsibility if everyone is identifiable.

Imagine the CASE Act, but automated for everyone who breaches copyright on YouTube... https://www.eff.org/deeplinks/2019/10/house-votes-favor-disa...

That would be like a newspaper not taking responsibility for a wrong story and simply passing the legal blame on to a source. While there is some justice in that, since the source also bears responsibility, it ignores the fact that the platforms play a role in placing and promoting the content.

> this is a really hard problem to solve, and maybe there won't be any solution that satisfies the public

I'm not a fan of this argument. It reminds me of people saying, during the subprime mortgage crisis a decade ago, that the big US banks were too big to fail. If the banks are too big to fail, then we'd better fix the regulation of those banks so they are less likely to fail, or break the banks up.

If Facebook is too big to effectively moderate what it publishes, then reduce how much it publishes (i.e. your agency is going to be rate-limited) or break Facebook up into pieces that can moderate their own traffic.

Have you seen or heard of the Night Trap hearings? They refused to hear answers to accusations of lurid content that "wasn't even remotely in the goddamn game".

They have proven that they will refuse to listen to any answer they can't grandstand about. I think Zuckerberg and Facebook are shady as hell, but even I feel sympathy for him here, and I suspect he may be drugging himself so as not to snap during hours of stupid and dishonest questions, since it wouldn't be good for the company in the long term to make enemies by shaming them with choice lines during their grandstanding hours.

I think there is something else going on here.

Claiming that Cambridge Analytica had a catastrophic impact on the 2016 election. It's hard to know what to answer to such a thing.

If you get into a debate with her on whether her completely unproven claim is true or false, you are guilty; if you try to avoid it, you are basically saying she is right, and you are guilty.

This kind of questioning only works because it's not a courtroom. This would never be allowed as an actual question.

> Claiming that Cambridge Analytica had a catastrophic impact on the 2016 election. It's hard to know what to answer to such a thing.

"Congresswoman, half of this country thinks that your statement about the impact of Cambridge Analytica in the last election is factually false. In case you wanted to publish an ad that mentioned it, should Facebook reject it?"

That would expose him for not taking responsibility.
You cannot be asked to take responsibility for an impossible task such as telling truth from lies. It's a responsibility that even the government doesn't want. What about: "Congresswoman, you pass a bill to institute the Ministry of Truth, and I'll scrupulously follow its instructions."