by mobileexpert 2056 days ago
Some takes from Benedict Evans that are worth considering: https://twitter.com/benedictevans/status/1320378054150148098...

“Meanwhile: the NYU app has access to friend data in your feed and friend data is also in the ads it scrapes. And it replaces an actual security model with our trust that NYU are nice people and won't abuse this access. That is exactly how Cambridge Analytica happened.”

10 comments

Comparing Cambridge Analytica, who harvested data through means that were not transparent to users (and for malicious purposes), to NYU, which has explained what data it collects and why AND has the consent of its users, seems disingenuous at best.
The point is that CA's data harvesting looked like it was transparent to users at the time they were doing it — which is precisely the appearance you'd expect a malicious app to try to convey.

The NYU project is probably on the level, but "they're probably on the level" isn't a very good security model at Facebook's scale.

More to the point, the FTC's 2019 Consent Decree [1] makes it fairly clear that FB is responsible for third parties' access to its users' data — and it would be prudent (from FB's point of view) to interpret this responsibility as also covering browser extensions.

[1] https://www.ftc.gov/system/files/documents/cases/c4365facebo...

For a project like this to happen at a major US university (especially once outside funding is involved), it needs approval of the university's Institutional Review Board. Getting IRB approval entails researchers proposing a strict set of guidelines for how the data will be collected/used/stored, examining the potential for harm to participants, and convincing a room of very very risk averse individuals that the project is safe and bounded in scope.

This is in stark contrast to CA. "They're probably on the level" because they have entire systems in place to keep them there.

The data CA used wasn't collected by them. They got it from a research project at Cambridge University's Psychometrics Center. This is exactly the same situation.
You are a little short on facts. Dr Michal Kosinski and Dr David Stillwell of Cambridge University pioneered the use of Facebook data for psychometric research with a Facebook quiz application called the MyPersonality Quiz.

Aleksandr Kogan was a lecturer at Cambridge who then built his own app based on Stillwell's and Kosinski's app and work. Aleksandr then turned around and sold his version to SCL - the parent of Cambridge Analytica. And the reason that Cambridge Analytica wanted his app was that it worked under the social network’s pre-2014 terms of service, which allowed app developers to harvest data not only from the people who installed the app but also from those people's friends.

Stillwell also denied Kogan's request for access to his and Kosinski's myPersonality dataset. So no, the Cambridge Analytica data did not come from Cambridge University or the Psychometrics Center.

The NYU Ad Observatory's data is completely public and the intended audience of that data is journalists and researchers doing analysis of online political advertising. This is the polar opposite of clandestinely harvesting user data in order to manipulate people.

So no, it's not "exactly" the same situation but rather the exact opposite.

From the Wired magazine explainer on CA:

"That data was acquired via “thisisyourdigitallife,” a third-party app created by a researcher at Cambridge University's Psychometrics Centre. Nearly 300,000 people downloaded it, thereby handing the researcher—and Cambridge Analytica—access to not just their own data, and their friends' as well."

https://www.wired.com/amp-stories/cambridge-analytica-explai...

re: "the exact opposite", you are putting a lot of weight on the intention behind this use. After the public response to CA you might appreciate why FB is going to strictly apply the rules.

But I generally agree that users running an extension in their own browser is a different situation than an app developer subject to the FB ToS and am not sure why FB would be allowed to block this.

That makes quite a bit more sense. Thanks for clarifying.

To the grandparent: A researcher selling IRB-protected data would be effectively ending their academic career and opening themselves up to a mountain of legal trouble from the university and anyone who participated in the trial.

To clarify:

WHAT they were doing with the data was not transparent. HOW they were doing the data collection was completely transparent.

The worst of both worlds. Which is to say—we're saying the same thing.

University research projects such as these go through extensive review. The university is basically putting its name on the line for any research project that happens under its watch.

I'm not sure what you're advocating for. Is it that Facebook shouldn't be researched because they do not allow it? Not very sound reasoning to me.

Users have to install a browser extension in order to participate in the study. That's a way higher barrier than the personality quizzes that Cambridge Analytica used.

It also happens at a different layer of abstraction. Cambridge Analytica extracted data through the permissions framework that Facebook itself implemented.

Facebook's interest in its users' data doesn't need further explanation after you see that most of their profits derive from their control over it. The same control that allowed the profitable mass political targeting that these researchers are trying to study.

The researchers ask people to opt in to tracking a restricted amount of data, and then install an extension that has access to their entire Facebook accounts.

There is no way for Facebook or anyone else to prove that the current or a future version of the NYU's extension won't scrape more data than people agreed to.

> There is no way for Facebook or anyone else to prove that the current or a future version of the NYU's extension won't scrape more data than people agreed to.

How so? The extension is open source, anyone can audit it.

The plugins are just JavaScript, so verifying that is actually a trivial task. You just open the plugin and read the source. NYU could also provide the code, to make it even easier.
You cannot verify that the researchers won't change the plugin to malware in the future.
The whole point is that a major problem with CA was the collection of friends' data at scale. The NYU app's scraping modality could easily do the same thing, which violates the present FB consent/sharing model, in which you control whether your data goes to third-party apps. FB has to fight as hard as possible against such apps. Remember Clearview AI? If we want FB to fight CA and Clearview, they must fight here as well.
> If we want FB to fight CA and Clearview they must fight here as well.

Or they could partner with NYU, offer technical insight to maintain integrity and privacy (me stifles laughter) and do everything to support researchers who potentially could help build trust in their platform.

Going after this group just isn't a good look if you're Facebook. If there are valid concerns then don't start with a Cease and Desist.

They might be willing to partner if NYU is willing to indemnify Facebook against any and all liabilities which may result. How likely is NYU to take on that risk? Why should we expect Facebook to take on the risk for NYU?
So is your opinion just that Facebook shouldn't be researched?
I don't really have a view on that, but I think researchers and universities should be held fully liable for the harms they cause, that way, they'll be more careful.

Some research just isn't worth the risk, but as an outsider, I'm not in a place to make that judgement. NYU could also insure against data breaches; in that case, we might get some good security audits.

Hang on. The whole chain of reasoning started with FB protecting users' interests through the permission system, which NYU ostensibly circumvented. How is it in the users' interests to indemnify Facebook?
If NYU internalizes the cost of all breaches (by indemnifying FB against harm), they will be very careful with the data, and prevent another Cambridge Analytica problem.
> The NYU app scraping modality could easily do the same thing

So could any browser extension with the ol' "read and modify your data on *" permission. Or any browser. Or any third-party Facebook client.

There is a difference between being technically capable of doing a thing and actually doing the thing, especially in cases where the software authors are well-known and relatively easy to hold accountable. To say otherwise is a little bit goofy!

> especially in cases where the software authors are well-known and relatively easy to hold accountable

Like a certain lecturer and senior researcher at University of Cambridge?

https://en.wikipedia.org/wiki/Aleksandr_Kogan

Suppose NYU sent a person to sit behind every NYU participant and take a photo of their screen each time it changes. That would be exactly the same as what NYU is doing (except more expensive); the participant knows that and gave their consent. It is within their rights to show their screen to anyone.

They are just doing it more economically than sending a person. This is entirely unlike CA, which effectively sent a person to go through all of the participants' available information as quickly as possible while they weren’t looking and store a copy of everything.

Sure. So what exactly is the binding rule which Facebook should apply here?

Researchers can get access to anyone's Facebook data if people enable it? What about the ones in Chinese universities? Or just respected universities? Which universities are those? How do we decide?

You're missing the point. There needs to be a black and white line, and whatever Facebook allows they're always being demonised, nobody gives them the benefit of the doubt.

> Researchers can get access to anyone's Facebook data if people enable it?

Yes. Where is the problem?

This is ironic. Cambridge Analytica was a university with an IRB collecting personal data, then later sold to for-profits.
Cambridge Analytica happened with an app hosted on Facebook. This is hosted on your browser. So it’s not exactly how Cambridge Analytica happened because the trust model is completely different.
The legal problem & consequences for Facebook weren't because of users who opted in to CA collection; the problem was getting your friends' data, who did not consent.
The legal problems for Facebook were mainly because they were an active party to the collection process, which could not have happened without that active participation.

This collection can happen manually, within the users’ regular and fully authorized use, without Facebook's involvement, and in fact without any ability for them to figure out that it happens.

That it happens through a browser extension (which they may or may not be technically able to detect) should not change legality or legitimacy.

Well, what exactly should NYU do instead? There is no API with fine-grained permissions that they can use: To get the data they are interested in (ads), they have to resort to scraping - and a scraper will always have access to all data on the page.

So there is no way for NYU to not have access to friend data if they want access to ad data.

Weak take. All users of the NYU app have to explicitly sign up and grant the researchers access to their data. It has a very clear privacy policy:

https://adobserver.org/privacy-policy/

And, unlike Facebook, which sucks up an ever-increasing amount of data on you, this project takes only basic demographic information (age group, gender, ethnicity) and which ads you're shown. No personal data is retained by NYU.

The user signed up, but the app would have access to the data of that user's friends, who didn't sign up.
It only has access to what the user is browsing; if a friend hasn’t posted in a year (and doesn’t appear on the timeline) and the user doesn’t go specifically to their page, then adobserver would be oblivious to the existence of that friend (and of the friend relation).

This is entirely unlike an FB app like CA’s that had full unadulterated access to anything the user might browse.

Nope. This isn't a Facebook app. It's a browser plugin.
Doesn't NYU have an Institutional Review Board?
So does the University of Cambridge, and that didn't prevent one of their researchers from scraping and selling user data to Cambridge Analytica.
As far as I've been able to find out, Kosinski and the others developed their techniques at University of Cambridge (and other universities), then took those techniques to Cambridge Analytica/SCL (something that no one here would have any complaints about); CA/SCL then applied them to Facebook. The UofC IRB has no influence on that.

If there is any evidence that CA/SCL/Kosinski said the data collection was affiliated with UofC, I cannot find it. And when Kosinski attempted to use the data in his research, the UofC IRB denied it.

In this case the data collection is by the NYU AdObservatory project, meaning the data collection and its use (should) have to go through the IRB.

It does.
That's enough? Any university with an IRB can scrape people's personal data?
Just in case OP never comes back or you're not aware when you reply later:

This was _exactly_ the issue with CA, data for academics with an IRB laundered into a for-profit entity.

More or less, yes. The purpose of IRB review is to ensure that personal data collection and use are legally and ethically kosher.

Cambridge Analytica and the researchers when they were working for it never claimed to be doing UofC research; if they did, UofC could and should have applied an academic (and possibly legal) baseball bat to their collective face. In fact, when Kosinski did try to use the data as part of his UofC related research, the UofC IRB denied it.

Too bad he didn't cite where in the source code it does any of this stuff.
I mean, I trust NYU researchers a lot more than I trust Facebook execs.

All these big data-harvesting companies (FB, Google, etc.) start with the false premise that well-informed users have affirmatively chosen to trust that company with their private data.

s/NYU app/Google Chrome/g and somehow FB is ok with it, so it isn't the security model; it is the people and the goals of their actions that irk FB.
>"The supposed scandal around the data analytics supplied to campaign groups by Cambridge Analytica was manufactured by people with a political agenda.

>...UK Information Commissioner’s Office has published the findings of its three-year investigation (predating the scandal) into the matter, which concluded there was no illegal electoral interference whatsoever...In other words, the data was commercially available and concerned US voters. The only ‘special sauce’ in CA’s model was the hyperbole of its sales people..." [1]

The left has pushed false narratives and misinformation, making Cambridge Analytica, like Russia, the convenient scapegoat for all the things. The same tricks are in play now with Hunter Biden's laptop coverage, which is non-existent from MSM.

[1] https://telecoms.com/506834/uk-information-commissioner-conf...

Turns out Cambridge Analytica might have been practically useless https://www.wired.co.uk/article/cambridge-analytica-facebook...
That's not what the article you posted is arguing. The article is arguing that the data they collected directly through their trojan apps is not particularly useful, NOT that the full connection graphs (including data drawn from connections of users who didn't use their app) they collected and allegedly used to target advertising are useless.