by HeckFeck 1448 days ago
Data harvesting is moral for me, but not for thee.
2 comments

In general I agree that harvesting public data is moral. I think that in these particular cases it's: 1) extracting data from profiles that opted not to be public (only available to logged-in users) and 2) reposting scraped data (publicly?) as belonging to the guy who scraped it, without the users' consent.
Facebook has hidden much of Instagram's content behind logins, so that makes most of it "not public".

At the same time, I don't think all of Instagram's users care if their images are hidden, or not.

It's quite unfortunate Facebook/Meta is using hostile language and the word "scraping" together in this case. Scraping is a legitimate process used by various business models to gather information from the Web, which itself was originally intended to be an open forum for people to share content.

Hostile business models have corrupted that intent and turned it into a competitive environment that is harming users and legitimate models which may not have the funding larger corporations can muster.

I have a "scraper" I've built that will either snapshot a page from a user's browser or crawl it remotely with Selenium/Firefox, on the user's behalf, to save the content in an index for searching later, by that user. It's not automated, nor does it parse and crawl URLs in the pages saved. It doesn't use page content in a wider context, either.
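
For illustration only, here is a minimal sketch of the "save the content in an index for searching later" step described above. It is not the commenter's tool: `PageIndex`, `tokenize`, and the example URLs are hypothetical names, the index lives in memory, and fetching the page (browser snapshot or Selenium) is assumed to happen elsewhere. In line with the description, it stores only the page it is handed and never follows links.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split into alphanumeric tokens; returns a set of unique tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class PageIndex:
    """Tiny in-memory inverted index over saved page text (hypothetical helper)."""

    def __init__(self):
        self.pages = {}                    # url -> saved text
        self.postings = defaultdict(set)   # token -> set of urls containing it

    def save(self, url, text):
        # Store only the page the user asked for; no links are parsed or followed.
        if url in self.pages:
            # Drop stale postings if this URL was saved before.
            for token in tokenize(self.pages[url]):
                self.postings[token].discard(url)
        self.pages[url] = text
        for token in tokenize(text):
            self.postings[token].add(url)

    def search(self, query):
        # Return the URLs whose saved text contains every token in the query.
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(hits)


index = PageIndex()
index.save("https://example.com/a", "Scraping is a legitimate process")
index.save("https://example.com/b", "A legitimate open forum")
print(index.search("legitimate scraping"))
```

A real version would presumably persist the index per user and use a proper full-text engine; the point here is only that the saved content stays scoped to the one user who asked for it.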

I've spent a significant amount of time trying to "work around" anti-scraping efforts by various companies and it's frustrating to see hostility instead of cooperation in certain types of use.

> Facebook has hidden much of Instagram's content behind logins, so that makes most of it "not public".

1) It was public when the content was posted by its authors. Facebook locked it down retroactively, regardless of the author's intent.

2) A login requirement doesn't make it non-public, if making an account is trivial, and there are already hundreds of millions of accounts. Is the plot of Avengers: Endgame also not public, because it's locked behind a ticket purchase or subscription?

Also, the login requirement is not certain. E.g. Google doesn't need to log in to index those pages, and neither do you for the first few profiles. Only after your identity (IP or fingerprint) is known does Instagram start locking public content behind login gates.
> extracting data from profiles that opted for not being public

The tool lets you download the contact info of your friends, which you should be able to do anyway. In fact Facebook tries to trick its users into thinking they can do this with their data takeout option, but the downloaded files don't actually include any of the contact info for your contacts. Which makes zero sense, considering the entire point of Facebook is that it's a digital rolodex for storing your friends' contact info.

From the article, it seems to be a service for scraping data you have access to anyway. As long as they only hand that data to the requesting customer, whose login they used, I don't see a difference between the general public and this user's personalized "public". If access is still limited to the people who have the access rights, then I don't see a difference between accessing through the official interface and via scraped data.
Users make information available on facebook with the expectation that they are able to later control access to it (other than the obvious threat model of screenshotting, etc). This is violating that expectation and thus their privacy.
> they are able to later control access to it

This has never realistically been the case. An illusion of control is provided by facebook, but they've never really put much effort into it. For a really simple example, look at how long content remained available to the entire internet after "deletion". Sometimes it took years.

Expecting any semblance of privacy from a company who profits from using and selling your data is, if I'm being blunt, lunacy.

This is a false expectation and it’s important people learn this.
They’ll stop posting in the way they currently enjoy and will, therefore, have lost some freedom. Great outcome!

In other news: your partner may also leak your most intimate secrets. I hope they do, to teach you a lesson?

Every trust can be betrayed. Why do you believe a world without trust would be better? Only because you cannot handle the nuance of different levels of trust?

> In other news: your partner may also leak your most intimate secrets

Indeed, and that's why it's important to choose the right partner. Likewise, it's important to choose the right friends on instagram to share your photos with. Because as you noted, they can always screenshot away and there's nothing Facebook can do.

What's dangerous is thinking that Facebook/Meta is the keyholder. That's a false perception, perpetrated by Facebook because they want to monopolize everyone's data. It was and always will be about the people who you share your information with. Don't want your profile scraped and leaked? Don't share it with sketchy people.

The counterparty risk from Facebook has almost nothing to do with trust of individual human beings. It has to do with the nature of systems, failure, vulnerabilities, attack surface area, etc. It's "privacy through obscurity" to act in a way that your data is not on the precipice of being leaked by a bad actor or a mistake.
The freedom to live in a fictional world where Facebook safeguards your data is just as available regardless of the reality of the situation.

The reality of the situation is that Facebook is a walled garden built on the labor of its users and it is objecting to those users reclaiming the fruits of their labor by scraping.

So taking shackles off is called “losing freedom” now? Also, people enjoy many things, just look at the junkheads. Still, it's more natural to have trust in a heroin addict than to have trust in businesses like Facebook.
"They’ll stop posting in the way they currently enjoy and will, therefore, have lost some freedom."

That is, quite honestly, one of the oddest definitions of freedom I've come across.

There's no evidence of the accused scraper sharing the scraped data with anyone but the account-holder, so the privacy of their friends is still protected.
The state of "opted for not being public" and 'available to any system authenticated person' seem contradictory.

I appreciate that 'system authenticated person' is a smaller set than those who can access anything publicly accessible, and that the former is a subset of the latter.

I agree with the moral argument against posting the scraped data publicly, but if someone gave my account access to their data, I don't think they have a moral right to say I can't use a script to do something private with it.

Scripts are tools, and like any tool they're extensions of the self. If it's morally okay to do it by hand, it's morally okay to do it with a script, so long as my script is respectful of server resources.
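
As a rough sketch of what "respectful of server resources" could mean in practice, here is a throttle that enforces a minimum gap between requests. `Throttle` and its parameters are hypothetical, and the two-second interval is an arbitrary assumption, not a figure from this thread or from any site's policy.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds required between requests
        self.clock = clock                # injectable so the class can be tested
        self.sleep = sleep
        self.last = None                  # time of the previous request, if any

    def wait(self):
        # Sleep just long enough that calls are at least min_interval apart,
        # and return how many seconds were spent waiting.
        waited = 0.0
        if self.last is not None:
            remaining = self.min_interval - (self.clock() - self.last)
            if remaining > 0:
                self.sleep(remaining)
                waited = remaining
        self.last = self.clock()
        return waited


throttle = Throttle(min_interval=2.0)  # assumed value; pick what the server tolerates
throttle.wait()                        # call once before each request the script makes
```

Calling `wait()` before every request keeps a script's load in the same ballpark as a person clicking through pages by hand, which is the comparison the argument above rests on.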

Instagram behind a login screen is public. If you were, say, an OnlyFans model and somebody paid for your videos and then scraped them, there would have been an implicit agreement.

When sharing photos on Instagram, there is no such understanding; news outlets have been logging in to view and publish your Instagram photos.

If they are being harvested it makes them public by definition. Unless there was a break-in.
It's their platform. Do you really want some random companies scraping your facebook and instagram posts?
> Do you really want some random companies scraping your facebook and instagram posts?

Thought experiment: if you want to keep control over your data, try something radical: don't hand it to Meta/FB/IG at all

(Full disclosure, I'm neither on FB nor IG)

Yes. I want a free and open web.
Good for you. Normal people do not want posts shared privately amongst friends to become publicly available.
Then why would you ever put it on a website that generates its revenue from using and selling your data?
Because you (not you, but people in general) are dumb and overly trusting.
This is the correct answer.
Because you agreed to do so under the terms and conditions of that website.
Look, I understand your point from a legal standpoint, but do you really truly believe even a small fraction of FB and IG users actually “agreed to do so under the terms and conditions of that website”? They just clicked whatever was necessary to create their accounts. I doubt there was much affirmative agreement going on there.
There's no evidence the scraper companies mentioned there are making the scraped data public or sharing it with anyone beyond the individual customer that is already entitled to access that data through the official clients.
Then you need to trust your friends, because copy/paste and screenshots exist.
I'd rather anyone than "just Facebook".

"Just Facebook" has made the web shittier; entire realms of essentially public, often great content hidden behind a login wall.

It’s not “your Facebook”, it’s Facebook’s Facebook. You already made that data public, otherwise it would be impossible to scrape it.
As others said, there is no “you” in the scheme. It's Facebook's data. When people access that data without paying, they are “bad guys”. When the very same people pay for it, they are “legal partners”. In both cases they can do anything with it, while Facebook can't be held responsible because of all the official agreements. So as long as there is no specifically bad publicity or money loss anything goes either way.

“You” only exist in numerous empty statements about “privacy”, “respect”, etc. If you are feeling artsy, you can make that hyped NFT thing out of those, and see whether those kilobytes of text are really worth anything.

What you are claiming here is not true in Europe. If FB holds data about you, you still have legal rights over that data. You can have it deleted, or changed if it is somehow untrue, and you have various other rights too.

There is a relationship involved because ultimately as a FB user, if I don't like what they are doing, I can ask them to remove my data permanently and they must legally do that. If someone has "scraped" that data (if it is considered PID), without my permission or a legal basis to do so, they are in breach of the GDPR and can have enforcement taken against them.

I think some of these "aggregation" businesses will fall foul of this in Europe but I don't know what will realistically happen if that business does not exist in Europe and breaches the GDPR.

This is how it works in press releases. The problem is that data protection laws were in fact shaped by corporate lobbying, either openly or behind the scenes, and focus on things like real names and passport numbers that look impressive but aren't really important for the data market. These are just put into some high security database (e.g. for billing info), and it's fine. However, the real behavioral data that costs money is shared as easily as it ever was in the form of “User ID <long number> was at the location of Wi-Fi AP ID <another long number>”. It doesn't matter that the data owner still trades all the history of activity of a certain individual, or that Wi-Fi station locations can be matched with some external database. Everything is fine as long as you don't slap someone's real name on that. And, contrary to the show social networks make, they couldn't care less about real names. Even if you trick the system by calling yourself John Doe, you still look at the specific content, and have specific contacts, you are you, and the data is the same.

I remember that about a decade ago some IT guys paid for the common Facebook advertiser access, then targeted the ad campaigns using filters in such a way that their intersection only resulted in a single user, or just a couple of them, and were able to match those “anonymized” accounts to real ones. You didn't have to be a genius to do that. Facebook certainly knew it could be used like that. Everyone who made money on that simply agreed to use “anonymization” as a smokescreen. Later, with all the scandals, those routine operations were presented as something exceptional done by a small number of bad actors.
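
To make the filter-intersection idea concrete, here is a toy sketch. It does not use any real advertising API; `matching_users` is a hypothetical helper and the audience below is entirely synthetic. It only shows how stacking a few innocuous-looking filters can shrink an "anonymous" audience down to one record.

```python
def matching_users(audience, filters):
    """Return the IDs of users whose attributes satisfy every filter."""
    return sorted(
        uid
        for uid, attrs in audience.items()
        if all(attrs.get(key) == value for key, value in filters.items())
    )


# Entirely synthetic audience: IDs and attributes are made up for illustration.
audience = {
    "user-1": {"city": "Springfield", "employer": "Acme", "age": 34},
    "user-2": {"city": "Springfield", "employer": "Acme", "age": 41},
    "user-3": {"city": "Springfield", "employer": "Globex", "age": 34},
    "user-4": {"city": "Shelbyville", "employer": "Acme", "age": 34},
}

# Each added filter narrows the audience; print its size at every step.
filters = {}
for key, value in [("city", "Springfield"), ("employer", "Acme"), ("age", 34)]:
    filters[key] = value
    print(filters, len(matching_users(audience, filters)))
```

None of the individual filters names anyone, which is why the "anonymized" label can be technically accurate and still offer little protection.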

> breaches the GDPR.

Facebook breaches the GDPR all the time and manages to stay in business. GDPR enforcement is barely existent, and when it does happen, it's insufficient.

You published them for the world to see... so yes, presumably.