by mateuszbuda 1448 days ago
In general I agree that harvesting public data is moral. I think that in these particular cases it's: 1) extracting data from profiles that opted for not being public (only available to logged-in users) and 2) reposting scraped data (publicly?) as belonging to the guy who scraped it, without users' consent.
7 comments

Facebook has hidden much of Instagram's content behind logins, so that makes most of it "not public".

At the same time, I don't think all of Instagram's users care if their images are hidden, or not.

It's quite unfortunate Facebook/Meta is using hostile language and the word "scraping" together in this case. Scraping is a legitimate process used by various business models to gather information from the Web, which itself was originally intended to be an open forum for people to share content.

Hostile business models have corrupted that intent and turned it into a competitive environment that is harming users and legitimate models which may not have the funding larger corporations can muster.

I have a "scraper" I've built that will either snapshot a page from a user's browser or crawl it remotely with Selenium/Firefox, on the user's behalf, to save the content in an index for searching later, by that user. It's not automated, nor does it parse and crawl URLs in the pages saved. It doesn't use page content in a wider context, either.
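
A minimal sketch of the save-then-search idea described above, not the commenter's actual tool. The names `PageIndex`, `add`, and `search` are hypothetical, the fetch step (browser snapshot or Selenium) is left out, and the text is assumed to have been extracted already. Nothing here follows links: only pages the user explicitly adds end up in the index.

```python
import re
from collections import defaultdict


class PageIndex:
    """Per-user store of saved page text with a simple inverted index (hypothetical)."""

    def __init__(self):
        self.pages = {}                    # url -> saved text
        self.postings = defaultdict(set)   # word -> set of urls containing it

    def add(self, url, text):
        # Re-saving a URL replaces the old snapshot, so drop stale postings first.
        if url in self.pages:
            for word in self._words(self.pages[url]):
                self.postings[word].discard(url)
        self.pages[url] = text
        for word in self._words(text):
            self.postings[word].add(url)

    def search(self, query):
        # Return the URLs whose saved text contains every word in the query.
        words = self._words(query)
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)

    @staticmethod
    def _words(text):
        # Lowercased alphanumeric tokens; no stemming or ranking in this sketch.
        return set(re.findall(r"[a-z0-9]+", text.lower()))
```

Usage would look something like `idx = PageIndex()`, then `idx.add(url, page_text)` for each page the user chooses to save, and `idx.search("some words")` later to get back the matching URLs.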

I've spent a significant amount of time trying to "work around" anti-scraping efforts by various companies and it's frustrating to see hostility instead of cooperation in certain types of use.

> Facebook has hidden much of Instagram's content behind logins, so that makes most of it "not public".

1) It was public when the content was posted by its authors. Facebook locked it down retroactively, regardless of the author's intent.

2) A login requirement doesn't make it non-public, if making an account is trivial, and there are already hundreds of millions of accounts. Is the plot of Avengers: Endgame also not public, because it's locked behind a ticket purchase or subscription?

Also, a login requirement is not a given. For example, Google doesn't need to log in to index those pages, and neither do you for the first few profiles. Only after your identity (IP or fingerprint) is known does Instagram start locking public content behind login gates.
> extracting data from profiles that opted for not being public

The tool lets you download the contact info of your friends, which you should be able to do anyway. In fact Facebook tries to trick its users into thinking they can do this with their data takeout option, but the downloaded files don't actually include any of the contact info for your contacts. Which makes zero sense, considering the entire point of Facebook is that it's a digital rolodex for storing your friends' contact info.

From the article, it seems to be a service for scraping data you have access to anyway. As long as they only hand that data to the requesting customer, whose login they used, I don't see a difference between the general public and this user's personalized "public". If access is still limited to the people who have the access rights, then I don't see a difference between accessing through the official interface and accessing via scraped data.
Users make information available on facebook with the expectation that they are able to later control access to it (other than the obvious threat model of screenshotting, etc). This is violating that expectation and thus their privacy.
> they are able to later control access to it

This has never realistically been the case. An illusion of control is provided by facebook, but they've never really put much effort into it. For a really simple example, look at how long content remained available to the entire internet after "deletion". Sometimes it took years.

Expecting any semblance of privacy from a company who profits from using and selling your data is, if I'm being blunt, lunacy.

This is a false expectation and it’s important people learn this.
They’ll stop posting in the way they currently enjoy and will, therefore, have lost some freedom. Great outcome!

In other news: your partner may also leak your most intimate secrets. I hope they do, to teach you a lesson?

Every trust can be betrayed. Why do you believe a world without trust would be better? Only because you cannot handle the nuance of different levels of trust?

> In other news: your partner may also leak your most intimate secrets

Indeed, and that's why it's important to choose the right partner. Likewise, it's important to choose the right friends on instagram to share your photos with. Because as you noted, they can always screenshot away and there's nothing Facebook can do.

What's dangerous is thinking that Facebook/Meta is the keyholder. That's a false perception, perpetuated by Facebook because they want to monopolize everyone's data. It was and always will be about the people who you share your information with. Don't want your profile scraped and leaked? Don't share it with sketchy people.

The counterparty risk from Facebook has almost nothing to do with trust of individual human beings. It has to do with the nature of systems, failure, vulnerabilities, attack surface area, etc. It's "privacy through obscurity" to act in a way that your data is not on the precipice of being leaked by a bad actor or a mistake.
The freedom to live in a fictional world where Facebook safeguards your data is just as available regardless of the reality of the situation.

The reality of the situation is that Facebook is a walled garden built on the labor of its users and it is objecting to those users reclaiming the fruits of their labor by scraping.

So taking shackles off is called “losing freedom” now? Also, people enjoy many things, just look at the junkheads. Still, it's more natural to have trust in a heroin addict than to have trust in businesses like Facebook.
"They’ll stop posting in the way they currently enjoy and will, therefore, have lost some freedom."

That is, quite honestly, one of the oddest definitions of freedom I've come across.

There's no evidence of the accused scraper sharing the scraped data with anyone but the account-holder, so the privacy of their friends is still protected.
The states of "opted for not being public" and 'available to any system authenticated person' seem contradictory.

I appreciate that 'system authenticated person' is a smaller set than those who can access anything publicly accessible, and that the former is a subset of the latter.

I agree with the moral argument against posting the scraped data publicly, but if someone gave my account access to their data, I don't think they have a moral right to say I can't use a script to do something private with it.

Scripts are tools, and like any tool they're extensions of the self. If it's morally okay to do it by hand, it's morally okay to do it with a script, so long as my script is respectful of server resources.

Instagram behind a login screen is public. If, say, you were an OnlyFans model and somebody paid for your videos and then scraped them, there would have been an implicit agreement.

When you share photos on Instagram, there is no such understanding; news outlets have been logging in to view and publish your Instagram photos.

If they are being harvested it makes them public by definition. Unless there was a break-in.