Scraping is not an attack. Monopolists want to pretend they own your data because they get unlimited access to monetize it whereas competitors should have none.
"self-compromised"
Monopolists want to sell you thus it's imperative they maintain the fiction of "one person, one account". By admitting you own your account, they'd have to allow sharing and they wouldn't be able to provide their customers (advertisers) with reliable data about individuals.
"protect people from scraping"
Monopolists will protect themselves and call it protecting you. They will attempt to make you afraid of some other actor using your data in harmful ways so as to detract from how they monetize you and use your data in harmful ways.
"deter the abuse"
Monopolists don't want to argue about what constitutes abuse. Anything they write in their TOS is entirely for their benefit and only constrained by local law (if that). They will abuse you to the fullest extent they can get away with while arguing that any action to use your rights is "abuse."
"safeguard people against clone sites"
Monopolists want to maintain their monopoly, and there is no greater threat to it than allowing data to move freely.
--
More subtle but even more ironic rhetorical points
"for hire" / "paying for access"
Emphasizing that people making money (gasp) by providing this service is bad.
"industry leader in taking legal action" + "across many platforms and national boundaries, also requires a collective effort from platforms, policymakers and civil society"
Monopolists can pay high priced marketers to rebrand them as patriotic hero figures fighting valiantly for the little guy.
While I agree with your assessment of the BS in the article wrt scraping, and also agree with your assessment that the behaviour is completely about FB protecting itself and its monopoly control (the word control being important), I think it's important to emphasize that it's not about FB caring whether other entities have access to the data; it's about FB caring about its public perception with regard to having that data at all.
Over the last few years or so it feels like, to reference a @dril tweet[1], Facebook has just been 'turning a big dial taht says "data access" on it and constantly looking back at the audience for approval like a contestant on the price is right' with how much it allows 3rd parties to get at its data.
Keep in mind ~5 years ago the big thing at FB was "Open Graph" and "Graph Search" which gave everyone really in-depth access to their data with the idea that Facebook would be the "data platform" on top of which all of these 3rd parties would build apps and interfaces. This of course eventually resulted in the whole Cambridge Analytica thing and now this gigantic swing in the other direction of being overly protective of the data as a kneejerk PR reaction to all the bad press.
FB loved sharing data and provided a direct API for accessing it when the public narrative was about data freedom and 3rd party developer friendliness, and it hates giving any access at all and goes around suing web scrapers now that the public narrative is all about privacy.
Facebook will happily align itself in whatever way results in the least public outcry that it shouldn't be allowed to have the data in the first place, regardless of whether that means giving access or restricting it.
The users agreed to share their data with Facebook, not some other company. If they didn't prevent this, they'd be asking for another Cambridge Analytica
There’s an important difference between technically consenting and informed consent.
Given what I know about the bot problem on Instagram, I would imagine many people have been tricked into sharing their private profiles with scraping bots. Many bots are copying real people’s profiles and then spamming their friends with follow requests. It’s highly effective and gives these bots access to private profiles.
Were they showing the private data to everyone, or just to the person whose account was used for the scraping? If it’s the latter, then this is also not a crime, it is just someone accessing data they have been authorized to access, but in an automated way.
I don't think so, it is more like you scrape what is accessible to this user. So in the end you will scrape your friends data. This is why I said that you are free to only share with friends that 'you trust'.
All that case says is "scraping is not a violation of the CFAA". But of course the scraped data still exists in legal limbo; maybe you can compute derived information from it, but the moment a scraper reproduces it there is all of copyright law waiting for them.
In that case, the user owns the copyright, not the company, as the user is the author. So it would be up to them to take legal action if deemed necessary.
The only argument I have here (sadly in favor of FB) is with "safeguard people against clone sites". While I did give my data to FB, I didn't approve its transfer to another site/system. That is the only place I could possibly see some legal foothold.
What happens when FB builds a shadow instagram profile of you based on your FB account? That already happens. FB clones their own data for other projects no different than what you might fear happening if this data were cloned to a third party. The cat is out of the bag already but FB wants to pretend they are the only ones with the right to abuse.
It's impossible to control information once it's been created. The longer it has existed and the more places it can be seen, the more likely it is to spread.
Whether we make that spread of information legal or not does little to affect whether it happens.
There are two things that might help. First, don't share as much information. Once it's no longer limited to you or your close group of friends, who hopefully won't share it along with your name, it's mostly out of your control. Second, put limits (laws) on what information companies are able to synthesize about you, and how long they can retain it. If there's less information created about you (or it's ephemeral, created and destroyed as needed), and if they need to clean out older data, there's less to be shared or stolen.
“It’s hard to enforce the rule of law” is not a good reason to abandon it entirely. Data privacy laws make data privacy better even without being 100% infallible.
We should be both practicing good data hygiene and using legal tools to combat those who abuse data privacy.
> “It’s hard to enforce the rule of law” is not a good reason to abandon it entirely.
I didn't?
> We should be both practicing good data hygiene and using legal tools to combat those who abuse data privacy.
That's what I said. The first thing is data hygiene, the second is legal requirements. The difference I think is that the legal requirements should be on the actual creation and retention of the data, not just who owns it, who it can be shared with, etc.
As soon as PII over a certain age is radioactive and linked to a fine per person, all of a sudden there'll be a lot fewer giant repositories of PII to worry about.
they also toss in the chinese affiliation in hopes to bring even more ill will from the reader towards the company. china is probably doing some bad things, but scraping facebook ain’t one of them.
Scraping social media is something that China is very notorious for doing. They are 100% positively scraping all major social networks around the world.
They do this to collect information of foreign policy interest to them, to silence political dissidents abroad, etc.
Let's start with one thing: copyright on databases. Take IMDb: they collect and combine totally open data on movie casts, crews, soundtracks used and so on. Anyone can go to the cinema, wait until the movie ends, write down the data from the credits roll and put it in a database. There's no prohibition on this activity. The cinema may prohibit filming inside, but not using a pencil and paper. Or you may buy a DVD released later and do just the same. Or you may even email the movie company asking for that data in electronic form, and chances are they will send it to you or point you to some promo materials website where it is already published.
But the entire database is a product of work, and that makes it valuable. The company or organization spent time and money collecting, indexing and cross-linking that data, and has a right to bank on that work. Simply copying that database for commercial purposes _is_ stealing. This is why we have database copyright laws.
Now back to Meta. They created this product and made it attractive enough that people add their data voluntarily. Every single piece of data is quite open (maybe not really so for personal bits like face photos, emails and phone numbers). Meta spent a lot of cash making and keeping the product that attractive, and now banks on the collected data by targeting ads.
Nothing in the world prohibits anyone else from creating a service, making it valuable, attracting people, collecting data (according to data collection laws) and banking on that. But just copying data collected by Meta is stealing, and Meta is within its rights to protect it. The fact that Meta did it first doesn't make it a monopolist. In fact, there are lots of companies doing the same, like Google, Amazon, Apple, eBay etc. So in my opinion it is not a monopoly defending its position, but rather a business defending its assets from theft.
Indeed. It's the height of hypocrisy for a company to define the borders of its own system and then prosecute those who they consider in violation of them. There is no consideration given to whether the data should have been collected and retained by Facebook in the first place, regardless of whatever arbitrary access policies they defined to fit their own business and data model.
It's not clear what Facebook's position on scraping truly is. Sometimes they downplay it as "normalized and widespread," and other times they castigate it as inexplicably legal and clearly immoral, or even outright "in violation of state and federal law." For example:
- April 2021. Researchers find an exposed database containing the scraped data of 533 million facebook users. Some news reports refer to it as a "breach." Facebook attempts to downplay the issue as the result of third party scraping. Headline in ZDNet: "Internal Facebook email reveals intent to frame data scraping as ‘normalized, broad industry issue’" [0]
- October 2020. Facebook announces lawsuits against companies it claimed created a "malicious extension on Google’s Chrome Web Store designed to scrape Facebook, in violation of Facebook’s Terms and Policies and state and federal law." [1]
So... which is it? Does Facebook believe that scraping is a "broad, normalized industry issue?" Or is it a violation of "state and federal law?" It seems like they measure severity of its impact primarily based on the reactions of political commentators.
And what's the difference between automating a browser and automating an API client? Why did Facebook design an API for accessing the data they collected, if it's illegal to collect? They've even claimed to be the victim of Cambridge Analytica, who purchased a "quiz" application created by a developer who pieced it together using code straight from the "examples" section of Facebook's API documentation.
There is one obvious resolution to this apparent contradiction. If we remove Facebook from the question, then the contradiction resolves itself. All we need to do is stop presuming that Facebook has the right to collect and retain this data in the first place. And as a user, if you publish your data to a website designed for sharing it with other people, then by definition it is no longer private data. Therein lies the central question: what is "semi-private" data, and who controls its boundaries?
p.s. another thing they never mention is why companies want to scrape lists of facebook users. perhaps it might have something to do with the "lookalike audience" feature, and its more precisely targetable predecessors, which allow advertisers to upload a list of usernames and email addresses for targeted advertising?
Of course, Facebook wants to make it sound like scraping is illegal, when it generally isn't.
But account hijacking and mass-creation of accounts just to access private pages are clear violations of the Facebook and Instagram ToS, so they surely can sue for that.
Most lawsuits aren't due to breaches of the law, but breaches of contract. Whether terms of service constitute an enforceable contract is another matter.
Nope, it's not a settled question in the way that I think you mean. Each ToS is different so each would be subject to individual legal analysis in court on its own terms.
Questions would include whether the ToS is unconscionable, whether the terms violate laws of the locality/nation, and so forth.
It's the same with traditional contracts - the fact that contracts have been around for hundreds (maybe thousands) of years doesn't mean much if you and I create a brand new one between us. Our contract's specific terms (and events/actions between us as a result) would be the issue in court.
Why can't FB simply include a clause like "No kind of automated scraping is allowed, except for search engines in robots.txt"? This would save them so much time in court, arguing over the use of fake accounts which should really be irrelevant.
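For what it's worth, the robots.txt half of that idea is easy to express in code. Here is a minimal Python sketch using the standard library's `urllib.robotparser`, assuming a made-up robots.txt that admits one named crawler and turns everyone else away. The rules, the example.com URL and the "MyScraper" agent name are all hypothetical and are not Facebook's actual file. Keep in mind that robots.txt is only a convention that crawlers choose to honor; whether a matching ToS clause would hold up in court is a separate question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only: one named search crawler is
# allowed everywhere, every other user agent is disallowed everywhere.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some-profile"  # placeholder URL
    for agent in ("Googlebot", "MyScraper"):
        print(agent, is_allowed(agent, url))
```

A well-behaved scraper would run a check like this before each request; nothing technically forces it to.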
That is why they are suing rather than pressing charges. When someone steals your car you don't sue them you press charges. When someone doesn't uphold their end of a contract you don't press charges you sue for breach of contract.
In reality, you as an individual can't press charges. Only the state can. And many times the state chooses not to. You can sue in civil court, but individuals can't bring cases in criminal court.
You are confusing pressing charges and indictment. Pressing charges just means you accuse somebody of a crime and “press” the prosecutor to indict them. So the state does have the ultimate say on who is prosecuted, but that doesn’t mean you can’t press charges.
As far as I am aware it isn't a specific thing, but a general catchall term for going through the process of filing a criminal complaint and seeing it through to completion. Maybe there are better words for it, but "pressing charges" is what they use on TV so it is top of mind.
In general I meant there is a difference between criminal and civil law, and suing generally refers to civil not criminal law.
I don't have a source for this, but my recollection is that this has been successfully argued by a couple of companies—but then an appeals court found very firmly that it was not the case.
Essentially, having that be true would mean that any given website could create whole new classes of criminal behavior.
> having that be true would mean that any given website could create whole new classes of criminal behavior.
While this is true, reading the lawsuit it is clear that Meta is suing in civil court, so maybe they're trying to enforce their contract, especially their automated collection ToS (https://www.facebook.com/apps/site_scraping_tos_terms.php)?
In general I agree that harvesting public data is moral.
I think that in these particular cases it's:
1) extracting data from profiles that opted for not being public (only available to logged in users) and
2) reposting scraped data (publicly?) as belonging to the guy who scraped it without users' consent.
Facebook has hidden much of Instagram's content behind logins, so that makes most of it "not public".
At the same time, I don't think all of Instagram's users care if their images are hidden, or not.
It's quite unfortunate Facebook/Meta is using hostile language and the word "scraping" together in this case. Scraping is a legitimate process used by various business models to gather information from the Web, which itself was originally intended to be an open forum for people to share content.
Hostile business models have corrupted that intent and turned it into a competitive environment that is harming users and legitimate models which may not have the funding larger corporations can muster.
I have a "scraper" I've built that will either snapshot a page from a user's browser or crawl it remotely with Selenium/Firefox, on the user's behalf, to save the content in an index for searching later, by that user. It's not automated, nor does it parse and crawl URLs in the pages saved. It doesn't use page content in a wider context, either.
I've spent a significant amount of time trying to "work around" anti-scraping efforts by various companies and it's frustrating to see hostility instead of cooperation in certain types of use.
> Facebook has hidden much of Instagram's content behind logins, so that makes most of it "not public".
1) It was public when the content was posted by its authors. Facebook locked it down retroactively, regardless of the author's intent.
2) A login requirement doesn't make it non-public, if making an account is trivial, and there are already hundreds of millions of accounts. Is the plot of Avengers: Endgame also not public, because it's locked behind a ticket purchase or subscription?
Also, the login requirement is not a given. E.g. Google doesn't need to log in to index those pages, and neither do you for the first few profiles. Only after your identity (IP or fingerprint) is known does Instagram start locking public content behind login gates.
> extracting data from profiles that opted for not being public
The tool lets you download the contact info of your friends, which you should be able to do anyway. In fact Facebook tries to trick its users into thinking they can do this with their data takeout option, but the downloaded files don't actually include any of the contact info for your contacts. Which makes zero sense, considering the entire point of Facebook is that it's a digital rolodex for storing your friends' contact info.
From the article, it seems to be a service for scraping data you have access to anyway. As long as they only hand that data to the requesting customer, whose login they used, I don't see a difference between the general public and this user's personalized "public". If access is still limited to the people who have the access rights, then I don't see a difference between accessing it through the official interface or via scraped data.
Users make information available on facebook with the expectation that they are able to later control access to it (other than the obvious threat model of screenshotting, etc). This is violating that expectation and thus their privacy.
This has never realistically been the case. An illusion of control is provided by facebook, but they've never really put much effort into it. For a really simple example, look at how long content remained available to the entire internet after "deletion". Sometimes it took years.
Expecting any semblance of privacy from a company who profits from using and selling your data is, if I'm being blunt, lunacy.
They’ll stop posting in the way they currently enjoy and will, therefore, have lost some freedom. Great outcome!
In other news: your partner may also leak your most intimate secrets. I hope they do, to teach you a lesson?
Every trust can be betrayed. Why do you believe a world without trust would be better? Only because you cannot handle the nuance of different levels of trust?
There's no evidence of the accused scraper sharing the scraped data with anyone but the account-holder, so the privacy of their friends is still protected.
The state of "opted for not being public" and 'available to any system authenticated person' seem contradictory.
I appreciate that 'system authenticated person' is a smaller set than those who can access anything publicly accessible, and that the former is a subset of the latter.
I agree with the moral argument against posting the scraped data publicly, but if someone gave my account access to their data, I don't think they have a moral right to say I can't use a script to do something private with it.
Scripts are tools, and like any tool they're extensions of the self. If it's morally okay to do it by hand, it's morally okay to do it with a script, so long as my script is respectful of server resources.
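To make "respectful of server resources" concrete: one simple approach is to enforce a minimum gap between consecutive requests. Below is a minimal Python sketch of that idea, not a description of any particular scraper. The `Throttle` class and its parameters are names made up for this example, and the clock and sleep functions are injectable only so the behaviour can be checked without actually waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive calls."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous request was released

    def wait(self) -> float:
        """Block until the next request may go out; return the delay applied."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            elapsed = now - self._last
            delay = max(0.0, self._min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


if __name__ == "__main__":
    throttle = Throttle(min_interval=2.0)  # at most one request every 2 seconds
    for page in ("page-1", "page-2", "page-3"):  # placeholder work items
        waited = throttle.wait()
        print(f"fetching {page} after waiting {waited:.1f}s")
```

Calling `wait()` before every request keeps a script at roughly the pace of a person clicking through pages by hand.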
Instagram behind a login screen is public. If, say, you were an OnlyFans model and somebody paid for your videos and then scraped them, there would have been an implicit agreement.
With photos shared on Instagram there is no such understanding; news outlets have long been logging in to view and publish Instagram photos.
There's no evidence the scraper companies mentioned there are making the scraped data public or sharing it with anyone beyond the individual customer that is already entitled to access that data through the official clients.
As others said, there is no “you” in the scheme. It's Facebook's data. When people access that data without paying, they are “bad guys”. When the very same people pay for it, they are “legal partners”. In both cases they can do anything with it, while Facebook can't be held responsible because of all the official agreements. So as long as there is no specifically bad publicity or money loss anything goes either way.
“You” only exist in numerous empty statements about “privacy”, “respect”, etc. If you are feeling artsy, you can make that hyped NFT thing out of those, and see whether those kilobytes of text are really worth anything.
What you are claiming here is not true in Europe. If FB holds data about you, you still have legal rights over that data. You can have it deleted, or changed if it is somehow untrue, and you have various other rights too.
There is a relationship involved because ultimately as a FB user, if I don't like what they are doing, I can ask them to remove my data permanently and they must legally do that. If someone has "scraped" that data (if it is considered PID), without my permission or a legal basis to do so, they are in breach of the GDPR and can have enforcement taken against them.
I think some of these "aggregation" businesses will fall foul of this in Europe but I don't know what will realistically happen if that business does not exist in Europe and breaches the GDPR.
This is how it works in press releases. The problem is that data protection laws were in fact lobbied by corporations either openly or behind the scenes, and focus on things like real names and passport numbers that look impressive but aren't really important for the data market. These are just put into some high security database (e.g. for billing info), and it's fine. However, the real behavioral data that costs money is shared as easy as it ever was in the form of “User ID <long number> was at the location of Wi-Fi AP ID <another long number>”. It doesn't matter that the data owner still trades all the history of activity of a certain individual, or that Wi-Fi station locations can be matched with some external database. Everything is fine as long as you don't slap someone's real name on that. And, contrary to the show social networks make, they couldn't care less about real names. Even if you trick the system by calling yourself John Doe, you still look at the specific content, and have specific contacts, you are you, and the data is the same.
I remember that about a decade ago some IT guys paid for ordinary Facebook advertiser access, then targeted ad campaigns using filters chosen so that their intersection contained only a single user, or just a couple of them, and were able to match those “anonymized” accounts to real ones. You didn't have to be a genius to do that. Facebook certainly knew it could be used like that. Everyone who made money on that simply agreed to use “anonymization” as a smokescreen. Later, with all the scandals, those routine operations were presented as something exceptional done by a small number of bad actors.
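The mechanics of that trick are just set intersection: each filter on its own matches plenty of people, but stacking filters shrinks the audience quickly. Here is a toy Python sketch of the idea. The user records, attribute names and the `audience` helper are all invented for illustration; this is not Facebook's ad API or real data.

```python
# Toy, made-up user records. The attribute names are hypothetical.
USERS = {
    "user_001": {"city": "Springfield", "age_band": "30-39", "employer": "Acme", "interest": "kayaking"},
    "user_002": {"city": "Springfield", "age_band": "30-39", "employer": "Acme", "interest": "chess"},
    "user_003": {"city": "Springfield", "age_band": "30-39", "employer": "Globex", "interest": "kayaking"},
    "user_004": {"city": "Springfield", "age_band": "40-49", "employer": "Acme", "interest": "kayaking"},
    "user_005": {"city": "Shelbyville", "age_band": "30-39", "employer": "Acme", "interest": "kayaking"},
}


def audience(users: dict, **filters: str) -> set:
    """Return the IDs of users whose attributes match every given filter."""
    return {
        uid
        for uid, attrs in users.items()
        if all(attrs.get(key) == value for key, value in filters.items())
    }


def narrowing_steps(users: dict, filters: dict) -> list:
    """Apply the filters one at a time and record the audience size after each."""
    sizes = []
    applied = {}
    for key, value in filters.items():
        applied[key] = value
        sizes.append(len(audience(users, **applied)))
    return sizes


if __name__ == "__main__":
    targeting = {
        "city": "Springfield",
        "age_band": "30-39",
        "employer": "Acme",
        "interest": "kayaking",
    }
    print(narrowing_steps(USERS, targeting))  # audience size after each added filter
    print(audience(USERS, **targeting))       # whoever is left once all filters apply
```

With enough independent attributes, the remaining audience can be a single person even though no name ever appears in the targeting criteria.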
Facebook breaches the GDPR all the time and manages to stay in business. GDPR enforcement is barely existent, and when it does happen, it's insufficient.
“This industry makes scraping available to individuals and companies that otherwise would not have the capabilities.” - seems like web scraping companies are doing a good job :)
Maybe some irony here as IIRC Facebook started as essentially a scraping company, pulling student profiles from college websites and re-publishing it for their own profit.
The scrapers have become the scrapees. The horror.
>Octopus, a US subsidiary of a Chinese national high-tech enterprise, built a cloud-based platform designed to provide paying customers access to on-demand scraping software and services.
It is interesting how they try to position this as a Chinese attack on them.
It must coincide with Christopher Wray's sudden claim that there is an active dragnet of sorts trying to subvert America from within, much like the recent election interference involving a former Tiananmen Square activist who tried to run for Congress, I think.
It makes me think that there are many people on CCP's dole, rich powerful famous people are somehow beholden to the CCP in some unknown way but we can all guess correctly that they are all old white men who have previously been seen with young females.
People that are criticizing this probably were also critical of the Cambridge Analytica scandal, but it would be useful to compare what happened there and here.
With Cambridge Analytica:
- Facebook allowed users (with informed consent) to allow external developers to access their data and limited data about their friends, in order to build social-enabled apps.
- CA exploited this to scrape basic profile data from a large number of users. It broke the ToS by doing so (in particular by using the data for purposes different than stated)
Here the same is happening:
- people are giving a third company access to their profile, which includes access to friends' data (in fact a lot more than what the app platform allowed to do)
- the company is scraping all the data.
At the time of CA, the criticism was that Facebook didn't do enough to enforce its ToS (or maybe that the data sharing should have not been allowed in the first place? But the terms were common knowledge and the attack potential became clear only in hindsight), here people are criticizing that Facebook is in fact enforcing its ToS.
Also note that strong enforcement against scraping is one of the mandates that came from the FTC settlement.
It seems inevitable that any news about Facebook/Meta is read in the worst possible light these days, even when the criticism is self-contradictory. I would expect less superficial commentary from HN.
The real reason most people were upset about Cambridge Analytica was that it revealed to the public how advertising and PR companies manipulate us. The fact that they violated Facebook's ToS is more so the excuse the press used for covering it when they wanted to write another anti-Trump piece. If you were accusing a specific newspaper of hypocrisy based on two articles I might agree. But you're referring to general public sentiment, and I really don't think most people cared or were surprised about the data collection. The shock and scandal was the realization that targeted advertising campaigns and information bubbles have the potential to sway elections.
I'm referring to the HN crowd, I'm not sure that can be equated to "general public sentiment".
I agree with your first paragraph, and my point is that it is not possible to argue at the same time that Facebook should share data more broadly and allow scraping, and at the same time be critical that Facebook allowed CA to happen in the first place.
If the CA scandal was a wake-up call, it appears it was not internalized enough for people to understand the implications of what they're suggesting in this thread?
In the early days of FB, they convinced people that pages (or some content, sorry I do not know the FB terms) could be public for anyone to view without needing to log in to FB. This was very helpful for small businesses and communities. In many countries this is still the quickest place to make a public page. Though now, every small business or community page I want to visit is locked unless I log in to FB. Even if I do log in it is impossible to copy and paste the important details of a page or post, plus the UI is as ugly as it has always been.
I am currently in the USA and when I visit a public FB page e.g. [1], there is a small login header, and a very big annoying login footer. I estimate 15% of the content is blocked. I had spent the past year outside the USA until one month ago. When I visited the same sites while traveling outside the USA, the annoying login footer moved to the middle of the page, blocking almost all content. I do not have proof at the moment, but that was my experience trying to read 95% of government, business, and community pages, which are almost all on FB.
This is different from LinkedIn v HiQ because HiQ was only scraping publicly available data that was generally accessible to the broader internet. In these two cases, the data is being scraped from FB/Insta using credentials that the client handed over or the mass creation of accounts solely for scraping purposes.
> the mass creation of accounts solely for scraping purposes.
Those accounts wouldn't be allowed to view private data unless they friend/follow the person first, so they'd still be limited to data the account holders intend to be public and available to anyone.
There's also no evidence that the scraped data was aggregated at scale or commingled in any way, so even if customers provided their actual credentials which grant them access to private data of their friends, the scraper didn't share it with anyone else but them.
Did FB ever take any legal action against Cambridge Analytica? I can't remember anything about it and this sounds very similar to that (although back in those days FB's tools made this incredibly easy).
I wish the Cambridge Analytica FUD would stop. CA's "attack" was to set up a malicious website that convinced idiots to give it access to their Facebook account using the standard OAuth2 flow.
Did they misuse the collected data? Sure. But people granted access to that data knowingly. This wasn't really an attack in my view.
Facebook wasn’t really complicit and definitely didn’t sell/give away any data.
> After paying for access to the scraping software, customers self-compromised their Facebook and Instagram accounts by providing their authentication information to Octopus
"self-compromised" lol
clearly these people just wanted an automated way to access their own data
> clearly these people just wanted an automated way to access their own data
GDPR and CCPA (and probably many other national/state privacy laws) force facebook/instagram/etc to let you download and/or delete your data without using third party websites. Usually people self-compromise their accounts in exchange for money: https://www.buzzfeednews.com/article/craigsilverman/facebook...
Ironically, around a year ago I disclosed (using their White Hat bug bounty program) that I was able to access recruitment data (mostly candidates' details) using a very cheap form of scraping against a 3rd party service provider. They dismissed it and instructed me to report it to the 3rd party that operates that service (which I had done beforehand, but the issue had not been fixed).
Sorry for being vague here, I haven't publicly disclosed it yet, but will probably have to if it doesn't get fixed.
Funny story from the early days of TheFaceBook, probably around 2005ish:
I was a webmaster of a set of servers on a major university's network. I also had access (enough to run arbitrary programs that had pretty much full ingress/egress to the public internet) to a number of machines across the campus's network. Through some of my coursework and ACM chapter activities I met some other similarly minded technical people with similar levels of access.
We decided that it would be fun to use our superpowers (access + programming abilities + curiosity) to sign up for various accounts on FB and essentially scrape and friend as much as possible. At the time they had some rate limiting, some IP banning (which wasn't terrible because the Uni gave public IPv4 addrs to all machines on campus by default) and then added some early CAPTCHA which we ended up breaking pretty trivially with some Python and image recognition code.
Never got sued... :) Never really did much with the scripts or data except test that they worked. Fun times.
Let's be clear and accurate: technically weev was put in jail for conspiring on IRC with JacksonBrown. JacksonBrown was the one who wrote a PHP script that incremented a value in a URL (and appended a valid Luhn check digit following incrementation).
Conspiracy to access a protected computer system - that is, typing on IRC. weev didn't write any of the code or access the API.
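To show how little is involved in "incrementing a value and appending a valid Luhn check digit", here is a rough sketch in Python (the original was reportedly PHP, and this is not that script). The function names and the starting number are my own for illustration, and nothing here touches a network.

```python
def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk the payload right to left; every other digit, starting with the
    # rightmost payload digit, is doubled (subtracting 9 if the result > 9).
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def candidate_ids(start: int, count: int):
    """Yield `count` consecutive payloads from `start`, each with its check digit."""
    for value in range(start, start + count):
        payload = str(value)
        yield payload + str(luhn_check_digit(payload))


# Illustrative starting value only (a commonly used Luhn test number).
ids = list(candidate_ids(7992739871, 2))
print(ids)
```

Each generated ID would then simply be substituted into the URL, which is the whole "technique".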
From the article: "[T]he Ninth Circuit reaffirmed its original decision and found that scraping data that is publicly accessible on the internet is not a violation of the Computer Fraud and Abuse Act."
The key phrase is "publicly accessible." This wasn't that. The scraping was done by automating Facebook accounts, which have terms of service, which forbid scraping.
ToS/EULAs make a big difference. They're the reason Blizzard could shut down bnetd's StarCraft server. They're why no one can legally reverse engineer Oracle to create a drop-in replacement, despite interoperability provisions.
More and more platforms are putting the majority of your user-generated content behind auth walls with ToS because that's how they prevent competitors from swiping it.
> ToS/EULAs make a big difference. They're the reason Blizzard could shut down bnetd's StarCraft server. They're why no one can legally reverse engineer Oracle to create a drop-in replacement, despite interoperability provisions.
Strictly referencing EULAs for user-owned copies of software here, not ToS:
That is not true. The Blizzard court clearly erred in not considering unconscionability when analyzing the EULA. As for Oracle, the interoperability provisions are what overrides that part of the EULA.
Does it go into detail about the actual meaning of "publicly accessible"? Because most content on Facebook/Instagram requires any valid login (as opposed to a specific account) and is data people intend to be public (especially on Insta).
In this case, the account requirement would be a technicality and the data, for all intents and purposes, would still be considered "publicly accessible" if anyone with an account can access it.
Data behind a login screen that any member of the public can get past isn't private information. Private info would be OnlyFans videos. So far there is no such feature on Instagram.
So much bad faith in this press release but not surprising from such a disgusting company, with of course some China-related fear-mongering despite no evidence of wrongdoing.
> After paying for access to the scraping software, customers self-compromised their Facebook and Instagram accounts by providing their authentication information to Octopus.
They didn't "self-compromise" their account. They trust Octopus to act on their behalf, and unlike Facebook, Octopus' interests are most likely more aligned with their users' since their service is paid. This is no different from handing your Facebook credentials to your social media manager or secretary. There's no evidence that Octopus misused this access in any way.
> Octopus designed the software to scrape data accessible to the user when logged into their accounts, including data about their Facebook Friends such as email address, phone number, gender and date of birth, as well as Instagram followers and engagement information such as name, user profile URL, location and number of likes and comments per post.
This is either information people intend to be public or information they trust their friends to keep private. Now if Octopus were leaking the private information to third parties it would be one thing, but so far I see no evidence Octopus was disclosing the scraped information to anyone but their customer (who is already authorized to access it).
> Meta is an industry leader in taking legal action to protect people from scraping and exposing these types of services
Translation: Meta is an industry leader in protecting its disgusting business model that hinges on keeping public data behind a walled garden with an unacceptable "privacy" policy. There wouldn't be a market for Octopus (or other scrapers) if Facebook already allowed customers to efficiently access information they're already entitled to, but that would be against their interests as their entire business hinges on information being held hostage.
They've created a problem, are selling the cure (well in this case monetizing it via ads) and are now pissed off that someone else is selling the cure for cheaper.
Anyone else heard of Tim Berners-Lee's idea of hosting your data in pods outside the relevant corps wanting access to it and you controlling what's shared and how?
This is such a completely different way of doing it, I'm not sure of all the implications, be that from admin (how much effort) to security (would this be a massive hacking opportunity) etc.
https://www.theregister.com/2022/01/20/tim_bernerslee/
I'm torn on Web scraping because the extreme of each end of the spectrum on this issue both seem unreasonable.
On one side, you have people who say any form of scraping should be disallowed, even prosecutable. This went so far that the Department of Justice on behalf of AT&T prosecuted a case of URL modification [1]. One of the few bright spots for this psychotic Supreme Court was to curtail the government's power under the CFAA by limiting what constituted "unauthorized" access [2].
On the other hand, there are those who think that any level of scraping should be fine and I think that's untenable too. Consider Yahoo indexing of Stack Overflow [3]:
> In the meantime, since Yahoo (via Slurp!) is about 0.3% of our traffic, but insists on rudely consuming a huge chunk of our prime-time bandwidth, they’re getting IP banned and blocked.
Do these "scraping extremists" think such actions should be illegal? It's actually not that far-fetched given the Ninth Circuit decided LinkedIn wrongly blocked HiQ scraping [4]. Like if you change your website with the intent that it'll make scraping more difficult, is that a problem? What if it's an unintended side effect?
Additionally, companies like Meta, Google and Apple are going to be way more accountable for abiding by data retention laws and regulations than any scraper. If it's OK to scrape FB.com completely, that information is out there forever.
I certainly think the government shouldn't prosecute on behalf of companies. At least that should expose to people how the government's #1 priority is in fact to protect the true constituents: corporations and the capital-owning class.
> So much about this case is ridiculous, and it’s complicated by the fact that nearly everyone agrees that weev is a world-class jerk. But, you need to separate that out from the details of what he did here, to note that it was nothing particularly special, and it involved the sort of thing that security researchers do all the time, and which all sorts of non-security researchers do quite often.
Yeah... uhm... I used to do exactly this sort of thing...
When I was a teenager, I would look at the URL of whatever site I was on, and would change a number here, or a letter there; and see what I got.
Sometimes you get nothing, sometimes you get something. Sometimes that something is quite interesting.
> Meta is an industry leader in taking legal action to protect people from scraping and exposing these types of services, which provide scraping as a service across multiple websites.
Sure, as long as Meta is not the one selling the data to Cambridge Analytica it's wrong.
HN is hypocritical - most commenters here are against this because "Meta bad," but at the same time, most commenters wouldn't want their posts shared privately amongst friends to be scraped and made available publicly.
> most commenters wouldn't want their posts shared privately amongst friends to be scraped and made available publicly.
Where's the "posts shared privately amongst friends made public" part? There are two cases here:
1. A service that logs in as the customer (who voluntarily provides their credentials) and scrapes information visible to said customer on their behalf. Nothing about "made available publicly" is alleged.
2. An individual using a pool of bot accounts to scrape posts visible to any logged in user. Nothing about "shared privately" is alleged. To be clear I don't like the method, but I'll also have to admit I've used one of the Instagram "clone sites" in the past thanks to their login wall.
Unless I missed something, it sounds like you just made it up.
For that to happen, one of your friends would have had to willingly allow this tool to scrape their social network, which would include your private posts.
Like many other people, you are calling something “private” when it is not.
“Privately shared with friends” used to mean that only you and your friends know something. You don't “share” anything with “friends” on a social network. You give the information to a giant corporation. If it finds it suitable, it then delivers it to other users, but only after it records your location, analyzes the content to check if you were, say, affected by some melodramatic event (and therefore should be tricked into spending more time… I mean, get “personal recommendations” for a certain kind of content), and does a billion other things.
If you consider that this is fine, please relay all your conversations with family and friends through me from now on. I offer secure, reliable, fast, yada yada communication service. And it's hip! Ask anyone on the street what they use.
There are two cases they brought up, one being web scraping and the other being a clone website publicly displaying content from Instagram.
I think Meta might be mixing up these two cases here on purpose to make it look like web scraping is as bad as stealing photos to publish them on a clone website.
Octopus sounds really useful; is there an open source equivalent? I'd love to be able to scrape my own data on Facebook. Their data export feature is fairly good but far from complete.
Google has turned Google Search into a walled garden by scraping people's content and serving it up on their own platter. Is anyone going to stand up to them?
Evil Big Co. that literally STEALS people's personal information everywhere they go even after they've indicated they want to be left alone is now offended when someone does the same to them?
Well, color me surprised /s
Fuck Facebook. Meta. Or whatever you want to call it.
Is this actually private data, or is it public stuff that's become annoyingly hard to view anonymously because Meta chose to stick it behind a login box?
Depends if another user can also access it, or whether the original author/owner of the data in question intends for it to be public. In Facebook's case, there are permission levels you can set on posts, including a "public" option (which isn't actually public though and will require a login anyway, but it can be any login), which would settle that debate quickly. Hell, I wouldn't be surprised if that option were to be hidden so as not to acknowledge that a particular bit of data was explicitly posted for everyone to see.
> In Facebook's case, there are permission levels you can set on posts, including a "public" option (which isn't actually public though and will require a login anyway, but it can be any login)
Q: Have you tried this?
In a private browser session I started at google.com, searched for "site:facebook.com nextgrid", picked some random post, clicked through, and was reading the post without seeing anything other than FB's cookie banner. No sign of any login (which is good 'cause I don't have one).
But you make it public for everybody with the publicly accessible login, so it wouldn't be considered private data, for the same reason news outlets can use your Instagram images and share them widely without your permission.
You can't throw up a login screen and then claim privacy for content that people post themselves and that ends up in the public domain, because the login does not distinguish between the general public and a permissioned user authorized to view your selfie pics.
From a GDPR point of view this kind of 3rd party data collection is not acceptable (assuming it covers personal information, for example names of people and what they have posted). The difference with Meta's own data collection is that the users have a relationship with Meta and have given their permission for Meta to handle the data. Users also know they can contact Meta and ask them to remove the data.
3rd parties don't have the consent from users. Users don't even have an idea these companies might be holding their data.
From a GDPR point of view the scraper would be acting as a data processor on behalf of their customer, no different from using a cloud storage service for your contacts. It's fine as long as the third party doesn't misuse the scraped data or share it with other third parties, and there's no evidence they did so in this case.
> and there's no evidence they did so in this case.
Indeed; the users probably wanted to make the data public, if scraper accounts could see it. There is a GDPR allowance for data "manifestly made public by the data subject".
"scraping attacks"
Scraping is not an attack. Monopolists want to pretend they own your data because they get unlimited access to monetize it whereas competitors should have none.
"self-compromised"
Monopolists want to sell you thus it's imperative they maintain the fiction of "one person, one account". By admitting you own your account, they'd have to allow sharing and they wouldn't be able to provide their customers (advertisers) with reliable data about individuals.
"protect people from scraping"
Monopolists will protect themselves and call it protecting you. They will attempt to make you afraid of some other actor using your data in harmful ways so as to detract from how they monetize you and use your data in harmful ways.
"deter the abuse"
Monopolists don't want to argue about what constitutes abuse. Anything they write in their TOS is entirely for their benefit and only constrained by local law (if that). They will abuse you to the fullest extent they can get away with while arguing that any action to use your rights is "abuse."
"safeguard people against clone sites"
Monopolists want to maintain their monopoly; there is no greater threat than a direct challenge to that monopoly by allowing data to move freely.
--
More subtle but even more ironic rhetorical points
"for hire" / "paying for access"
Emphasizing that people making money (gasp) by providing this service is bad.
"industry leader in taking legal action" + "across many platforms and national boundaries, also requires a collective effort from platforms, policymakers and civil society"
Monopolists can pay high priced marketers to rebrand them as patriotic hero figures fighting valiantly for the little guy.