Don’t most websites prohibit someone from scraping and harvesting the data on them? Most recently, I can think of Yelp, Amazon, and GitHub prohibiting this, as well as the Aaron Swartz case.
Up until June 14th of this year, the ruling from the HiQ vs LinkedIn lawsuit was that scraping is legal[0].
While finding that link for you, I learned that on that date, the Supreme Court vacated the decision and sent it back to the lower court in light of a new decision of theirs.[1]
Now I don't think it's clear one way or the other just yet. Any lawyers here with an opinion on how this is going to go? I haven't found any analysis.
Well given all this and the consent decree, I'm kind of seeing why FB made this decision. If it's legal, let the FTC say so explicitly.
It's probably very bad strategy to allow the privacy leak, and then hope that the FTC agrees with your decision later. No one will be sanctioned for adhering too strictly to consent decrees, but you could be sanctioned for being too loose. So the choice is obvious in that light.
It will be interesting to see how the FTC reconciles its ongoing fight to hold FB more accountable for protecting its users’ private data with its ostensibly contradictory position of allowing mass scraping of private user data at scale in this case. Cambridge Analytica was doing roughly the same thing, which was precisely what motivated the FTC's involvement in the first place.
“Prohibit” is an interesting word that in the world of public web publishing of data is actually pretty meaningless. Certainly companies take measures to prevent bulk scraping when they can detect it and have legal remedies if copyrighted or owned material is republished. But simply telling me I am prohibited to hold on to a copy of your site’s data is pretty meaningless when my browser caches your page and you want my browser to cache it to speed your site up.
By accessing the data on a website, you are forming a contract with that website and are subject to the website’s terms and conditions.
As a website owner, beyond that, there is no need to tell someone they cannot access your site if you can simply block them from accessing your servers instead, using any of a multitude of techniques.
Where does caching come into play at all here? You cannot cache content to begin with if the server is blocking access in the first place. And if you have already cached it in the act of violating said website’s terms of service, then you are still not in compliance.
I'm pretty sure that first paragraph is false. Until I explicitly agree to a contract, or "terms and conditions", I am not bound by anything. If a site I navigate to embeds content from another website, I am not immediately bound by the terms and conditions of that embedded site; to think otherwise would invite madness.
Not to mention the fact that terms and conditions are not contracts. I don't think they carry the same weight, although someone please correct me on this if I am incorrect.
Plaintiff: “Judge, when the defendant used my public website, a contract with me was implicitly made”
Judge: “what defendant? There is no one here.”
Plaintiff: “Oh he was anonymous, so I am not sure who it was…”
Judge: “Hmm interesting, so you seem to think an implicit contract exists that you want to enforce with no documentation at all with a party you can’t name, because you are not sure who it is?”
Plaintiff: “Exactly.”
Judge: “Feel free to come back when you aren’t going to waste the court’s time”
This clear instance of reductio ad absurdum is wholly non-analogous. Factually inaccurate court proceeding depictions and legal misunderstandings aside, for one thing, the non-hypothetical defendant’s identity is well-known in this specific instance.
Interesting. I would have thought that someone with the ability to craft such a sesquipedalian response would also have been capable of understanding irony.
According to many rulings the last few years, continually and systematically accessing a third party’s data under the clear expectation that you are aware of (as well as agreed to) their terms is definitely a meeting of minds.
I think you missed the word “public” in my comment. If you are posting content for public consumption, then unless that content is copyrighted by you AND I republish/sell/etc. it, you basically don’t have much to say if I choose to keep a copy of it.
If you don’t want me to have a copy of it for any reason, don’t let me have it at all.
Is it public though? For one thing, you need to have a registered user account and be part of the targeted audience as a winning biddee for ad placement in order to see the ads in question. As an ordinary person, unless you share your private login credentials with me (which would be another ToS violation), I can neither view nor access the ads you have been shown.
You seem to be hung up on users and passwords; I am talking about anonymous, publicly accessible sites.
If you publish content on a website that willingly provides data to anonymous users of your site, even with a ToS on the site, the ToS is not enforceable if you cannot prove that the user explicitly agreed to it. If you don’t know who the user is, you can’t prove that they agreed to your ToS.
Having a ToS is basically legal theater if you allow anonymous users. The implied threat is basically “IF we find out who you are” and you use the site in a way that is contrary to our published ToS, we will take action against you.
Your only recourse in that case is to pursue sites that are republishing your copyrighted content…because only at that point can you actually identify the party that may be misusing your site and its content.
In reply to your sibling comment: I actually agree with you in the anonymous case and point out that many of the examples named (including FB, the topic of discussion) require user authentication.
They can't throw you in jail over it, but it's within their rights to stop sending you these bytes or kick you off their platform altogether.
If they tried to prevent you from scraping third-party sites, that would be making laws; setting up ground rules with their ToS and enforcing them is absolutely fine.
> but it's within their rights to stop sending you these bytes or kick you off their platform altogether.
Actually no.
If a platform provides a generally available service, they are (in many countries, idk about the US) not allowed to arbitrarily exclude people they don't like without a legally valid reason.
And breaking terms in a ToS that are not legally valid/binding is not a legally valid reason. Just because you write something in your ToS doesn't mean it has any legally relevant meaning; there are limits to what you can put in a ToS. And limiting (properly done, privacy-respecting) research is often not valid. (Though it depends a lot on the country.)
It’s within anyone’s rights as a website maintainer to block IP addresses that scrape, or any others, at their discretion.
Nobody is legally forcing websites to allow access to everyone, and accordingly, nobody is altering the law by blocking access to people (crawlers, hackers, spammers, malcontents, or anybody really) that they feel are not welcome. So exercising one’s existing rights isn’t an act of making or altering laws.
I suggest reading up on what robots.txt is to further understand this.
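For anyone who hasn't looked at it: robots.txt is a plain-text file a site publishes to tell crawlers which paths it would like them to stay out of. It is a convention that well-behaved crawlers check voluntarily, not an access control. Here is a minimal sketch of how a crawler might consult it, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleCrawler` and `BadBot` user-agent names, and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real site serves its own at /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A generic crawler may fetch public pages but is asked to avoid /private/.
print(parser.can_fetch("ExampleCrawler", "https://example.com/index.html"))
print(parser.can_fetch("ExampleCrawler", "https://example.com/private/data"))

# A crawler the site has singled out is asked to stay away entirely.
print(parser.can_fetch("BadBot", "https://example.com/index.html"))
```

Note that nothing here stops a crawler from ignoring the answer, which is why sites that really want someone gone fall back on blocking at the server.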
[0] https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn
[1] https://www.reuters.com/technology/us-supreme-court-revives-...