by KieranMac 1614 days ago
As a lawyer whose primary focus is in web scraping, this article is in many ways misleading and inaccurate. While it is true that the Van Buren case is generally positive for web scraping, the overall legal landscape is still murky. The main battleground for web scraping legal issues is shifting from the CFAA to breach of contract and various state-law issues, including misappropriation, unjust enrichment, and trespass to chattels.

In my opinion, 2021 was a bad year for the law as it relates to web scraping. The Supreme Court remanded hiQ Labs, and many high-profile lower-court cases ended badly for web scrapers. It's a darker shade of gray than it was in 2020. It can be navigated, but it's tricky.


Not a lawyer, but is it at least true that web scraping alone would now be significantly less likely to be a basis for federal criminal prosecution under the CFAA?

I'm often reminded of the fact that in https://en.wikipedia.org/wiki/United_States_v._Swartz the scraped party, JSTOR, did not want to pursue civil litigation, but because of the criminal component of the CFAA, this was out of their hands, and the story ended in the worst possible way.

If the current legal landscape at least better restricts disputes over web scraping to civil litigation, it may not be a huge change for how companies look at their risks, but it could make a huge difference for individuals caught in the crossfire.

Yes, I would agree with that first sentence. After Van Buren, web scraping alone would now be significantly less likely to be a basis for federal criminal prosecution under the CFAA.
Good take. IMO, ethically speaking, we should not penalize scraping itself, but rather penalize scrapers based on how they use the data.

Scraping Facebook to make a clone of profiles shouldn’t be held to the same scrutiny as scraping Facebook to do an internal analysis of user demographics for research purposes.

Why should either be discouraged?
Cloning profiles is the part that seems wrong to me, but I'm not sure why it should matter whether it's done via scraping or not.
With cloned profiles (or any data obtained and shared without your consent) it will be harder for you to exercise your right to be forgotten, for example.
How many contracts does Google breach by scraping billions of pages every month?
Google doesn't have to proactively try very hard to ingest sites. If something is difficult for Google to scrape, they don't spend loads of engineer hours on getting it to work. They just leave the site out, and the webmaster there will quickly bend over backwards to make sure Google can scrape them. When something gets scraped into Google inadvertently, it's because the website made not even the slightest effort to protect itself.
Given the nuances of browsewrap contract enforceability, perhaps not as many as you suggest. The tricky part with navigating this gray area is knowing the likely circumstances when a contract of adhesion may give rise to an actual legal claim. There are patterns.
So at Google's scale, 'not many' would be a few million per month? And all is good then, right? Even you probably use their scraped data daily and are totally fine with that, right?

You think Google's bots read contracts before scraping a website? Really? :) If you had any experience creating websites and launching them online, you would know how fast and how often they arrive and how little they care about your TOS. So the real 'violation' numbers might be very scary...for you.

https://ironcladapp.com/journal/contract-management/are-brow...

They read robots.txt, right? You can easily add a Disallow rule for Googlebot there.
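
For anyone who wants to see what that looks like, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URLs, and the `SomeOtherBot` name are made up for illustration; `Googlebot` is assumed as the user-agent token. It only shows how a well-behaved crawler would interpret the rules, and says nothing about the legal side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Googlebot entirely,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler asks before fetching each URL.
googlebot_allowed = parser.can_fetch("Googlebot", "https://example.com/page")
other_allowed = parser.can_fetch("SomeOtherBot", "https://example.com/page")
other_private = parser.can_fetch("SomeOtherBot", "https://example.com/private/x")

print(googlebot_allowed, other_allowed, other_private)
```

Keep in mind robots.txt is only a convention: it works because crawlers choose to honor it, not because it technically blocks anything.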
What if the scraping occurs as part of web crawling?

Suppose I point a scraper at site S1, which has terms of service that say scraping them is OK, and my scraper finds a link on S1 to S2 and follows that, and follows a link from S2 to S3, and so on.

At some site Sn far enough down that chain is it really possible to use the scraper accessing that site to infer my intent to accept Sn's contract? The connection between me and Sn seems tenuous enough that it might be hard to even argue that I intended to visit Sn, let alone use that to infer acceptance of their contract.

Interesting! I'm not a lawyer, so the content for this piece was based on commentary in the article below. It was written by their lawyer, but I would love to hear your counterpoint to it. Always good to get multiple viewpoints on something.

https://www.zyte.com/blog/van-buren-a-victory-for-web-scrape...

The Zyte article isn't inaccurate; it's just a simplified assessment of a complicated issue. If you'd like a more nuanced perspective on this, please read my guest post on Prof. Goldman's blog.

https://blog.ericgoldman.org/archives/2021/06/more-perspecti...

Is there a good blog or something that tracks these cases?
Prof. Eric Goldman's blog is probably the #1 site historically on scraping and the law. I've contributed to it a few times.

https://blog.ericgoldman.org/archives/2021/06/more-perspecti...

The name of my firm is McCarthy Garber Law. I write about scraping there when I have time (which I rarely do)!

I agree that Eric's blog is great for getting updates on what's going on, and I've been following it for years. But he is very one-sided in his opinions about decisions, particularly on controversial issues like Section 230. I have to remind myself he's an academic (though at a law school) and I'm not just reading some defense firm's memos.
Eric is brilliant, and he has an encyclopedic knowledge of internet law. He's also an incredibly kind, generous, and open-minded person. That said, I will refrain from any commentary on Section 230, as I have zero expertise on that issue!
Enjoyed reading your bio on your website. Sub 24 hour at Leadville is super impressive! (Coming from someone who has not managed 24 hours at Western States... Yet...)
Leadville is just 45 minutes up the road for me, so I'm kind of cheating!
Is there a good blog post or summary that I could read?