by nathanlied 2335 days ago
> Lots of sites have ToS preventing such things, are those legally void now? Are captchas on public pages illegal, even if you request the page 8000 times in a second?

ToS are subservient to the law; you can (probably) terminate the service account of a user who breaks your ToS, but if the user does not have a service account (as is the case for HiQ, it doesn't seem they were using accounts for it), then your ToS does not apply, since you've technically not entered a binding legal contract with them.

> This is almost weirder. If LinkedIn wanted to force users to sign in to view profile info, would they be not allowed to do that because some company had signed a contract that implicitly assumed access to that data? If someone writes a web scraper for my site, and I unknowingly change my site in a way that breaks that scraper, can a court force me to revert the change?

IANAL, but I believe that would turn on intent, which is often difficult to prove at a personal level but not necessarily at a company level. If your intent in putting up barriers that happen to impact scraping, whatever they may be, was indeed to knowingly prevent scraping by a particular company, then you may be liable under this decision. This is the only part of the decision I'm torn on, since it's a bit messy to really prove such things. I'd be much more comfortable with allowing companies to take whatever measures they feel necessary to prevent scraping, and also allowing scrapers to legally circumvent those measures without threat of prosecution, assuming they didn't actually hack into anything.


> but if the user does not have a service account (as is the case for HiQ, it doesn't seem they were using accounts for it), then your ToS does not apply, since you've technically not entered a binding legal contract with them.

Are you sure about this? I am not a lawyer, but I believe that the Terms of Service applies to all users, not just those that explicitly set up a user account.

I have interpreted the LinkedIn ruling to mean that scraping public data is no longer criminal activity but it still leaves you open to civil lawsuits for violating the ToS of the website you are scraping.

> Are you sure about this? I am not a lawyer, but I believe that the Terms of Service applies to all users, not just those that explicitly set up a user account.

How would that even work? If I browse to any random public page of your website, it's served to me before you've even transmitted the terms of service. How could I be bound by those terms of service when I haven't even seen them?

As an engineer, I agree with what you are saying, but I think normal people and the courts disagree.

I think these sorts of contracts are called adhesion contracts (https://www.investopedia.com/terms/a/adhesion-contract.asp) and we interact with them all the time. For example, if you valet your car, the valet will hand you a piece of paper with a number printed on it to retrieve your car. On that paper you will find an adhesion contract that is valid and real (although not as powerful as the types of contracts that you sign).

This does not work, at least for software licensing, based on precedents for shrink-wrap contracts, so it likely would not work for licensing the use of data either.

A paper handed to you by the valet is not an immediate contract, as you can decline to agree to it, in which case the service does not happen.

You cannot do that with a publicly visible website, unless you show ToS and require agreement before first use. If you allow a non-transferable license then said data cannot be used by a search engine. If it's transferable you just pushed the problem towards scraping a different bot. (Well, you could have a direct agreement with a few major search engines.)

Caveat emptor: not a lawyer.

IANAL, but it seems like ToS could still govern your use of the data which you viewed. Sure, it seems like you couldn't claim any violation based on visiting a random page. But if the ToS is clearly identified on the page and you do something with the data that violates them, perhaps the owner of the site has a case.

> perhaps the owner of the site has a case.

Except it sounds like the owner doesn't. If the information is on the page made public, the owner of the page can't place terms on what is done with the data downstream. They'd have to implement some real binding system such as authentication where CFAA would apply. (IANAL)

Correct, but all of that is void if the data presented is any sort of protected information (copyright, IP, etc.). You can't, for example, scrape Yahoo Finance for pricing and dividend history and republish on your own stock tools website. They have a license to redistribute that data and publish on their own website. Similar story for copyrighted text and things of that nature.

That would require at least showing that ToS on first use. A link on a page is insufficient.

And said ToS would have to force copyright reassignment rather than a general licence, making LinkedIn culpable for any unlawful content published by users of its site.

I am a lawyer, and there isn't really an easy answer to these questions.

TOS are a lot like EULAs. If they look like contracts of adhesion, then they're going to get more scrutiny and skepticism. The TOS that you claim applies even to every single random visitor to your site where they do not in fact affirmatively agree to the terms is potentially going to look more like a contract of adhesion. That's a lot harder to enforce.

If they are used more for CYA so that you can ban undesirable accounts from your website which people explicitly agreed to when they signed up for it, or so that you can just up and alter your entire business model without having to give all of your customers refunds, then they're easier to defend.

Just my general opinion, of course. Every jurisdiction is different.

Also not a lawyer, but you cannot force me to accept your terms of service. Contract law requires both parties agree to enter it.

When you create an account, etc., you are agreeing to those terms. If I browse a public webpage that just has a terms of service link on the bottom of it, I've not agreed to anything.

> Are you sure about this? I am not a lawyer, but I believe that the Terms of Service applies to all users, not just those that explicitly set up a user account.

Typically you'll see TOS say something along the lines of "by continuing to access this site you agree..." or "if you do not agree with these terms you may not access this site..."

Whether that's enough to create a binding contract depends on the jurisdiction and who you ask.

It can also depend on the terms themselves. I can put "by using this site you agree to bake me a chocolate cake" on my website all day, but that doesn't mean I will be able to force you to bake me a chocolate cake.

Terms of Service is a form of contractual agreement, which requires there be an offer and subsequent agreement by the parties.

I don't think criminal law was ever part of this.

From the article, the LinkedIn decision was that scraping data does not violate the Computer Fraud and Abuse Act. Violating that act was considered to be criminal activity. (https://en.wikipedia.org/wiki/Computer_Fraud_and_Abuse_Act)

But the claim of a violation was only a claim as part of a civil trial. The law has both civil and criminal elements to it, and this is about the tort part of the law.

LinkedIn made threats accusing HiQ of criminal behavior, but that doesn't mean there's any criminal precedent being set here, as far as I can tell. And no one was criminally charged.

Separately, part of the ruling states that for the purposes of authorization, defying a cease and desist letter does not constitute illegal access, which might have some criminal implications. They imply some sort of technical authorization system must be bypassed, which didn't happen, since the data is "public."

(Which doesn't square well, imho, with existing meatspace law. If a business that serves the public banned someone from their store, the door being unlocked isn't an excuse to ignore that ban and trespass. But I digress.)

With the overlapping areas of law, it's admittedly beyond my understanding. But the law is generally viewed, like the DMCA, as being overreaching, if not at least partly unconstitutional.

The CFAA is overreaching, and is often used as a catch-all. 'Reply All' has a good episode which explores this. This is actually what was used against Aaron Swartz when he was charged for downloading academic journal articles from JSTOR over MIT's network, and why his charges were unjustly severe.

Reply All - #43 The Law That Sticks https://gimletmedia.com/shows/reply-all/rnhoxb

It doesn't completely answer your question, but what Nathan is pointing out is that private contracts cannot negate common law.

There's a long, long history (probably hundreds, if not thousands, of years old) of selling aggregated or processed publicly-available information.

I'm not particularly thrilled with it, but enough people think of it as a valuable enough service to pay for; even if they know they could get it themselves, for free.

LinkedIn users (as opposed to the company) might actually like what HiQ is doing, as it may help their own prospects.

> but enough people think of it as a valuable enough service to pay for; even if they know they could get it themselves, for free.

It's not free; it takes time to collect data. Buying it makes a lot of sense as long as you pay less than what your own time is worth to you...

It is true in the current situation, though I would prefer that we ensure free data must be free. In that case buyers of data would be incentivized to pressure providers of free data to improve the data quality.

The data does remain free, as long as LinkedIn still provides it for free.

The data without the noise is what you're paying for. The service of winnowing out what you care about from what you don't care about.

Considering how big an effort it is, and that the source from which it came is still available, why should the cleaned data be free? If I collect fallen trees from public land and chop them into usable firewood, should my bundles of firewood also be free? Or if I collect solar power with my own solar cells, should I have to give you the electricity for free?

I think this is especially relevant when it comes to things that fall under disclosure & transparency requirements - a lot of information that is legally required to be made available isn't legally required to be convenient. So, as a patient, you may have the absolute right[1] to a free copy of the charge master[2] of a hospital you're admitted to, but it could be required that you pick it up in person or that it is only supplied in microfiche form... so a company that's aggregated this and is reselling it can deliver real value.

1. This specific example is BS but plausible - I just wanted something more specific than the vagaries around things like FOIAs or shareholder reports which both have specific facts that can be rendered useless unless you have the context.

2. Basically, a list of how much procedures cost.

Absolutely spot-on.

I'm thinking of processed GIS data. If you have ever tried using the various formats that are supplied by government sites, you know what a huge pain it is.

I'm happy to pay a reasonable price for an interpreted and cleaned-up version.

I actually have! I had to import a huge file of all of the culverts around storm drains in a state, and each culvert was multiple pieces of geometry, none of them grouped together in any logical way. It was just a huge list of rectangles that looked like culverts when viewed, but with no way to identify a set of them as being one culvert without heuristics on how close each rectangle was to others. It was a massively long process that should not have been so.

What do you mean free data must be free?

The data is free, but the aggregated, formatted data has been worked on and processed. Are you saying the resulting aggregated data should also be free? That isn't going to happen; why would anyone do that work for free?

Or are you afraid LinkedIn and others will make everything private? It's completely up to LinkedIn or individual LinkedIn users what they want to make private vs public. Maybe more data would be made private if they don't want it scraped. I don't think that's inherently a good or bad thing.

I'm trying to puzzle out how this works in practice. So if LinkedIn has truly public data (no login required to view) then it can be scraped no problem.

But if it's only accessible with a login, then it falls under TOS and they can be blocked?