by otterley 1567 days ago
> Please don't attempt to equate internet traffic to door locking. It's a tired old argument that fails the moment critical thought is applied.

It's a useful metaphor that gets people convicted. You might not like it or agree with it, but that's the way it is.

> Web scraping is most certainly legal. Everything involved in the ridiculous "breaking and entering an unlocked residential door" is done a billion times a day by web scrapers as a matter of course

Unfortunately you, like others, are ignoring the crucial element of consent. Web scraping is done lawfully only with the consent of the website scraped. When scraping is done non-consensually -- even if the website is public -- it can be considered trespass to chattels and might even constitute a CFAA violation. I know this because my company scraped eBay without their consent in the late 1990s/early 2000s and was shut down by a lawsuit. See, e.g., eBay v. Bidder's Edge, 100 F. Supp. 2d 1058 (N.D. Cal. 2000) (not my specific employer at the time, but in the same business).

Ignore robots.txt at your peril, and treat the absence of one as a lack of consent. That's what Google and other search engines do.


I agree that the metaphor has some use, but I think most of these open access cases are more akin to trespassing in the woods at the far end of someone's large property or going through an unmarked door in a public building and finding oneself accidentally in a private space than breaking and entering into someone's home.

That is, if there are no signs posted and you have not received notice that trespass is prohibited, you should be given a healthy benefit of the doubt. It is obvious that homes are intended to be private, but not so for files being publicly served on the internet. This whole 'treat the absence of notice as a lack of consent' is a non-starter for me.

No metaphor is a perfect fit for the situation. It's a didactic device, nothing more.

Nevertheless, nobody's getting criminally prosecuted for accidentally fetching a file. Even someone who accidentally downloads child pornography once is unlikely to get in trouble for the mere act itself, provided they delete it as soon as they receive it.

The acts that are getting people in trouble involve intentionally downloading files they have no good reason to access and clearly aren't authorized by the owner to access, under circumstances that indicate an illicit purpose. All the facts that indicate guilt are going to be argued by an AUSA to a court and possibly a jury; no judge is going to hang someone (metaphorically speaking) for a mere accident.

C'mon, people. Use a little common sense.

"It's a useful metaphor that gets people convicted. You might not like it or agree with it, but that's the way it is."

It's a blatantly false metaphor. Burglary requires intent to commit a crime once inside.

Indeed, and the crime is stealing (unlawfully copying) the data within.

Admittedly it is an imperfect metaphor -- as all metaphors are -- but it is not "blatantly false."

Data is not fair game for the copying just because it's in a place you can reach it with `curl` without having to pass an authorization check. That's not the law, and it's not common sense.

Eh, you're not depriving a person of their property like in the physical world. It would be like trespassing and reading something. Again, a failed metaphor. What would really be common sense is for people to stop trying to fit bad physical metaphors on technology concepts. They don't work and they obscure the real points.

Frankly, tons of stuff is illegal on the internet. You've likely committed felonies by violating a site's terms of service. That's how the DOJ applies the CFAA. It doesn't get enforced, just like that MO reporter didn't get arrested. Should they have been? It was unauthorized access which you claim is enough under law and common sense...

It's my belief that intent alone is not sufficient. Actions speak louder than words. Who cares if you say "no one is allowed to access this" and then leave public access enabled to something? It's common sense that you didn't secure it and you have no expectation of privacy. Look at traditional cell calls and radio. You're putting your information in public and others can view it. DNA you leave on trash can be collected without a warrant - and with no intent/consent on your part!

The law is a mess and full of contradictions. Even when the statutes are sound they become perverted by activist or partial judges as well as law enforcement or prosecutorial discretion. Rule of law is a joke when individuals have the power to decide not to enforce it.

Also, I believe there was some case law recently that stated that publicly exposed or unsecured data can be accessed without it being a crime, but it depended on the details. I don't remember the jurisdiction and I can't seem to find it now either. Oh well.

Property rights are about control and exclusionary rights, not about physical things like land and widgets. This is a common misconception among the public and one of the first things they teach you in your first-year property law course.

"This is a common misconception among the public and one of the first things they teach you in your first-year property law course."

Typical lawyer response - I know more than you and I'll give you an answer that looks down on you without addressing the meat of the topic.

"Property rights are about control and exclusionary rights, not about physical things like land and widgets."

I haven't said otherwise. This reinforces my position that these terrible metaphors draw people off topic and do not translate to virtual property in the same way, which is the whole reason trespass and computer trespass are separate crimes with separate elements. In fact, I believe that most laws around computer resources have too much influence from traditional laws because the politicians and judges who wrote them relied too heavily on concepts from the physical world due to habit and a lack of understanding of the new concepts around technology and its possibilities.

The real question is whether the laws are appropriate. It's an asymmetrical power dynamic that favors the stated intent of the owner over the stated intent of the user, even ignoring the actions of the owner when they're contrary to their stated intent. Computer trespass and unauthorized access are much more complicated and lack the protective mechanisms that physical property laws have to protect non-owners. For example, consent and intent to let others use a computer resource are terribly vague. You don't need written permission to visit a website, there aren't clearly posted boundaries with signs stating this or that resource is off limits, etc. Even ToS tend to very poorly define boundaries within a system.

Without clearly defined and posted boundaries, and without explicit grants or revocations of privileges in publicly accessible cyber spaces, we have created a system that favors the undefined, undermining the underlying concept of strict construction: that laws need to be defined strictly so that they can be applied equally and so that they are knowable to their subjects. Cyber laws instead rely on the stated intent of the owner, which was not well defined anywhere nor communicated to the user, while ignoring the intent perceived through the actions of the owner that contradict their stated intent.

What we have is a system that will allow bad laws to stand because of unequal enforcement. Accessing publicly available URLs and the data returned can either lead to charges from the FBI against an unknown person, or to widespread support for a reporter. Prosecutorial and law enforcement discretion means that we can use the laws only against undesirables and leave the majority of the population unaffected even if they met the elements of the offense. If it doesn't affect you, then why fix it...

You’re thinking like a programmer, which is fine, but it’s not how lawyers think and operate. The law is not read literally in most cases — even in traditional property crime cases — and never has been. (“Breaking and entering” is a perfect example.) It can’t be, because English is an imperfect language, and situations in which the law is applied are frequently complex and novel. And I don’t think society wants an overly complex and literal legal system: not only will it be even more difficult to understand, but it will encourage even more attempts to evade it and leave a trail of innocent victims until we patch the law to fix the bug. (And if you think it can take software companies a long time to address vulnerabilities, the legislature can take an eternity).

As I’ve said elsewhere, you’re not going to be punished for the mere act of accidentally downloading an open file. Courts look at the totality of the circumstances to determine whether a crime was committed, and the adversarial system makes it such that the prosecutor is going to have to prove beyond reasonable doubt not only that the proscribed activity occurred, but also that the defendant had scienter (the required intent/state of mind) and that, in a case like this, the circumstances suggested that the data was not intended to be public. And as a defendant you will have the equal opportunity to argue that you didn’t violate the law, or that it was a mere accident. But if you’re keeping a cache of these stolen files around or sharing them with others, then perhaps you’re not so innocent.

There’s an old axiom that “a liberal is a conservative who’s been arrested; a conservative is a liberal who’s been mugged.” If you ever become a victim of a crime, you might appreciate these protections in a way you seem not to today.

“…treat the absence of one as a lack of consent.” - do you have a source for this?

Their documentation states otherwise https://developers.google.com/search/docs/advanced/robots/ro...

You are correct; my mistake. Nevertheless S3 returns a 403 (Forbidden) response for robots.txt by default which causes Google not to index it.
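
For what it's worth, how a missing or forbidden robots.txt gets interpreted is a per-crawler decision, so it may help to see one concrete rule. Below is a minimal sketch, assuming Python's standard-library `urllib.robotparser` as the stand-in crawler (this says nothing about what Google does). Its `read()` method treats a 401 or 403 on robots.txt as "disallow everything" and any other 4xx as "allow everything". The helper `parser_for_response` is hypothetical; it just mirrors that rule offline so the example runs without network access, and the `example.com` URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser


def parser_for_response(status, body=""):
    """Build a parser the way RobotFileParser.read() would for a given
    HTTP status on /robots.txt (hypothetical helper, no network I/O)."""
    rp = RobotFileParser()
    if status in (401, 403):
        # Access to robots.txt itself is refused: treat the site as off limits.
        rp.disallow_all = True
    elif 400 <= status < 500:
        # No robots.txt (e.g. 404): treat the site as unrestricted.
        rp.allow_all = True
    else:
        # A real file came back: parse its rules.
        rp.parse(body.splitlines())
    return rp


rules = "User-agent: *\nDisallow: /private/"

forbidden = parser_for_response(403)
missing = parser_for_response(404)
present = parser_for_response(200, rules)

print(forbidden.can_fetch("*", "https://example.com/data.csv"))
print(missing.can_fetch("*", "https://example.com/data.csv"))
print(present.can_fetch("*", "https://example.com/private/a"))
print(present.can_fetch("*", "https://example.com/public"))
```

Under this particular rule, a 403 on robots.txt and a 404 on robots.txt lead to opposite conclusions, which is why the status code a host returns by default matters. Other crawlers are free to draw the line elsewhere.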