Hacker News
by jsyang00 713 days ago
No, he doesn't.

> I think that with respect to content that’s already on the open web, the social contract of that content since the ‘90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been “freeware,” if you like, that’s been the understanding.

> There’s a separate category where a website, or a publisher, or a news organization had explicitly said ‘do not scrape or crawl me for any other reason than indexing me so that other people can find this content.’ That’s a grey area, and I think it’s going to work its way through the courts.

9 comments

> I think that with respect to content that’s already on the open web, the social contract of that content since the ‘90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been “freeware,” if you like, that’s been the understanding.

is literally

> I think it's perfectly OK to use content in an arbitrary way if it's on the open web

The only difference between this and the title is that he doesn't call this behavior "stealing".

> I think that with respect to content that’s already on the open web, the social contract of that content since the ‘90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it.

Good thing it doesn't matter what he "thinks" the "social contract" is; copyright is automatic.

It's an incredibly fuzzy statement, really.

You can copy copyrighted material you download for your own use and the use of your friends. You can't copy it to a different website and distribute it further on the "open web". You can modify material provided for download in your own home for any purpose you wish, but unless that modification falls under "fair use" (a small excerpt, a parody, etc.), you can't distribute it freely on the web either.

His statement might mean this, but it could mean a zillion other things too (what does "it is fair use" even mean?).

Whether using randomly obtained copyrighted material to "train" an LLM and then selling that LLM's output is "fair use" seems, so far, to be another "gray area", and the foggy statement seems aimed at reducing awareness of that situation.

Legally, you cannot actually do the first thing without permission. The fact that it was technologically impossible to stop, and that the damages would be impossible to prove, didn't make it a legal right.

Most people would read the first quote as totally aligned with the article's title.

That "separate category" isn't an explicit opt-in. It's two different mechanisms: one for indexing and one for copyright.

Indexing is something you opt out of. "You can use our content" is something you opt into.

Reproduction and recreation, especially when taken physically outside of the Internet or into products for sale, have always been against the rules. As another post mentioned, torrenting of music and movies solidified this stance legally.

Unindexed content can be open source, no strings attached.

> the social contract of that content since the ‘90s has been that it is fair use

This social contract was broken when Google and Facebook pushed remarketing and behavioral tracking, and then started pulling content directly onto their own pages to boot. That was over a decade ago, and it's the reason every news site now bugs you about running out of "complimentary articles" and why you need to maintain 50 different subscriptions to get what advertising used to pay for. The only reason complimentary articles even exist is to keep Google from delisting these sites entirely and cutting off their search traffic (since Google doesn't link to shit that isn't free).

> No he doesn't.

Could you please help me see where the difference lies between the title and the quotes? Even after reading them, the title seems substantially true to me.

Or to be curt while mirroring your comment’s style: “Yes he does.”

I mean, that seems to be exactly how he's defining "open web" here, actually. In the dichotomy presented by these two quotes, "the open web" is free game for any use, and he defines sites whose language explicitly disallows every use except indexing as the complement of that category. Maybe he'd also place in that second category any site that effectively declares a "whitelist" of acceptable uses, though he doesn't say so explicitly.

His contention is an assumptive close. He folds into "the social contract" the assumption that anything not explicitly labeled otherwise follows a "blacklist" policy, where any use not specifically forbidden is permitted, and then presents that contract as too obvious to challenge.

He would like the "grey area" of legal debate on this matter, as he explained quite clearly, to be exclusively about whether AI models can be enforceably barred from training on content for which such a narrow whitelist of acceptable uses has been defined. Naturally, this would mean both that the courts could decide such a blanket ban can't bar MSFT (or anyone) from using this content to train AI models, and that the courts needn't, or maybe even can't, decide that failing to ban this use explicitly (or to adopt a similar "whitelist"-style blanket ban) legally implies accepting it. Hell, he even leaves room for an explicit ban on this use to be ruled unenforceable.

I can see why he would want that to be the Overton window!

Yes, he does.

> content that’s already on the open web, the social contract of that content since the ‘90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been “freeware,” if you like, that’s been the understanding

Sure, it's his belief, but this statement is completely incorrect. The words that he's using mean specific things, and he's just got it wrong. You would expect he's smart enough to know this, but 'it is difficult to get a man to understand something, when his salary depends on his not understanding it'.

> That’s a grey area, and I think it’s going to work its way through the courts.

It's not a grey area. Putting up a robots.txt doesn't change copyright, and it certainly doesn't make it a 'grey area'.
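For readers unfamiliar with the mechanism: robots.txt only tells a compliant crawler, identified by its user-agent string, which URL paths it may fetch; the standard format has no field for licensing terms or permitted uses of the content. Below is a minimal sketch using Python's standard-library `urllib.robotparser`, assuming a hypothetical site that admits Googlebot and disallows every other agent (`SomeAIBot` is a made-up user agent, and the file contents are illustrative only).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow Googlebot everywhere, disallow all other agents.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch url as agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "SomeAIBot"):
        print(agent, allowed(agent, "https://example.com/article"))
```

Compliance is voluntary: nothing in the file itself stops a crawler that never checks it, and the only thing it expresses is "fetch" or "don't fetch" per user agent.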