by Arnt 898 days ago
We use copyrighted materials all the time without permission. You and I both read the Verge article without the Verge's permission. Reading is the intended and most common use of the Verge's articles, and neither of us asked for permission. I didn't print that one but I often do print, always without asking anyone's permission.

Copyright has that name because copying is exceptionally protected; general use is not.

You can argue that training is a kind of copying, since it involves copying of things from RAM to RAM, etc. I find that difficult, since we've established that e.g. this browser's copying of web page contents from RAM to RAM isn't.

If you don't argue that training is copying, then you can argue that since training is a necessary prelude to copying, it should be treated like copying legally. I disagree, because various kinds of fair use also have training as a necessary prelude (and, uh, the purchase I mentioned could also be a necessary prelude to copying, if my goal was to copy the album).


This is apples to oranges.

As a human, you don't need permission to read content if it's freely available. It was explicitly made for us to consume, with an expectation of returning value, mainly through advertising.

Laws are (in theory) put in place to ensure a fair playing ground. If Company B requires content from Company A, but is causing financial damage to Company A by using it, this is not fair use.

I'd also like to add the question of intent. Even if we decide to quote the paper later in the day, I would say it's fair to assume 95% of readers do not intend to copy the content for profit.

OpenAI, on the other hand, explicitly intends to copy the material and reuse it in its own content, across millions of generations, for profit.

No metaphors needed.

Could you please explain (without metaphors!) why the publishers who publish 20-page summaries of books do so legally, while GPT's reuse in its content violates copyright?

A 20-page summary of a book is a substantially different creation. They likely don't even have entire paragraphs reproduced, maybe only a few quotes. Those summaries also offer deeper analysis of the overall work, along with potential critique of it. It is a different creation even though it is related to another.

Exactly reproducing most of an article is vastly different from a short summary. ChatGPT was exactly reproducing large portions of articles, or entire articles. If ChatGPT were only writing short summaries of articles or critiques about them, this case would be radically different. But in the end, ChatGPT is exactly reproducing copyrighted works.

Because those are summaries.

The NYT found substantial portions of exact copies of their content being reproduced when you give ChatGPT the right prompts.

This is still apples to oranges, but I'll bite.

A summarized book can still entice a potential reader to purchase it. It's a form of advertising.

Chewing up content and spinning it without any citations does not provide the original owners any form of publicity.

Viewing copyrighted materials is basically never the problem. When we read The Verge we aren’t redistributing any of their content. We are doing exactly what they are granting a limited license to do: read the articles and use them for non-commercial purposes.

See section 14 of The Verge’s terms of use as an example: https://www.voxmedia.com/legal/terms-of-use

Fair use can only override some of this: for example, I can use content from The Verge if I’m using a limited percentage of it for the purpose of critique and discussion. This application of fair use is basically the for-profit business model of YouTube channels like WatchMojo, which use small clips of movies and TV along with commentary and critique. Without that commentary or without limiting their redistribution to small portions of the work, they would be breaking copyright law.

The problem is redistribution of a substantial portion of the work. The NYT has allegedly found some very damning instances where ChatGPT provided answers containing substantial, almost unchanged portions of text directly copied from NYT articles. The NYT never granted ChatGPT any license to redistribute their content for commercial purposes, and it doesn’t seem like ChatGPT is doing anything covered by fair use (such as providing discussion or commentary).

I’m not aware of any part of copyright law that gives an infringing party a pass just because someone pressured them into infringement.

Your main example is completely wrong.

Websites like The Verge have terms of use that one is legally obliged to follow in order to have permission to view and use their site.