Hacker News
by rmbyrro 1010 days ago
Not at all like piracy.

When someone pirates a book, they're replacing the original without consent or remuneration to the copyright holders.

When you train an AI on the contents of a book, you're not replacing it. If someone is interested in the content, they still need to buy it. Using ChatGPT is not a substitute. If it is, they're gonna have to prove it in court, but I doubt they'll be able to.

If you can ask ChatGPT about any book's contents, you don't need to get the book, and if you don't need to get the book, then the author got robbed and ClosedAI/MS profited.
The ability to ask people what is contained within a book isn't obviously copyright infringement.

Merely summarizing info and attributing it to the source is the basic element of learning, for both machines and human beings.

These suits are necessary because it's not clear where the line is, or whether ChatGPT's functions actually cross it.

What is clear is that OpenAI is doing its best to avoid infringing anyone's copyright, even though it would be trivial for them to infringe. They have the training data, so they could simply output it word for word, bypassing the LLM. They don't do that, and they further restrain their LLM from making overly long recitations.

If you can trick / manipulate the LLM into giving you too much then I say that infringement is on you.

> The ability to ask people what is contained within a book isn't obviously copyright infringement.

The ability to ask a commercial product is. In fact, feeding the book to that commercial product is already infringement.

ClosedAI is doing squat. The very least they could do is ask authors for permission, and of course if they really cared they would have the LLM infer attribution and share revenue with the original creators.

I think it's important to distinguish between content and presentation. Most books don't offer entirely new content, but (at best) give some novel way of presenting old content. Consider a modern retelling of Greek Mythology. The stories weren't the original contribution by the author (but by the Ancient Greeks), but the particular way they tell it may be. So ChatGPT telling people about its "content" is unproblematic if it's just telling people how the story goes, and only potentially problematic if it's effectively quoting from the book or mimicking its presentation. (And we all know that if ChatGPT is good at one thing, it's paraphrasing or re-expressing the same ideas in substantially different ways, so even if ChatGPT literally copies a book's presentation/wording, that would probably have happened by accident rather than necessity.)

The vast majority of publications (especially those of an explanatory nature) do not contribute original content/information. The exceptions are things like research articles/monographs, historical records, government reports. But copyright infringement doesn't apply here because these things weren't published with a profit motive but precisely to publicize the information as widely as possible. The only problem area I can think of involves books published by commercial publishers which promise an 'exclusive peek' into the life of some famous person (think biographies of celebrities or books like Fire and Fury). In that kind of case there is indeed original content, and revealing it in detail will arguably mean fewer sales for the authors/publishers.

It appears from your emphasis that you are arguing generally that "originality" and personal authorship are rare in practice, and therefore implying that mixing works into training is "mostly not infringement".

I disagree with this emphasis, given that rote, repetitive or technical material that is not original authorship is not in peril. Human authors who wrote original creative content, or wrote in a style that is personal and widely recognized, their rights to trade and commerce are in peril. That is much more important over the long term, and is not worth losing for convenient information mixers.

> Human authors who wrote original creative content, or wrote in a style that is personal and widely recognized, their rights to trade and commerce are in peril

I see what you're saying, but I fail to see how ChatGPT merely copying their style (not: content) might impact "their rights to trade and commerce". Suppose I ask ChatGPT to "tell me some jokes in the style of Louis CK". Would that make me less likely to stream a Louis CK comedy special?

(By contrast, if I ask ChatGPT to summarize the key revelations from a book like Fire and Fury, that probably would make me less likely to buy the book, because if I buy the book it'd be for the novel information contained in it, but ChatGPT already divulged it to me.)

> Suppose I ask ChatGPT to "tell me some jokes in the style of Louis CK". Would that make me less likely to stream a Louis CK comedy special?

I think you are thinking too narrowly.

Many or most well-known comedians have people write for them. Those writers will be out of a job because the results of their work were fed into an LLM, and now Louis CK will pay MS for it.

Companies that used to pay skilful writers will now pay MS, which trained its AI on works by those skilful writers without asking them. They are out of a job too.

Repeat for every creative industry.

You might find this documentary interesting: "Everything Is A Remix" [1]

[1] https://youtu.be/nJPERZDfyWc?si=IooGFXhb5gbYNWyS

I can also go to the Wikipedia article about any book of note and get a plot or other summary and other information about the book. If that’s the reason for buying the book, “the author got robbed.”
If you ask a human questions about a book, thereby avoiding having to buy the book for class, did you rob the author?
> If you ask a human questions about a book, thereby avoiding having to buy the book for class, did you rob the author?

If someone makes a commercial activity of "answering any question about book contents at any time 24/7", hires tons of people to read those books and reply to billions of such questions daily thereby helping everyone not buy any books, is that robbing book authors?

Food for thought.

Whether a sale is forced or coerced also comes to mind, as in the tales of college undergrads forced to buy hundreds of dollars' worth of books for a single semester...

But let's be direct: are we talking about market share in the millions of views, where pirate copies are also available, or about selling any books at all versus a few hundred over a year? Quite the difference at the subsistence level of an individual author, no?

What sort of questions? How would you know what to ask it, unless maybe you have another source for the book?

Curiously, when I ask GPT-4 about some well-known but under-copyright book, it says it can't answer because of the copyright. For well-known books out of copyright such as Alice in Wonderland, it can recite passages but tends to get lost and start reciting another section or book at some point. It would be really frustrating to use as a substitute.

This reminds me of the tenuous RIAA claim that every pirated piece of media represented a lost sale back when they were suing their customers in the 2000s.
It's been something of a wild ride for me having lived through the "Information wants to be free" era to now live in the new "Reading my publicly published writings and deriving new things from that is theft" era. The next few years of court battles around this are going to be interesting, and I'm not too hopeful on the odds that the "little guy" wins in the end. Seemingly "little guy" affirming results might just turn around and further entrench large players instead.
> Reading my publicly published writings and deriving new things from that is theft

Ah, the mental gymnastics people go through to justify the theft.

Just... no. It's nothing about people reading your writings and deriving things from that. It's about big companies using automated tools to ingest your writing and provide commercial services based on it. To other people. Without paying you a dime.

Are copyright holders being robbed by professors that answer their students' questions?

Don't teachers do the same?

- Trained their minds on existing books

- Tutor the next generation of students

- Give classes on book contents

- Answer questions about those books

The book publishing industry didn't go out of business because there are teachers answering questions. To the contrary, it benefited book sales, because most people aren't good self-learners.

What's wrong with having a machine do the same?

> Don't teachers do the same?

> - Trained their minds on existing books

Training a human = enriching a conscious human mind. "Training" AI = mechanically creating a derivative work (no conscious mind to enrich). Training a human is to "training" an AI as killing a human is to "killing" a Unix process: same word, different things.