Hacker News
by aragonite 1010 days ago
I think it's important to distinguish between content and presentation. Most books don't offer entirely new content, but (at best) give some novel way of presenting old content. Consider a modern retelling of Greek mythology. The stories aren't the author's original contribution (they are the Ancient Greeks'), but the particular way the author tells them may be. So ChatGPT telling people about a book's "content" is unproblematic if it's just telling people how the story goes, and only potentially problematic if it's effectively quoting from the book or mimicking its presentation. (And we all know that if ChatGPT is good at one thing, it's paraphrasing, or re-expressing the same ideas in substantially different ways, so even if ChatGPT literally copied a book's presentation/wording, that would probably have happened by accident rather than necessity.)

The vast majority of publications (especially those of an explanatory nature) do not contribute original content/information. The exceptions are things like research articles/monographs, historical records, and government reports. But copyright infringement doesn't apply here because these things weren't published with a profit motive but precisely to publicize the information as widely as possible. The only problem area I can think of involves books published by commercial publishers which promise an 'exclusive peek' into the life of some famous person (think biographies of celebrities or books like Fire and Fury). In that kind of case there is indeed original content, and revealing it in detail will arguably mean fewer sales for the authors/publishers.


It appears from your emphasis that you are arguing generally that "originality" and personal authorship are rare in practice, and therefore implying that mixing works into training is "mostly not infringement".

I disagree with this emphasis, given that rote, repetitive or technical material that is not original authorship is not in peril. Human authors who wrote original creative content, or wrote in a style that is personal and widely recognized, their rights to trade and commerce are in peril. That is much more important over the long term, and is not worth losing for convenient information mixers.

> Human authors who wrote original creative content, or wrote in a style that is personal and widely recognized, their rights to trade and commerce are in peril

I see what you're saying, but I fail to see how ChatGPT merely copying their style (as opposed to their content) might impact "their rights to trade and commerce". Suppose I ask ChatGPT to "tell me some jokes in the style of Louis CK". Would that make me less likely to stream a Louis CK comedy special?

(By contrast, if I ask ChatGPT to summarize the key revelations from a book like Fire and Fury, that probably would make me less likely to buy the book, because if I buy the book it'd be for the novel information contained in it, but ChatGPT already divulged it to me.)

> Suppose I ask ChatGPT to "tell me some jokes in the style of Louis CK". Would that make me less likely to stream a Louis CK comedy special?

I think you are thinking too narrowly.

Many or most well-known comedians have people write for them. Those writers will be out of a job because the results of their work were fed into an LLM, and now Louis CK will pay MS for it.

Companies that used to pay skilful writers will now pay MS, which trained its AI on works by those skilful writers without asking them. Those writers are out of a job too.

Repeat for every creative industry.

You might find this documentary interesting: "Everything Is A Remix" [1]

[1] https://youtu.be/nJPERZDfyWc?si=IooGFXhb5gbYNWyS