
by DrScientist 490 days ago

The case looks pretty straightforward to me - they copied the notes ( human or machine doesn't really matter ) to directly compete with the author of the notes.

If you wrote a program that automatically rephrased an original text - something like the Encyclopaedia Britannica - to preserve the meaning but not have identical phrasing - and then sold access to that information in a way that undercut the original - then in my view that's clearly ripping off the original creators of the encyclopaedia, and would likely stop people writing new versions of the encyclopaedia in the future if such activity was allowed.

These laws are there to make sure that valuable activities continue to happen and are not stopped because of theft. We need textbooks, we need journalistic articles - to get these requires people to be paid to work on them.

I think it's entirely reasonable to say that an LLM is such a program - and if it is used on sources which are sustained by having paid people work on them, and the reformatted content is then sold on in a way that undercuts the original activity, then that's a theft that's clearly damaging society.

I see LLMs as simply a different way to access the underlying content - the rules of the underlying content should still apply - ChatGPT's revenues are predicted to be in the billions this year - sending some of that to content creators, so that content continues to be produced, is not just right - it's in their interest.


zozbot234 490 days ago

> automatically rephrased an original text - something like the Encyclopaedia Britannica - to preserve the meaning but not have identical phrasing

Note that it's very hard to do this starting from a single source, because in order to be safe from any copyright concern you'd have to preserve only the bare "idea", and everything else in your text must be independent. But LLMs seem to be able to get around this by looking at many sources that are all talking about the same facts and ideas in very different ways, and then successfully generalizing "out of sample" to a different expression of the same ideas.


DrScientist 489 days ago

The concept clustering across multiple sources allows you to rephrase more accurately while retaining meaning - however the point I'm making is: if you then point that program at Encyclopaedia Britannica, simply rephrase it, and then charge for access to the rephrased version - should you be allowed to do that?


zozbot234 489 days ago

The underlying problem is that "meaning" in the ordinary sense still includes plenty of copyrightable elements. If you point a typical LLM program at some arbitrary text and tell it to "rephrase" that, you'll generally end up with a very close paraphrase that still leaves the "structure, sequence and organization" (in a loose sense) of the original largely intact. So it turns out that you're still in breach of copyright. All you're allowed to use when starting from a single copyrighted text is the ideas and facts in their very barest sense.


DrScientist 488 days ago

So if I made a pop song that was entirely copied from existing songs - but ensured that each fragment was relatively short ( but long enough to be recognisable ), then I'd be ok?

i.e. the way to avoid copyright is to double down on the copying?

I can see how, for a human, you could argue that there is creativity in splicing those bits together into a good whole - however if that process is automated - is it still creative - or just automated theft?
