HN Mirror


by koboll 1160 days ago

Copyright infringement is when you take copyrighted work and distribute it directly, or so close to directly that it can't be said to be "transformative".

Obviously LLM outputs are transformative, so this argument falls completely flat. As the writer is a copyright lawyer, it's hard to conclude anything other than they are knowingly lying, or at minimum wishcasting what they want the law to say instead of what it does say.

I think the misconception stems from the layman's understanding of copyright clipping off the last part of that sentence, so it's just "Copyright infringement is when you take copyrighted work".

Proof of the success of industry campaigns to vilify things like taping broadcast television.


version_five 1160 days ago

"You wouldn't send an http GET request to a car"

dragonwriter 1160 days ago

... well, DELETE is more fun.

nonethewiser 1160 days ago

That's true

dragonwriter 1160 days ago

> Copyright infringement is when you take copyrighted work and distribute it directly, or so close to directly that it can't be said to be "transformative".

“Transformative” is one consideration within one of the four fair-use factors, so that's not right about the exception. And infringement happens whenever any exclusive right (literal copying, creating a derivative work, public performance, etc.) is exercised without either permission or an exception such as fair use, not just at distribution.

FreakLegion 1160 days ago

> Copyright infringement is when you take copyrighted work and distribute it directly

It goes much further. Loading copyrighted work into a computer's memory, for example, is copying and can infringe. There are 30 years of precedent here. See https://en.wikipedia.org/wiki/MAI_Systems_Corp._v._Peak_Comp....

joshka 1160 days ago

> Copyright infringement is when you take copyrighted work and distribute it directly, or so close to directly that it can't be said to be "transformative".

Distribution isn't necessarily a required element of infringement. Just the creation of a derivative work can be enough.

dingledork69 1160 days ago

LLMs are just a novel compression algorithm.

version_five 1160 days ago

Imagine someone built a 100-trillion-parameter model, Greg, who is a personal assistant. Greg has a context length of a billion tokens, and he can search the web and tell you what he's found. His memory is so good that he can quote verbatim the full text of anything he's read. It's not even compression, it's just straight-up storage. Should you have to pay royalties to everyone whose content you ask Greg to look at?

What if Greg isn't an LLM and he's your browser cache? Are you still infringing copyright?

feoren 1160 days ago

What if Greg is an online repository, crawling the web and storing and distributing copyrighted materials verbatim? Stripping out attribution? With ads and/or a paid subscription fee?

> Should you have to pay royalties to everyone whose content you ask Greg to look at?

If Greg talks so fast that he's distributing millions of these copies around the world, for money, then yes, of course he's infringing.

> What if Greg isn't an llm and he's your browser cache?

My browser cache is not a distribution mechanism. It's for my personal use. I'm not infringing on copyright if I keep books in my personal library. I am if I'm copying them millions of times and giving others access to that library for money. If I downloaded a bunch of paywalled content and then uploaded my browser cache to SomePirateSite.com, for money, then yes, I'm infringing.

Why do you think these are "gotcha" questions? This is pretty straightforward stuff, and nowhere does it prove that LLMs are not infringing.

spullara 1160 days ago

You are definitely infringing by making a copy of a book and keeping it in your personal library.

epups 1160 days ago

> What if Greg is an online repository, crawling the web and storing and distributing copyrighted materials verbatim? Stripping out attribution? With ads and/or a paid subscription fee?

So you mean Google, right?

feoren 1160 days ago

Yes. Google preventing page-clicks by showing their half-assed, confidently wrong summaries that they scraped directly from the top results? Yes, that's copyright infringement. Simply linking to the site with a short preview is not.

twoodfin 1160 days ago

Your browser cache doesn’t repurpose the content it stores to create derivative works. LLMs do so by definition.

CamperBob2 1160 days ago

Or what if Greg is a human with eidetic memory?

Riverheart 1160 days ago

Humans and software are not the same. Humans get a pass on regurgitating some stuff because our memories are fuzzy and more importantly we are not eternal, distributable entities that scale based on GPUs available.

Kim Peek is Greg, and the difference between him and an AI is the text above.

spullara 1160 days ago

Isn't this called Google search?

nonethewiser 1160 days ago

In the sense that the big model _effectively_ stores a lot more information than it directly contains? Of course it doesn't actually store it; it just has lots of logic that can generate it.

That's an interesting notion, but isn't this just as true of any generative logic, provided the size of the set of results is larger than the code size? Like a random number generator. Can you compress infinity?

fooker 1160 days ago

Humans are just novel compression algorithms, our education systems reflect this! :)

octacat 1160 days ago

A compilation step. Basically, Google search to complete your code. And afterwards: "Why did you copy this code?" "Idk, it wasn't me, it was the model :(" :D