| HN Mirror



	by nadermx 57 days ago
	Funny how the copyright industry was able to spin copyright infringement into the pejorative "stealing". If you still have the item, what was stolen? Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that the unauthorized sale of phonorecords of copyrighted musical compositions does not constitute "stolen, converted or taken by fraud" goods under the National Stolen Property Act.

4 comments

tensor 57 days ago

I still find the idea that "learning" from code is "stealing" kind of ridiculous.

link

array_key_first 57 days ago

The "learning" isn't learning really. I mean it might be, but if you define learning to be a human endeavor then AI can't learn.

It's perfectly reasonable to say it's okay for humans to do something but not okay for a computer program to do the same thing. We don't have to equate AI to humans, that's a choice and usually a bad one.

link

tensor 57 days ago

It's also perfectly reasonable to say it's ok for a program or machine to do the same thing as a human. This has been the basis for the technological revolution since the dawn of technology.

link

leereeves 56 days ago

It's legal and perfectly reasonable for a human being to combine organic fuels with oxygen from the air to create energy and CO2. Any law restricting that would be the worst form of tyranny.

It would not be reasonable to allow machines to do that at unlimited scale without restrictions.

(Hopefully the fossil fuels industry won't draw inspiration from the legal arguments made by AI companies...)

link

Eisenstein 56 days ago

> It's legal and perfectly reasonable for a human being to combine organic fuels with oxygen from the air to create energy and CO2.

Is there any line past which it becomes unreasonable?

> It would not be reasonable to allow machines to do that at unlimited scale without restrictions.

If the machines were a replacement for a damaged respiratory system in a human, would it be reasonable?

What about if the machine were being used by a human to do something else that was important?

Where is the line where it becomes reasonable?

link

leereeves 56 days ago

> Is there any line past which it becomes unreasonable?

That's exactly the question we should be asking about AI and fair use.

link

aeon_ai 57 days ago

If one defines 'flying' to be a bird's endeavor, then humans can't fly.

Now, if you'll excuse me, I need to catch a metal shuttle that chucks itself through the air on wings.

link

greendestiny 57 days ago

Sure, as a word it can be broad, but as a concept in our legal system it should be much more nuanced.

The relevant extension of your analogy is: should birds be required to obey FAA rules? Or should plane factories be protected as nesting sites?

link

nadermx 57 days ago

Relevant: https://www.bluewin.ch/en/news/swiss-company-builds-airport-...

link

Dylan16807 56 days ago

It's a relevant extension if you think the ability to learn from a work is a right people have that exempts them from the more general lockdown copyright would impose.

If you come at it from the view of copyright being a limited set of control over some areas but not others, then if copyright doesn't block human learning it shouldn't affect anything similar either, unless a specific rule is added to make those situations be handled differently.

link

boh 57 days ago

Yes I guess there's also no such thing as stealing in torrents since the computer "learns" the data and returns it in a transcoded fashion so it's technically not a reproduction. Yes LLMs can reproduce passages from copyrighted works verbatim but that's only because it "learned" it and it's just telling you what it "knows".

The mental calisthenics required to justify this stuff must be exhausting.

link

idle_zealot 57 days ago

> The mental calisthenics required to justify this stuff must be exhausting.

It's only exhausting if you think copyright ever reasonably settled the matter of ownership of knowledge and want to morally justify an incoherent set of outcomes that you personally favor. In practice it's primarily been a tool for the powerful party in any dispute to hammer others for disrupting their business model. I think that's pretty much the only way attempting to apply ownership semantics to knowledge or information can end up.

link

balamatom 57 days ago

Correct.

Knowledge consists of, roughly speaking, thoughts.

(a "justified true belief" - per https://plato.stanford.edu/entries/knowledge-analysis/ - is a kind of thought)

The "thinking" part of a "thinking being" - that also consists of thoughts.

If your knowledges are someone's property, you are someone's property.

A society where all knowledge is proprietary, is a society of ubiquitous slavery.

Maybe multi-layered, maybe fractional, maybe with a smiley-face drawn on top.

Doesn't matter.

link

spankalee 57 days ago

Humans have been known to recite entire parts from plays from memory, live in front of audiences even.

link

leni536 57 days ago

And they are legally required to license the play to do that, if it's still in copyright.

link

spankalee 56 days ago

Only to perform it, not learn it.

link

leni536 56 days ago

And LLMs perform when you prompt them.

link

Dylan16807 56 days ago

> Yes LLMs can reproduce passages from copyrighted works verbatim but that's only because it "learned" it and it's just telling you what it "knows".

Are you finding people that actually say this?

When it can quote something like that, it's a training error. A popular enough work gets quoted and copied by people online, and then it's not properly deduplicated. It's a very small fraction of works it can do that with, and the cleaner your data the less it happens.

I'll once again point out that Stable Diffusion launched with fewer weights than training images. It had some accidental memorizations, but there wasn't room for its core functionality to be memorization-based.

link

Eisenstein 56 days ago

This is a perfect example of 'begging the question'. Arriving at a conclusion from a fact assumed as true without evidence. Your reductio does not actually demonstrate that copyright applies to LLMs, because you did not demonstrate how transcoding is comparable to inference, just that LLMs can reproduce some passages from copyrighted works. You could also produce passages from copyrighted works by generating enough random sequences of words, but no one is arguing that is comparable to transcoding. That the people who do not share this conclusion are engaging in motivated reasoning is based only on your assumption and has no logical backing, and is therefore begging the question.

link

pessimizer 56 days ago

"Learning" for LLMs is just as goofy and propagandistic a metaphor as "stealing" for copyright. I find it predictive of your position that you'll accept one dumb metaphor for something that we didn't need a metaphor for, but not the other.

Are you for stealing and against learning?

We know exactly what is happening in both cases. We can talk about that, or we can use obfuscating euphemisms that make our preferred position seem obviously true.

link

nkrisc 57 days ago

I find it more ridiculous to equate the act of a human learning with for-profit AI training without recompense to the authors of the training material.

link

greendestiny 57 days ago

I think that it's absurd that we've jumped to the conclusion backpropagation in neural networks should be legally treated the same as human learning.

I mean I don't think I could find a better description for following the derivatives of error in reproducing a set of works than creating a "derivative work".

link

alok-g 57 days ago

>> ... we've jumped to the conclusion backpropagation in neural networks should be legally treated the same as human learning.

I agree. However, the reverse is also likely true, i.e., it cannot currently be denied that learning in humans is different from learning in artificial neural networks from the point of view of production of works that mix ideas/memes from several works processed/read. Surely, as the article says, copyright law talks exclusively about humans, not machines, not animals.

link

greendestiny 57 days ago

I understand the article - the point about 'learning' is that if the model and its outputs are derivative works then the copyright belongs to the human creators of the works it was trained on.

Edit*: Or perhaps put more pseudo legally that the created works infringe on the copyrights of the original human creators.

link

alok-g 57 days ago

The part I agree with is that copyright law calls out humans specifically as the potential owners of copyright. So what you suggest seems to be the only possibility. Calling out humans could imply that when a human reads a thousand books and then writes something based on them, but which is not a substantial copy of anything explicitly read, that human owns the copyright to the text written. Whereas, if an artificial neural network does the same (hypothetically writing the same text), it would not.

The above does not follow from, imply or conclude anything about learning in artificial neural networks and humans being similar or dissimilar.

link

pydry 57 days ago

If you can set a copyright trap and an LLM reproduces it I think it's pretty clear cut that it's more than just "learning".

I have seen LLMs do all sorts of crap which was clearly reproduction of training material.

This is also why people are most impressed with how much better it is at reproducing boilerplate rather than, say, imaginative new ideas.

link

jakeydus 57 days ago

Remember last year (?) when one of the major AIs produced a bit of code that included Jeff Geerling's name in a comment?

link

charonn0 56 days ago

Is "learning" the correct term?

Or is it "plagiarism"?

link

lo_zamoyski 57 days ago

If that were the case, then imagine having to give it back!

link

estimator7292 57 days ago

Learning, probably not.

Copy/pasting at scale, yes

link

vorticalbox 57 days ago

It is learning though. It’s not just copying the code.

Code gets turned into tokens and then it learns the next most likely token.

The issue that I see most people talk about is the scale at which it learns.

A human will learn from other people’s code but not from every person’s code.

link

cogman10 57 days ago

The issue is that of copyright law WRT derivative works. Machine transformations of original works do not create a new copyright for the person who directed the machine transformation. That's why you can't pirate a bunch of media by simply adding a red pixel to the righthand corner or by color shifting the video.

Copyright law is very clear that if a machine does it, the original copyright on the input is kept. This is why your distributed binaries are still copyrighted, because the machine transformed, very significantly, the source code into binary which maintains the copyright throughout.

It would be inconsistent for the courts to suddenly decide that "actually, this specific type of machine transformation is actually innovative."

I know this is generally really bad for the AI industry, so they just ignore it until a court tells them they can't anymore. And they might get away with it as I don't have faith that the courts will be consistent.

link

red75prime 57 days ago

Shredding is a machine transformation. Does it mean that shreds retain original copyright even if the content can't be restored and the provenance can't be traced? Just an example that treating all machine transformations equally with no regard to the specifics doesn't make much sense.

And the specifics of autoregressive pretraining is that it is lossy compression. Good luck finding which copyrighted materials have made it into the final weights.

link

cogman10 57 days ago

> Does it mean that shreds retain original copyright even if the content can't be restored?

Yup, it absolutely does. In fact, that's why you are still violating copyright law by using bittorrent even though each of the users is only giving out a small slice or shred of the original content.

The US has a granted defense in the case of something like shredding called "Fair Use" but that doesn't mean or imply that a copyright is void simply because of a fair use claim.

> And the specifics of autoregressive pretraining is that it is lossy compression.

That doesn't matter. Why would it? If I take a FLAC recording and change it to an MP3, the fact that it was a lossy transform doesn't suddenly give me the legal right to distribute the MP3.

> Good luck finding which copyrighted materials have made it into the final weights.

That's what the NYT v. OpenAI lawsuit is all about. And for earlier models they could, in fact, pull out full NYT articles which proved they made it into the final weights.

Further, the NYT is currently in discovery which means OpenAI must open up to the NYT what goes into their weights. A move that, if OpenAI loses, other litigants can also use because there's a real good shot that OpenAI also included their works in the dataset.

link

ell1e 56 days ago

LLMs seem to be so devoid of intelligence, I think it's arguable if that's learning: https://machinelearning.apple.com/research/illusion-of-think... Typically, you would imply a level of understanding when you say learning. LLMs apparently can't do that, by design.

link

blks 57 days ago

A human is not a commercial product. Here we have a commercial product that was created by using a lot of various copyrighted and protected IP, without licensing agreements, without paying, without even citing it.

link

margalabargala 57 days ago

Copy/pasting at scale is how tons of software has been written for a long time, or have we all forgotten the jokes people used to make about StackOverflow?

link

MagicMoonlight 57 days ago

If I “learned” your essay and handed it in, would you be happy with that?

link

NewsaHackO 57 days ago

Everybody has done a complete 180 in terms of copyright protections. Before, nobody cared about downloading music, movies, TV shows, or pirating games. Now, when the copyright law is affecting them, they are gung-ho about protecting these billion-dollar companies' copyrights.

link

jeppester 57 days ago

A more logical explanation would be that there are different opinions and those who complain are usually louder.

link

NewsaHackO 56 days ago

Yes, that's my point. They are different and contradictory opinions, which show hypocrisy.

link

inexcf 56 days ago

No it is not your point. You're just arguing about a strawman that holds both of those contradictory positions.

link

NewsaHackO 56 days ago

You are attempting to invoke a strawman. So is your point that there is not a significant overlap between posters who think that AI companies should not be allowed to use pirated copyrighted material in their training corpus and posters who themselves pirated copyrighted material such as movies, music, games, etc.?

link

Dylan16807 56 days ago

Yes, that is their point. Do you have evidence against it?

I'm sure you can find some overlap, but I bet the vast majority is caused by people making a distinction between commercial and noncommercial piracy. I don't think there's a big cohort of piracy hypocrites.

link

MSFT_Edging 56 days ago

It's all power.

The music and movie companies have power. They have the funds to bankrupt you with a small army of lawyers. You as an individual do not stand a chance against corporate lawyers. They can destroy your life over fairly minimal and non-violent offenses.

AI companies are backed by the very powerful. They can steal all they want and use the same army of lawyers to bankrupt any small rights holder. The big rights holders go to the same parties and allow it to happen.

Regardless of the actual take on copyright, both methods skullfuck the little guy without power.

People cry foul because, at least in the US, we claim to live in a free country based on equality, yet there is a very obvious caste system of the haves and the have-nots.

It erodes the legitimacy of the system. Imagine if for years you see news reports of a mother getting a judgment against her where she owes 100s of thousands because she seeded a Britney Spears song. Then you suddenly see the same laws that were leveraged to instill fear in you tossed aside when the rich and powerful say it doesn't count anymore. You're going to cry foul!

It's not a hypocrisy of position on copyright, it's bearing witness to the illegitimacy of the laws they're bound by.

link

AuthAuth 55 days ago

It's not a 180. You can be against copyright, but as long as copyright is still being enforced on you, you can think it should be enforced on AI companies.

I'd prefer no copyright, but we live in a world where there is copyright, so it's unfair that only AI companies get to be immune.

link

preisschild 57 days ago

It's not only about "billion-dollar companies' copyrights", but also about voluntary copyleft free software. If I license my code under GPL I don't want other persons/companies to just whitewash that code through LLMs and use it in their proprietary code.

link

NewsaHackO 56 days ago

I agree with this, and I think that it is an open question whether training on copyrighted material is considered transformative. However, someone said that thumbnails of full photos are considered transformative enough to allow fair use, and LLM training is (in my opinion) clearly more transformative than converting a picture to a thumbnail. But we will see how it plays out.

link

Neywiny 57 days ago

I don't think it's unreasonable to consider it stolen potential profit, but agreed that's not how they spin it

link

blks 57 days ago

“Stolen” as in “profited on IP against terms and conditions of the license”.

link