HN Mirror


by jalapenos 908 days ago

If ChatGPT is based on neural networks, with no actual save-and-replicate facsimile behaviour, it no more "copies" original work than I do when I tell you about the news article I read today.

I'd say the only real reason the Piratebay links thing you mentioned is not the norm is purely because those media sources have done a better job of striking fear into people doing that, so it's gone more underground. I.e. they're better terrorists.

There's no fundamental, moral reason why Piratebay links being posted and raised to the top would be wrong.

3 comments

vel0city 908 days ago

> it no more "copies" original work than I do when I tell you about the news article I read today

When you tell people about some news article you read earlier, do you repeat it verbatim? Do you also give it out to potentially millions or hundreds of millions of people for commercial purposes?


kmeisthax 908 days ago

Copyright law does not care about the means of copying, just that you created something with substantial similarity to something you had access to. Whether the copy takes the form of a pixel array, blobs of random data being XORed to produce a full copy of music, or rows in a key/value attention matrix doesn't matter.
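To make the XOR case concrete, here is a minimal Python sketch of how a work can be split into two blobs that each look like random noise but combine back into an exact copy. The function names and the placeholder byte string are hypothetical, chosen only for illustration.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings together."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def split_into_blobs(work: bytes):
    """Split `work` into two blobs; each one alone looks like random bytes."""
    pad = os.urandom(len(work))       # blob A: pure random data
    return pad, xor_bytes(work, pad)  # blob B: the work XORed with blob A


# Hypothetical stand-in for the bytes of a music file.
work = b"placeholder for the bytes of some copyrighted song"

blob_a, blob_b = split_into_blobs(work)

# XORing the two blobs together reproduces the original bytes exactly.
recovered = xor_bytes(blob_a, blob_b)
```

Neither blob contains the work in any recognizable form, yet `recovered` is byte-for-byte identical to `work`, which is why the storage format alone doesn't settle whether something is a copy.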

Furthermore, there's Google research on extracting training set data from models. More specifically, Google found out that if you ask GPT to repeat the same word over and over again, forever, it eventually starts printing fully memorized training set data[0]. So it is memorizing stuff, even if it's not regurgitating it.

[0] When told of this, OpenAI's response was to block conversations with large amounts of repeated words in them.


octacat 908 days ago

So, if someone applies a filter to a video or audio file, is the result no longer a "copy" of the original work? (No, it is still protected.) AI still could produce exact or extremely similar results of stuff it learned on.


Matticus_Rex 908 days ago

It's not analogous to a filter, because that's applied to the actual work. The model does not keep the work, so what it does isn't like applying a filter. It's more like being able to reproduce a version of the work from memory and what it learned from that work and others about the techniques involved in crafting it, e.g. art students doing reproductions.

And if OpenAI were selling the reproductions, that would be infringement. But that's not what's happening here. It's selling access to a system that can do countless things.


concordDance 908 days ago

> AI still could produce exact or extremely similar results of stuff it learned on.

Can it do so more than a human can?

I think that's the key here. If an AI is no more precise than a human telling you about the news article they read today, then ChatGPT's learning process probably can't be morally called copying.


octacat 908 days ago

So, if someone decompiles a program and compiles it again, it would look different. "It is not copying", we just did some data laundering.

Feeding someone else data into your system is usually a violation of copyright. That holds even if you have a very "smart" system that tries to transform and obfuscate the original data.


Matticus_Rex 908 days ago

> Feeding someone else data into your system is usually a violation of copyright

In some circumstances, yes, but often it's not, especially if you're not continuing to store and use it (which OpenAI isn't).


jalapenos 908 days ago

I'm regularly feeding other people's data into my "system" (brain) in order to produce my outputs.

So I'm a living breathing copyright violator. As a person I should be banned.

Fortunately, copyright is a bullshit fictitious right with no basis in natural law. So I don't lose much sleep over it.


octacat 908 days ago

Computers are deterministic. Training on the same inputs would produce the same model. The comparison with the brain is incorrect. You could add noise to the input data during training; it would more or less reproduce real learning. Still, it could produce less usable models as a result.
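A toy Python sketch of that claim, assuming a fixed seed and single-threaded CPU arithmetic (real large-scale training on parallel hardware is not guaranteed to be bit-for-bit reproducible). The `train` function, the synthetic dataset, and every parameter here are hypothetical and stand in for a real training pipeline.

```python
import random


def train(data_seed, noise_seed=None, noise_scale=0.1, steps=200, lr=0.05):
    """Fit y = w*x + b by plain gradient descent on a synthetic dataset."""
    data_rng = random.Random(data_seed)
    xs = [data_rng.uniform(-1.0, 1.0) for _ in range(50)]
    ys = [3.0 * x + 0.5 for x in xs]

    # Optionally perturb the inputs with Gaussian noise before training.
    if noise_seed is not None:
        noise_rng = random.Random(noise_seed)
        xs = [x + noise_rng.gauss(0.0, noise_scale) for x in xs]

    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Same inputs, same procedure: the resulting "model" is identical.
run_1 = train(data_seed=0)
run_2 = train(data_seed=0)

# Different input noise: the resulting models differ from each other.
noisy_1 = train(data_seed=0, noise_seed=1)
noisy_2 = train(data_seed=0, noise_seed=2)
```

Here `run_1 == run_2` holds exactly, while `noisy_1` and `noisy_2` end up with different weights, and a larger `noise_scale` pulls the fit further from the underlying line.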

The court could ask to show the training dataset.
