by jpc0
860 days ago
Thought question, not entirely related, but if you want to go down that route it actually is. Say I generate some media in Photoshop and send you a JPEG representation of it. You then distribute a PNG copy of the image without a license. Have you violated copyright law? At what point does an LLM have enough parameters that it is effectively just a compressed version of the data it was trained on? How about deduplicated storage? Is an image stored on that, and then reproduced using an index of some sort for distribution, a violation? If I put data into a thing (let's call it training, and let's call the thing a model), and then I request the data out of it and get what is perceived as an exact replication of said thing, did I create a copy? Does it matter if we call the thing a hard drive instead?
If I ask you to draw Mickey Mouse, you can probably produce a very good representation of him. If I asked you to write the script of The Matrix, assuming you've seen it, I suspect you'd get all the plot points and major quotes down, even if it has been years since you've seen it. Are you creating a copy? Absolutely! Don't distribute either of those things without a license. But does the fact that you are capable of making a copy of a thing when asked mean that you violated copyright way back when you watched The Matrix? Is there a copy of The Matrix or Mickey Mouse in your brain?
I will take the strong position that neither our brains nor LLMs contain copies of data in a way that violates copyright. But both are equally capable of generating copyright-violating material.