Hacker News
by kuchenbecker 1018 days ago
I slightly disagree, in that I think the person using the tool should bear the burden of copyright. I.e., if the model outputs something under copyright, it merely can't be republished. In the same way, I can use Photoshop on proprietary data but I can't necessarily sell the results.
5 comments

I'm so torn. On one hand, what you suggest seems to be a nearly ideal balance between advancing scientific progress and legal liability. By placing the legal burden for publishing generated works on the person actually trying to publish, it allows for a more nuanced legal approach (i.e. the difference between "there are similarities to this work, but it's murky" and "you 100% stole that work").

On the other hand, is the company running the model not itself already publishing all of that work and profiting from it? It seems unfair that its bottom line gets to be bolstered because it can produce work based on any artist, whereas the consumers of that work may end up walking on eggshells in order to publish it.

Like I said, I'm torn as far as how it "should be". I know how I want it to be though. I would love it if AI training continued unabated. The results have been amazing, and I believe it would be a shame if the effort were slowed down by legislation.

> is the company running the model themselves not already publishing all of that work and profiting from it?

No, because the model is transformative enough that it cannot be said to be a derivative work of the training set.

The model is in essence a form of distilled information, extracted from the training set. Information cannot be copyrighted - only expressions can.

Therefore, a model producer should have the right to use any pre-existing work, in the same way a person can, to study and internally memorize and extract information.

The reproduction of any of the training set data constitutes a copyright violation, but this is not done by the owner of the model, but by an end user of the model.

My point is that if a court finds that a generated image is indeed similar enough to constitute an infringement when a subscriber of, for instance, MidJourney attempts to publish it, has that work not already been "published" to the subscriber? And has MidJourney not profited by gaining a subscriber based on the work of others?

I wonder if that analogy represents the same thing. Speaking purely from a non-legal perspective, on the ethics as I see them:

When you use Photoshop on proprietary data you're providing the original data, choosing what manipulation to make (i.e. what tool), and directly creating the output. It makes sense that if you redistribute this it may be a copyright violation.

When you use Copilot or ChatGPT for programming you're typically asking a non-proprietary question or accepting suggestions it's making based on non-proprietary (or proprietary to you) code in the file. You also don't dictate the manipulation process a black box deep learning model does (i.e. I haven't asked it to do something that could be reasonably thought to be a copyright violation).

Am I then responsible for the fact that Copilot is fooling me with effectively copy-pasted copyrighted code when it's being presented to me as generated by the software and I haven't instructed the software to commit a copyright violation? I'm not sure if intent matters for copyright; I assume it doesn't, but perhaps that's a missing piece here.

Diffusion models are gray to me. If you're prompting with "Mickey Mouse riding a horse", I can see the argument that the prompt itself can be interpreted as asking the model to commit a copyright violation and the user is just hiding behind a layer of abstraction. If I ask the model to spit out "a picture of a smiling cartoon woman" and it generates a Betty Boop lookalike, is that still the user's fault?

It seems to me like passing the burden to the user could be reasonable but would need some safe harbor type of exception. It'll be really interesting to see what the courts decide.

I see 2 problems with that.

(1) How do you know if the image that was just generated is substantially similar to an existing copyrighted work? Maybe if some registration tool existed, but otherwise the burden is too great.

(2) What is stopping someone from generating millions of images and copyrighting all the "unique" ones, such that no one can create anything without accidental collisions?

> how do you know if the image that just generated is substantially similar to an existing copyright work?

This is already a problem with biological neural nets (i.e. humans). I remember as a teenager writing a simple song on the piano, and playing it for my mom; she said, "You didn't write that -- that's Gilligan's Island!" And indeed it was. If I had made a record and sold it, whoever owned the rights to the Gilligan's Island theme song could have sued me for it, and they would (rightly) have won.

There's already loads of case law about this; the same thing would apply to AI.

> what is stopping someone from generating millions of images and copy righting all the "unique" ones? Such that no one can create anything without accidental collisions.

Right now what's stopping it is that only humans can make copyrightable material; whatever is spat out from a computer is effectively public domain, not copyrighted.

1. There is lots of established law and case law (at least in the US); this is already a well-settled problem, and folks have the tools and proper venue to bring infringement claims. Yes, federal copyright infringement litigation is prohibitively expensive for many issues. There is now a "small claims court" for smaller issues. [1]

2. Those works cannot be copyrighted (at least in the US) [2]. And hey, someone already tried copyrighting every song melody [3].

[1]: https://copyright.gov/about/small-claims/

[2]: https://www.federalregister.gov/documents/2023/03/16/2023-05...

[3]: https://www.youtube.com/watch?v=sJtm0MoOgiU

But that problem is already solved.

Copyright holders are already protected from (i.e. can legally prohibit) distribution of obvious copies or clearly derivative works.

That holds regardless of how they were produced: by hand, copy machine, Photoshop, or with a model.

The new problem is that artists’ styles are being “stolen” by incorporating their copyrighted work into models without their permission.

And that problem can easily be solved if using copyrighted material to create models is declared NOT fair use.

Artists could still allow models to be built from their work, but on their terms. If they wish to do that.

A famous artist who doesn’t mind being commercial could sell their own unique model to let fans create art in that artist’s style, while not having their style “ripped” by others.

Or just keep their style to themselves, for their own work, as artists have done for centuries.

(Of course, with greater effort, their style could still be recreated - styles are not protected unless they are trademarked - but the recreation would have to be done without using the artist’s copyrighted works.)

This is probably a somewhat unpopular opinion on HN, but it is where many of the artists I work with are generally trying to get to. Consent, compensation, and credit.

> Consent, compensation, and credit.

I just want to quote you. Nothing I need to say. That’s it.

This is the best path forward I think. And it will become increasingly sensible as things continue to evolve. AI wasn't necessary to violate copyright before, and it isn't necessary today.

The determination of copyright violation should be made against the output of the model in the event that someone uses it for commercial purposes.

If the models have a risk of generating copyrighted content, it will be up to the consumers of the system to mitigate that risk through manual review or automated checks of the output.
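
For text or code output, an automated check could be as simple as measuring verbatim n-gram overlap against a corpus of known works. A minimal Python sketch, assuming you already have such a reference corpus to compare against; the function names, the 5-word window, and the 0.5 threshold are all made up for illustration, and overlap like this is only a signal for manual review, not a legal test for infringement:

```python
from __future__ import annotations

from typing import Iterable


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of n-word windows (shingles) in text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(generated: str, reference: str, n: int = 5) -> float:
    """Fraction of the generated text's shingles that also occur in reference."""
    gen = shingles(generated, n)
    if not gen:
        # Output shorter than n words: nothing to compare.
        return 0.0
    return len(gen & shingles(reference, n)) / len(gen)


def flag_for_review(generated: str, references: Iterable[str],
                    threshold: float = 0.5, n: int = 5) -> bool:
    """True if any reference overlaps the output at or above the threshold.

    The threshold is an arbitrary illustrative value, not a legal standard.
    """
    return any(containment(generated, ref, n) >= threshold for ref in references)
```

Anything flagged would then go to a human for the manual review part. Images would need a different similarity measure, which this sketch does not attempt.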

A digression, but I see a lot of posters asserting that "humans learn by copying other people, but we don't call that a violation of copyright when they draw".

People casually asserting that software is equivalent to humanity will be a non-negligible thing to consider, as irritating and poorly-founded as it seems.

If the reproduction isn't pixel-perfect, but merely obvious and overwhelming, how do you refute that philosophically to people who refuse a distinction between 50GB and a human life?

> People casually asserting that software is equivalent to humanity will be a non-negligible thing to consider, as irritating and poorly-founded as it seems.

> If the reproduction isn't pixel-perfect, but merely obvious and overwhelming, how do you refute that philosophically to people who refuse a distinction between 50GB and a human life?

Software equivalence to humanity is a very philosophical question that many sci-fi writers have approached. But our primary issue related to this technology does not depend on anyone making a determination there.

The challenge is that losses to livelihood from this technology are going to come from far broader impacts than copyright alone. Copyright disputes are just the first things to get everyone's attention.

Let's say we err on the side of protection of copyright, and all training data must be fully licensed, in addition to users being responsible for ensuring outputs did not accidentally reproduce something similar to a copyrighted work, even if it was part of the licensed training dataset. Great! This fixes the problem of lost value for the owners of copyrights. Companies will face a slight delay and slightly increased costs as they license content; however, in the end, model capabilities will be the same and continue to increase.

The number of jobs that actually cannot be performed without humans will continue to dwindle — livelihoods will be lost at essentially the same scale despite upholding copyrights.

The only way we can handle a technology capable of reducing most need for human labor is by focusing on planning and executing a smooth transition toward an economy with more people than jobs — aiming for minimal human suffering during this process.

A mass loss of human jobs does not need to mean a mass loss of livelihood if our society is prepared to transition to a universal basic income. After all, human life is far more than just a job. We have the opportunity for much more fulfilling lives if we plan this transition well. We must understand that this is a far larger issue than copyright - copyright disputes are just one of the first symptoms of this disruptive process.

A human is still entering the prompt to generate the possibly copyrighted image/text. I don't think copyright law should care about the implementation. It's OK to copy a style if you use paint brushes or Photoshop, but not OK if you use a statistical model?

Apply for a copyright on your human-authored prompt then. That's the extent of human authorship.