Hacker News
by meowkit 1027 days ago
My opinion as a SWE who is dating a lawyer (joke, not a serious qualification but it does provide some insight):

Generative models traverse and interpolate high dimensional state spaces. These state spaces are created from input data.
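
As a rough illustration of what "interpolate" can mean here, the sketch below blends two points in a latent space linearly. This is a toy under stated assumptions, not how any particular generative model works: the function `lerp` and the vectors `z_cat` and `z_dog` are hypothetical names invented for this example, and real models use far higher-dimensional spaces and learned decoders.

```python
def lerp(a, b, t):
    """Linearly interpolate between two equal-length vectors a and b.

    t = 0 returns a, t = 1 returns b, values in between blend the two.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


# Hypothetical latent codes derived from two pieces of input data.
z_cat = [0.0, 1.0, 2.0]
z_dog = [2.0, 1.0, 0.0]

# A point "between" the two inputs that corresponds to neither of them.
midpoint = lerp(z_cat, z_dog, 0.5)
print(midpoint)
```

The midpoint is a new point in the space that was never an input, which is the sense in which output can be derived from training data without being a copy of any single item.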

I would argue people do the exact same thing - the first main difference is we can use novel inputs (e.g. we can use images or words to develop our music/temporal state spaces and vice versa). People also are recursive and self referential in a way that doesn't collapse.

Until we solve the interpretability problem (e.g. can you decode the feature space of a neural network into something we can comprehend) there is no good solution. Either traditional copyright wins and we get even more draconian policies (think Disney and their desire to never put anything in the public domain), or we have a free for all (which I don't think is bad for creative works, but certainly is for more practical things like stock photos or nonfiction).

6 comments

I can appreciate how this line of thinking might be attractive.

But IMO the human<>machine comparison doesn't carry much weight. We shouldn't assume that just because a human is allowed to do something, a machine is automatically allowed to do the same thing, too. I think some care should be taken when considering if we allow machines to have the same privileges as humans.

> We shouldn't assume that just because a human is allowed to do something, a machine is automatically allowed to do the same thing, too

There are no sentient machines (at least not yet). Your position is one where you are actually limiting what other humans can do, limiting which tools other humans can have access to. Also, the parameter – according to the law – was always "the same". For instance, there is nothing preventing you from making your own chess league where computers are allowed to compete. FIDE is free to ban you from competing in their leagues, or to ban anyone associated with your league, or whatever, but there is nothing in the law preventing you.

I have been saying this from day one: this whole debate is mainly white-collar workers negatively impacted by automation making up any excuse they can to say why their job should be protected, somehow, for some reason, but not that of coal miners or what have you.

A human downloads a photo to learn how to draw. Another human downloads a photo to teach their computer how to draw. No difference, no need to obtain any license in either case.

> We shouldn't assume that just because a human is allowed to do something, a machine is automatically allowed to do the same thing, too.

Generally speaking, even if one machine is allowed to do something, it doesn't automatically mean another machine is allowed to do it.

For example, you can drive a car with a normal driving license, but not a truck. In some states you can own a pistol but not an automatic rifle.

It also depends on where this is happening. For instance, you don't need a license to drive a car on your own private property. You need a license to drive it on public streets because society needs some assurance that you know what you are doing. So in many cases the laws and restrictions also apply in relation to a given scenario.
copyright exists among other things to "promote the progress of science and useful arts".
That section is written in parallel verse, with copyright <> science, and patent <> useful arts. This sounds weird, now, but it's consistent with the use of the words at the time, which is the reverse of how they are used today, where paintings etc are considered art, and inventions are considered science. So, it's not that copyright exists to promote science and art (as we call them today) but only just the arts. Patents are for science. Authorship reflects copyright and invention reflects patent:

> Congress shall have the power... To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries.

A machine is just a tool. It is the creator and the user of the machine who hold the privileges the tool is used with. I think we should be careful not to anthropomorphize, or attribute agency, responsibility and autonomy to, something that is essentially a better Photoshop plugin.
I don’t think the parent is anthropomorphizing anything. The ones who anthropomorphize are saying that machines should be covered by fair use, because they have similarities with humans.

This is not about the rights of a machine but about how one human product is consumed by another human product. This is just a commercial supply chain: if you make a model, you need human data. You generally need to compensate your suppliers of “raw material”.

It's not the tool that is covered by fair use. It is the creation of the tool that is covered by fair use.

Is the tool itself supposed to be a copyright violation or is it a tool facilitating copyright violation by producing violating output?

The latter is something that can be tested because we have processes to compare works of art for it. If it is shown that LLMs produce mostly infringing art then we can and should ban or heavily regulate them. If not then not.

> It is the creation of the tool that is covered by fair use.

Copyright doesn’t restrict creation of something, it restricts (mainly) commercial distribution. Research, education and journalism etc are largely unaffected, and would still be.

That said, I believe that selling access to the tool to the public already violates the copyright of the rights holders, even if it doesn’t produce similar works of art. The copyrighted works increased the value of the product (otherwise why would they use them?).

> The latter is something that can be tested because we have processes to compare works of art for it.

This is the most expensive, least practical and most arbitrary part of existing copyright. It would be a huge mistake, imo, to expand this dramatically. This problem mostly goes away if the supply chain is sanely regulated.

All you’d need is to give access to the training set upon audit, and bureaucrats could check for copyrighted works. There are already automated tools for this.
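
A minimal sketch of what such an audit check could look like, assuming the auditor holds a registry of fingerprints for protected works. Everything here is hypothetical and invented for illustration: the names `fingerprint`, `audit`, `registry` and `training_set` do not come from any real tool. It only catches byte-for-byte copies; a practical system would presumably need fuzzy or perceptual matching to catch resized images or reformatted text.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying an exact byte sequence."""
    return hashlib.sha256(data).hexdigest()


def audit(training_items, registered_fingerprints):
    """Return the names of training items that exactly match a registered work.

    training_items: dict mapping item name -> raw bytes (hypothetical layout).
    registered_fingerprints: set of hex digests of protected works.
    """
    return [
        name
        for name, data in training_items.items()
        if fingerprint(data) in registered_fingerprints
    ]


# Hypothetical registry containing a single protected work.
registry = {fingerprint(b"registered novel text")}

# Hypothetical training set handed over for the audit.
training_set = {
    "doc1.txt": b"registered novel text",
    "doc2.txt": b"public domain text",
}

flagged = audit(training_set, registry)
print(flagged)
```

Comparing fingerprints rather than raw files means the auditor does not have to hold a copy of every protected work, only its digest.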

"That said, I believe that selling access to the tool to the public already violates the copyright of the rights holders, even if it doesn’t produce similar works of art. The copyrighted works increased the value of the product (otherwise why would they use it?)."

So it is similar to how ISPs argue that they should get a cut of streaming services because they enable another product.

I think it is also relevant that more than half of the globe will just completely ignore any regulation, and any artist in a country with regulation will just have to compete with ever more empowered artists using all that AI has to offer.

“It’s just a machine!”

So are you!

Don't be obtusely misanthropic
The value of copyright is going to vanish. There is enough public domain material to train models on and to avoid the problem altogether.

There used to be professions like tinkerers, bards, clowns. The tinkerers disappeared when the society became modern. The clowns on the other hand managed to lobby for laws that put people into jail for heinous crimes like copying pictures, and survived longer. They are going to bite the dust now.

What you describe would result in the opposite - copyright will be incredibly valuable in a system where the vast majority of "creative works" are just regurgitations of past works in the public domain, churned out by machines. In such a world, none of that has a copyright anyway. Actual creative works, which do garner copyright, will then be that much more valuable, because they will continue to be a property right with a breadth of coverage to make them useful.
Whether or not “humans do it” isn’t relevant. You can walk around with a copyrighted song in your head. That is not copyright infringement. But if you take that song, create a digital copy, and distribute it for money, then you are violating someone’s copyright. Additionally, our legal system requires a balance of probabilities. It’s hard to prove that someone was influenced by another work unless the similarities are plainly obvious. The same does not apply to ML models where the training data and algorithm are knowable facts.
I challenge you to listen to 4 chords of awesome and tell me again about how every song is completely original. How does Eragon exist when it definitely ripped parts from Star Wars, etc.? AI usually doesn't spit out a full plagiarism, but a loosely inspired work, which is what most media we consume is.

Edit: 4 chords of awesome link is https://youtube.com/watch?v=oOlDewpCfZQ&si=8vL6PbDnHiaffJh3

A copyright in just Eragon would be incredibly thin, for the exact reasons you state. This criticism of copyright by people that have no understanding of actual copyright law, how it works, how it's used, etc, is so exhausting and ignorant.
“Every song is completely original” is the opposite of what I said.
The analogy doesn't hold when you consider the sheer scale of the problem.

I can outright buy a machine for a few thousand dollars that can crank out a faithful rewrite of every Stephen King novel without the shitty endings and nonsense plot points. It can do it in a few days, maybe a couple of weeks at most.

To do that with human labor would take years and cost hundreds of thousands, if not millions of dollars.

Instead of paying an artist a couple hundred for a commissioned drawing, I can just scrape up their entire portfolio and generate any image I want with their style. I can generate hundreds or thousands of images. I can take their distinct style and use it exclusively as the branding for my company.

What an ML model does is fundamentally not what happens when a human draws inspiration from prior art. A human would require an extremely significant amount of time and resources to perfectly imitate every artist they have ever seen. It takes a human significant time and resources to produce faithful variations on prior art.

An ML model's output is measured in words or images per second.

Hello.

Maintaining a system like Netflix or AWS or even Amazon would require an insane number of people and amount of time, if it were possible at all within a finite time, without all the computers doing work for us in seconds that would take humans ages to do.

> ... a SWE who is dating a lawyer

> I would argue people do the exact same thing

Perhaps a ménage à trois with a neuroscientist would change your view on this.

> Until we solve the interpretability problem (e.g. can you decode the feature space of a neural network into something we can comprehend) there is no good solution.

This is the rub. Without reverse attribution... open source anonymous models become a free-for-all loophole.

Since that doesn't currently exist, I think the best we can do is to say that any commercial entity using a model bears the responsibility of proving the model they use is untainted by copyrighted material (to which they haven't secured rights).

Open source model X is... whatever it is.

But I'll be damned if OpenAI / Meta / Microsoft / IBM should be able to build a commercial product on top of laundered copyrighted material while ignoring provenance.

I mean, we have models for this: software code and art. Neither is clearly attributable. In the case of software code, we've developed case law around clean room design and similarity. In the case of art, we value verifiable chain of custody.

Hopefully, something similar would tilt commercial funding of AI in the direction of responsible use.