Hacker News
by TeMPOraL 434 days ago
> People either don't understand how these models work or they purposely misrepresent it (or purposely refuse to understand it).

Not only that, they also assume or pretend that this is obviously violating copyright, when in fact this is a) not clear, and b) pending determination by courts and legislators around the world.

FWIW, I agree with your perspective on training, but I also accept that artists have legitimate moral grounds to complain and try to fight it - so I don't really like to argue about this with them; my pet peeve is on the LLM side of things, where the loudest arguments come from people who are envious and feel entitled, even though they have no personal stake in this.

3 comments

> Not only that, they also assume or pretend that this is obviously violating copyright, when in fact this is a) not clear, and b) pending determination by courts and legislators around the world.

Legislation always takes time to catch up with tech, that's not new.

The question I see being put forth by those with legal and IP backgrounds is about inputs vs. outputs, as in "if you didn't have access to X (which has some form of legal IP protection) as an input, would you be able to get the output of a working model?" The comparison here is with manufacturing, where you assemble parts made by others into some final product and you would be buying those inputs to create your product output.

The required inputs are not being purchased for AI, which pretty solidly puts AI trained on copyrighted materials in hot water. The fact that it's an imperfect analogy and doesn't really capture the way software development works is irrelevant if the courts end up agreeing with something they can understand as a comparison.

All that being said I don't think the legality is under consideration for any companies building a model - the profit margins are too high to care for now, and catching them at it is potentially difficult.

There's also a tendency for AI advocates to try and say that AI/LLMs are "special" in some way, and to compare their development process to someone "learning" a style of art (or whatever input) that they then internalize and develop into their own style. Personally I think that argument gives a lot of assumed agency to these models that they don't actually have, and weakens the overall legal case.

“Not only that, they also assume or pretend that this is obviously violating copyright, when in fact this is a) not clear, and b) pending determination by courts and legislators around the world.”

Uh huh, so much worse than the people that assume or pretend that it’s obviously not infringing and legal. Fortunately I don’t need to wait for a lawyer to form an opinion and neither do those in favor of AI as you might’ve noticed.

You see any of them backing down and waiting for an answer from a higher authority?

> You see any of them backing down and waiting for an answer from a higher authority?

Should they? That's generally not how things work in most places. Normally, if something isn't clearly illegal, especially when it's something too new and different for laws to clearly cover, you're free to go ahead and try it; you're not expected to first seek a go-ahead from a court.

You just chided people for having strong opinions about AI infringement without a court ruling to back them up, but now you’re saying that creating/promoting an entire industry based on a legal grey area is a social norm that you have no strong feelings about. I would have thought the same high bar to speak on copyright for those who believe it infringes would be applied equally to those saying it does not, especially when it financially benefits them. I don’t think we’ll find consensus.

This is silly. What are you proposing? A coup to ban AI? Because that is the alternative to waiting for legislators and courts.

Never proposed a ban. The issue is copyright: use licensed inputs and I couldn’t care less.

Pro-AI people need to stop behaving like it’s a foregone conclusion that anything they do is right and protected from criticism because, as was pointed out, the legality of what is being done with unlicensed inputs, which are the majority of inputs, is still up for debate.

I’m just calling attention to the double standard being applied in who is allowed to have an opinion on what the legal outcome should be prior to that verdict. TeMPOraL said people shouldn’t “assume or pretend” that lots of AI infringes on other people’s work because the law hasn’t caught up, but the same argument applies equally to AI proponents, and they have already made up their minds, independent of any legal authority, that using unlicensed inputs is legal.

The difference in our opinions is that if I’m wrong, no harm done; if they’re wrong, lots of harm has already been done.

I’m trying to have a nuanced conversation but this has devolved into some pro/anti-AI, all-or-nothing thing. If you still think I want to ban AI after this wall of text I don’t know what to tell you, dude. If I’ve been unclear it’s not for lack of trying.

But this is hardly limited to AI.

Copyright is full of grey areas, and disagreements over its rules happen all the time. AI is not particularly special in that regard, except perhaps in scale.

Generally the way stuff moves forward is somebody tries something, gets sued and either they win or lose and we move forward from that point.

Ultimately "harm" and "legality" are very different things. Something could be legal and harmful - many things are. In this debate I think different groups are harmed depending on which side "wins".

If you want to have a nuanced debate, the relevant issue is not whether the input works are licensed (they obviously are not) but the following principles:

- de minimis - is the amount of each individual copyrighted work too small to matter?

- is the AI just extracting "factual" information from the works, separate from their presentation? After all, each individual work only adjusts the model by a couple bytes. Is it less like copying the work and more like writing a book about the artwork that someone could later use to make a similar work (which would not be copyright infringement if a human did it)?

- fair use - complicated, but generally the more "transformative" a work is, the more likely it is to be fair use, and AI is extremely transformative. On the other hand, it potentially competes commercially with the original work, which usually makes fair use less likely (and maybe you could have a mixed outcome here, where the AI generators are fine, but using them to sell competing artwork is not, while other uses are ok).

[IANAL]

It's unauthorized commercial use. Which part of that is confusing to you?

So is Google Books, and that was ruled fair use. Commercial use alone is not a slam-dunk argument against fair use.