Hacker News
by andybak 1258 days ago
Opt-in would hugely limit the amount of material - possibly to the point where AI research becomes unviable for many applications.

This would impact pure research and other use cases which are likely to be of benefit to society as a whole.

Copyright was created with clear legal limitations (although those limitations are often eroded by corporate interests).

The "natural" state of man is without copyright and its imposition isn't a moral right - it's a legal trade-off that should carefully weigh up cost vs benefit.

1 comments

> Opt-in would hugely limit the amount of material - possibly to the point where AI research becomes unviable for many applications.

Not necessarily a bad thing. If your tools require widespread breaking of existing laws, your tools are broken.

> it's a legal trade-off that should carefully weigh up cost vs benefit.

Cost: Loss of millions of jobs which are suddenly invalidated by the loss of copyright. Less content overall being created.

Benefit: AI can freely consume what content is left.

I think the cost is far too high.

EDIT: I'd also point out that the "natural" state of man is no laws at all - ownership of goods is enforced with only your strength of arms. I really don't want to live in that kind of world.

> If your tools require widespread breaking of existing laws, your tools are broken.

What laws do you mean?

Copyright (see the copious evidence that training is not respecting copyrights or licenses across the latest commercial machine learning algorithms).

The same law you brought up? Laws which are written into national treaties?

Regardless of your beliefs about copyright and how it should be changed (or abolished), it is not just the law of the land, it is the law of the world.

> Copyright (see the copious evidence that training is not respecting copyrights or licenses across the latest commercial machine learning algorithms).

That's slightly different. Just because an AI model is capable of plagiarism, that's a factor that emerges from usage. The act of training a model hasn't currently been judged to be illegal, and neither, as far as I know (at least in most jurisdictions), has the initial data gathering.

Creating output that infringes on someone else's copyright is obviously problematic and few would argue otherwise. But that isn't a problem specific to AI.

> Just because an AI model is capable of plagiarism, that's a factor that emerges from usage.

I disagree, because it could be prevented by not training on copyrighted material in the first place. You can't reproduce a highly unique Nat Geo cover photo if it hasn't been encoded into the model. You can't reproduce watermarks that aren't in the training set. Remember: it's an algorithm, not a person. The outputs are directly related to what's been encoded into the model.

> The act of training a model hasn't currently been judged to be illegal

It's outside of the rights granted by the copyright holders in most cases (MIT and CC are probably the few licenses which allow this, and even CC/NC wouldn't allow it and MIT falls afoul of the attribution clause).

The legal system hasn't yet reviewed such widespread infringement to create precedent. But eventually it will, and I don't believe it will be kind.

> You can't reproduce watermarks that aren't in the training set.

You most certainly can reproduce watermarks that aren't in the training set. It can synthesize novel watermarks the same way it can synthesize novel images.

The vast majority of output is non-infringing.

> It's outside of the rights granted by the copyright holders in most cases (MIT and CC are probably the few licenses which allow this, and even CC/NC wouldn't allow it and MIT falls afoul of the attribution clause).

Except

a) This assumes that training isn't covered under fair use

b) Attribution is only required for output that infringes - not the model itself.

Most of your arguments seem to apply to the output not to the model itself or the act of training it.

> I disagree, because it could be prevented by not training on copyrighted material in the first place.

This.

> The legal system hasn't yet reviewed such widespread infringement to create precedent. But eventually it will, and I don't believe it will be kind.

There you go, we'll see how this is tested against the AI bros once the legal system catches up.