by wyldfire 823 days ago
> Very little training data is properly licensed or compensated.

Could it ever be the case, I wonder, that we could trust/enforce/believe that a model had abstracted what it learned from its training inputs so thoroughly that the model was not a derived work of them?

I've seen the examples where the model is able to reproduce recognizable characters from popular media. Those look like they might be "just" overfitting? I can see that being desirable from the point of view of being able to create a picture of "Robocop shopping for diapers", but maybe we could compromise and converge on a point where AI art isn't quite so demonized and is instead seen as a useful tool.


I think it's obviously problematic that these companies are deriving value from millions of people without compensating them, while creating a product that competes with those masses.
You are describing the original meaning of "cultural appropriation", like when jazz and rock & roll were copied from Black American culture and sold.
I am describing "copyright infringement"
If you are selling something and no one is buying it, the value you have generated is zero. If you put something online and did not bother to understand that this material could potentially be used by a third party on account of its loose licensing, then who's to blame?
But the licensing isn't loose in many (most?) cases we're aware of. Merely making an image publicly available online doesn't give the viewers rights to do whatever they want with it under our copyright laws.
Well, I suppose the keyword here is "most?". Because the burden of proof lies with the prosecution, the legal gymnastics of coming up with a reasonable argument for this will be interesting.
They could at least make an effort to purchase licenses for all non-open content, comply with open licenses, and exclude content otherwise. They aren't making anything more than the lamest token effort because they don't care.
But that's just it: if we believe that what the model learns from the training material is abstract enough, they shouldn't need to license the content at all. Humans learn from and are inspired by art all the time. They create new works that are not considered derived works, despite obvious influence. Could we conceive of the same circumstance being possible with machine learning?
If we go down this road right now, we are allowing superintelligent AI powered corporations to front-run the entire human race and sell everything we think back to us.

It's not about theory of mind stuff. It's about just compensation of living human beings.

Well, with the status quo, there's no license required to train on the greatest works of art from centuries past.

I recognize some of the concerns about AI but I don't think pinning hopes on copyright law will deliver anything remotely resembling a remedy to the problems you bring up.

Are you talking about training a human or training an artist?

Downloading copyrighted data at huge scales to use in your commercial software product is pretty substantially different than an art student studying a reference.

> Are you talking about training a human or training an artist?

Neither: I am talking about training a machine learning model. Unless that's what you meant by "artist"?

> Downloading copyrighted data at huge scales to use in your commercial software product is pretty substantially different than an art student studying a reference.

You may have misunderstood my comment. My comment was stating that there's only a portion of human art - the most recent decades of works - which are protected by a copyright. Models like Stable Diffusion could be re-trained instead on centuries of artworks and not infringe at all. So the problem described as "AI powered corporations to front-run the entire human race and sell everything we think back to us" - this problem is here regardless of whether licenses were purchased.

1848 had a publication that might interest you.

I think the manifesto is missing some important aspects of game theory and human nature, and for some of those, theory of mind is indeed very important. That's why this particular political experiment didn't work out in the end, despite its good intentions and despite several of its ideas having become globally accepted.

I’ve read it but I think this case is much more unambiguous. Workers are paid; Marx would argue they are systematically underpaid and disempowered.

In this case the workers are not paid at all. Their work is not even acknowledged. It's closer to cultural appropriation, but quite a bit more unambiguous than that as well, since this isn't people learning from people. This is mass uncompensated value harvesting.

The number of hands benefiting here is incredibly small. In theory you could have one human owning the entire human mind and renting it back. This is the danger of present-generation AI, not Skynet scenarios, and if anything the sci-fi stuff distracts us from this.

It's like an information-theory equivalent of today's shoplifting epidemic, except there are tiny gangs of only a few shoplifters able to run at Mach 10 and shoplift from every store in the country in days.